
Normalized Cut with Adaptive Similarity and Spatial Regularization

This work was supported by the National Key Research and Development Program of China (2017YFA0604903).

Faqiang Wang, Cuicui Zhao, Jun Liu, Haiyang Huang
Laboratory of Mathematics and Complex Systems (Ministry of Education of China), School of Mathematical Sciences, Beijing Normal University, Beijing 100875, People's Republic of China. Jun Liu: jliu@bnu.edu.cn.
Abstract

In this paper, we propose a normalized cut segmentation algorithm with a spatial regularization prior and an adaptive similarity matrix. We integrate the well-known expectation-maximization (EM) method from statistics and the regularization technique from the partial differential equation (PDE) approach into the normalized cut (Ncut). The EM technique enables our method to adaptively update the similarity matrix, which yields a better classification criterion than the classical Ncut method, while the regularization prior guarantees robust performance under noise. To unify three quite different methods (EM, spatial regularization, and spectral graph clustering), we build a variational framework that combines them into a general normalized cut segmentation algorithm. The well-posedness theory of the proposed model is also given. Compared with existing spectral clustering methods such as the traditional Ncut algorithm and the variational Chan-Vese model, numerical experiments show that our method achieves promising segmentation performance.

Keywords: Normalized cut, spectral clustering, EM algorithm, operator splitting, total variation (TV).

1 Introduction

Image segmentation is an important and attractive field in image processing that plays a key role in many applications. The goal of image segmentation is to obtain a meaningful partition of an image for further tasks, such as feature extraction, image classification, and object recognition. In general, the segmentation result of an image is a set of regions that cover the whole image, or a set of contours that separate the image into different regions, such that pixels in the same region share similarities in intensity, texture, or color.

A great number of segmentation models have been proposed in the literature. Among them, edge-based methods such as snakes and active contour models [21, 6] form a popular research branch in image segmentation. Another family consists of the statistical methods [22] based on Markov random fields [30] and expectation maximization (EM) [4]. In such methods, pixels are regarded as samples of a random variable whose distribution is a parametric mixture model. Besides, clustering-based methods treat pixels as data points and convert image segmentation into a clustering process; examples are K-means based methods [20, 7] and spectral clustering based methods [31, 37]. In addition, learning-based segmentation methods have drawn much attention recently, especially deep neural network based methods, which usually need a large amount of labeled data to train the desired networks [11, 19, 42, 13]. In contrast to learning-based methods, variational approaches can obtain a segmentation from a single image by minimizing a cost functional consisting of a data term and a regularization term. Here the data term is a clustering criterion, and the regularization term usually encodes prior information about the segmentation. For example, one can use the well-known total variation (TV) to penalize the length of region contours [7, 25] and obtain smooth segmentation boundaries. To capture special geometric shapes, star-shape priors [35] and convexity constraints [17, 24] can be adopted. Moreover, regularization can improve segmentation accuracy and enhance robustness under noise [2]. Both variational and learning-based methods show promising performance; the main difference lies in the supervision: variational methods need only a single image without labels, whereas learning-based methods often need massive labeled images. In this paper, we focus on variational methods, which are flexible enough to combine several techniques.

Here we give a brief introduction to clustering-based methods. A popular K-means based segmentation model is the Chan-Vese (CV) model [7], which consists of a K-means clustering data term and TV regularization. The CV model performs well on linearly separable data. However, it partly depends on initialization and tends to get stuck in local minima. Moreover, since K-means is a linear-boundary clustering algorithm, the CV model cannot handle nonlinearly separable data sets such as the nested two-moons data. Spectral clustering [36] is another widely used clustering algorithm; its key idea is to map the data points into a nonlinear feature space in which the data can be separated by simple linear clustering algorithms. In spectral graph theory [9], spectral clustering is equivalent to a min-cut problem, which can be solved by global minimization algorithms that are independent of initialization.
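To make this contrast concrete, the following minimal sketch (ours, not part of the paper's experiments; it relies on scikit-learn's make_moons, KMeans, and SpectralClustering) compares a linear-boundary method with spectral clustering on the two-moons data:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, SpectralClustering

X, y_true = make_moons(n_samples=400, noise=0.05, random_state=0)

km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
sc_labels = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                               n_neighbors=10, random_state=0).fit_predict(X)

def accuracy(y, labels):
    # Cluster labels are defined up to permutation, so take the better matching.
    acc = np.mean(y == labels)
    return max(acc, 1.0 - acc)

print("K-means accuracy:  %.2f" % accuracy(y_true, km_labels))   # linear boundary fails
print("Spectral accuracy: %.2f" % accuracy(y_true, sc_labels))   # nonlinear split succeeds
```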

Because spectral clustering has a much stronger ability to separate nonlinear structures, numerous spectral clustering based segmentation methods have been proposed. In these methods, an image is represented as an undirected weighted graph, in which the nodes correspond to the image pixels and the edges connecting pairs of nodes carry weights measuring the similarity between nodes. Image segmentation is then equivalent to finding a min-cut of this graph. The work of Ulrike von Luxburg [36] provides a comprehensive review of spectral clustering. Hagen and Kahng present a Cheeger cut based clustering method [18], which shows better performance. Shi and Malik establish a balanced graph cut criterion, the normalized cut (Ncut) [31], aiming to overcome the drawbacks of traditional cut criteria [37]. Szlam and Bresson [32] and Buhler and Hein [5] relate spectral clustering to nonlocal total variation [15], providing theoretical support for spectral clustering. Since the normalized cut problem can be relaxed to an eigenvalue system, many normalized cut based models have been proposed. Yu and Shi [40] establish the Ncut model with linear homogeneous equality constraints, which Eriksson et al. [12] extend to non-homogeneous equalities. Bernard et al. [14] propose a new framework to solve the Ncut problem with priors and convex constraints via the Dinkelbach method [29]. All these works reveal that segmentation methods based on spectral clustering are well-behaved and have tremendous potential.

Though the aforementioned spectral clustering based methods have proven effective, some flaws remain. For instance, these methods are sensitive to noise, which makes them unstable, since they lack spatial prior information. Moreover, the similarity matrix must be given in advance in the spectral clustering procedure, and the choice of its parameters is intractable. Tang et al. [33] combine MRF regularization with the normalized cut to obtain better robustness; however, their model adopts a rather crude KNN affinity (similarity) construction in the normalized cut procedure. Yu et al. [41] add an $\ell_1$-regularized energy term to the cut based formulation to promote sparse solutions, adopting the affinities of [31, 1] in a piecewise flat embedding model. However, all these models pay little attention to the design of the similarity itself.

To solve the problems existing in spectral clustering based segmentation, in this work we integrate the well-known expectation-maximization (EM) method [4] from statistics and the regularization technique from PDEs into the normalized cut. The EM technique, which is based on histogram modeling, endows the normalized cut with an adaptive similarity matrix, and the regularization enhances robustness under noise. To obtain an adaptive similarity, inspired by the Gaussian mixture model and the EM method, we derive a model-based similarity that can be updated iteratively and helps us get a better classification criterion. In fact, the normalized cut is a linear method that amounts to solving an eigenvalue system after relaxation, and the regularization enhances the spatial smoothness of the spectral vector. To unify these three quite different methods, we build a variational framework that naturally combines them into a general normalized cut segmentation algorithm.

The main contributions of this paper are as follows. First, we build a new variational framework for the normalized cut that unifies three quite different methods (EM, regularization, and spectral graph clustering), which makes the normalized cut segmentation numerically stable and improves its performance; to the best of our knowledge, this is the first work to combine these three methods. Second, we establish the well-posedness theory for the existence of a minimizer of the proposed model. Third, numerical experiments show that, compared with existing spectral clustering based methods and the classical Chan-Vese model, our algorithm achieves desirable segmentation performance.

The rest of this paper is organized as follows. We review related works in Section 2, covering the Chan-Vese model, the normalized cut, the Gaussian mixture model, and the EM algorithm. Two variational normalized cut models with adaptive similarity and regularization, the Normalized Cut with Adaptive Similarity and $H^1$ regularization (NCASH) model and the Normalized Cut with Adaptive Similarity and TV regularization (NCASTV) model, are proposed in Section 3, where we also prove the existence of a minimizer of the NCASTV model. Section 4 gives the algorithms and implementation details. Section 5 shows the experimental results, and Section 6 summarizes our methods and concludes.

2 Related Works

In this section, we introduce some related works. First, we briefly review the Chan-Vese model [7], and then the classical Ncut model [31]. The similarity matrix in the traditional Ncut is parametric; it greatly affects the segmentation results, and the choice of its parameters is troublesome. To solve this problem, inspired by the Gaussian mixture model and the EM algorithm, we construct a variational framework whose optimal solution adaptively measures the intensity similarity between image points. For this reason, we review the Gaussian mixture model and the EM algorithm in the last part of this section.

2.1 Chan-Vese Model [7]

Let $I:\Omega\to\mathbb{R}$ be the image defined on an open bounded set $\Omega\subset\mathbb{R}^2$. The Chan-Vese model [7] considers a piecewise constant approximation model for two-phase segmentation:

$$\min_{c_1,c_2,\phi}\;\mu\int_\Omega |\nabla H(\phi)|\,dx+\lambda_1\int_\Omega |I-c_1|^2 H(\phi)\,dx+\lambda_2\int_\Omega |I-c_2|^2\big(1-H(\phi)\big)\,dx,$$

where $\phi$ is the signed distance function and $H$ is the Heaviside function, $c_1,c_2$ are two unknown constants, and $\mu,\lambda_1,\lambda_2$ are fixed weight parameters.

In fact, the Chan-Vese model consists of a K-means based data term and TV regularization. The traditional way to solve the Chan-Vese model is to evolve the level-set function by gradient flow to obtain the segmentation result. Recently, many fast operator splitting algorithms [10, 8, 16, 38] have been developed to solve it.
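As an illustration, here is a minimal sketch (our notation, not the authors' code) of the two-phase data term: given a binary indicator H (H = 1 inside, 0 outside), the optimal constants are the region means, and the data energy is the weighted K-means cost.

```python
import numpy as np

def cv_constants(I, H):
    # Optimal piecewise constants: the mean intensity inside and outside.
    c1 = (I * H).sum() / max(H.sum(), 1e-12)
    c2 = (I * (1 - H)).sum() / max((1 - H).sum(), 1e-12)
    return c1, c2

def cv_data_energy(I, H, c1, c2, lam1=1.0, lam2=1.0):
    # Two-phase Chan-Vese data term (the TV length term is omitted here).
    return lam1 * ((I - c1) ** 2 * H).sum() + lam2 * ((I - c2) ** 2 * (1 - H)).sum()
```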

2.2 Normalized Cut [31]

The normalized cut method represents an image as an undirected weighted similarity graph $G=(V,E,W)$, so that image segmentation is equivalent to finding a min-cut of the graph. Here $V$ is the set of pixels, $E$ is the set of edges between pixels, and $W$ is the weight matrix measuring the similarity between pixels (the similarity matrix). Since the graph is undirected, the similarity weight matrix $W$ is symmetric. More details can be found in [36]. In practice, the similarity matrix is parametric, and its parameters affect the results of the clustering or segmentation process.

Graph-Cut: Suppose the graph $G$ has a partition $(A,B)$ such that $A\cup B=V$ and $A\cap B=\emptyset$. Then the cut value between $A$ and $B$ is defined as

$$\mathrm{cut}(A,B)=\sum_{u\in A,\,v\in B}w(u,v),$$

where $w(u,v)$ is the weight between points $u$ and $v$.
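The sketch below (an assumption-level illustration, where the kernel bandwidth sigma is hand-picked, which is precisely the issue the proposed model later addresses) builds a fully connected Gaussian affinity on pixel intensities and evaluates the cut value of a partition:

```python
import numpy as np

def gaussian_affinity(I, sigma):
    # Fully connected graph: w_ij = exp(-(I_i - I_j)^2 / (2 sigma^2)).
    v = I.ravel().astype(float)
    return np.exp(-(v[:, None] - v[None, :]) ** 2 / (2.0 * sigma ** 2))

def cut_value(W, A):
    # cut(A, B): total weight of edges crossing the partition (A is a boolean mask).
    return W[np.ix_(A, ~A)].sum()
```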

Segmentation models based on the min-cut criterion [37] have demonstrated good performance on some natural images. However, as the authors noticed in their work, the above cut criterion favors cutting small sets of isolated nodes in the graph, which is not desirable in real applications. To overcome this bias of cut based models, Shi and Malik [31] present the well-known normalized cut

$$\mathrm{Ncut}(A,B)=\frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(A,V)}+\frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(B,V)},$$

where

$$\mathrm{assoc}(A,V)=\sum_{u\in A,\,t\in V}w(u,t).$$

The introduction of this measure of the "size" of each partition improves the performance of image segmentation while avoiding the segmentation bias. However, minimizing the normalized cut is NP-hard [31]. Fortunately, the problem can be relaxed to the generalized eigenvalue problem below, which is solvable:

$$(D-W)f=\lambda Df.$$

Here the relaxation $f$ of a binary labeling variable can be used to label the segmentation, and $D$ is the degree matrix [36], a diagonal matrix with $D_{ii}=\sum_j w_{ij}$.

The minimization of the relaxed normalized cut [3] can be expressed as

$$\min_{f}\;\frac{f^{T}(D-W)f}{f^{T}Df},\qquad\text{s.t. }f^{T}D\mathbf{1}=0. \tag{1}$$

Then it is easy to get the pixel-by-pixel version of (1):

$$\min_{f}\;\frac{\sum_{i,j}w_{ij}\,(f_i-f_j)^2}{\sum_{i}d_i f_i^2},$$

where $d_i=\sum_j w_{ij}$.

It is easy to see that the min-Ncut problem can thus be transformed into an optimization problem; however, the similarity matrix in the Ncut method is given empirically in advance and is parametric. In the next part, we review the Gaussian mixture model and the EM algorithm, which will help us establish an adaptive similarity.
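For small graphs, the relaxed problem (1) can be solved directly through the generalized eigenvalue problem $(D-W)f=\lambda Df$; a minimal dense sketch (ours, assuming every node has positive degree so that D is positive definite):

```python
import numpy as np
from scipy.linalg import eigh

def ncut_partition(W):
    d = W.sum(axis=1)
    L = np.diag(d) - W                 # graph Laplacian D - W
    vals, vecs = eigh(L, np.diag(d))   # generalized problem (D - W) f = lambda D f
    f = vecs[:, 1]                     # second smallest eigenvector (first is constant)
    return f > 0                       # threshold the relaxation at zero
```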

2.3 Gaussian Mixture Model and Its Parameter Estimation

A Gaussian mixture model (GMM) can be expressed as

$$p(x\mid\Theta)=\sum_{k=1}^{K}\alpha_k\,p_k(x\mid\theta_k),$$

where $x$ is a continuous-valued data vector and $\Theta=\{\alpha_1,\dots,\alpha_K,\theta_1,\dots,\theta_K\}$ is the set of parameters. The $\alpha_k$ are the mixture weights, which satisfy the constraint $\sum_{k=1}^{K}\alpha_k=1$, and $p_k(x\mid\theta_k)$ are the $k$-th component Gaussian density functions, with $\theta_k=(\mu_k,\sigma_k)$.

Generally, the parameters of a GMM are estimated by the iterative EM algorithm or by maximum a posteriori (MAP) estimation from a well-trained prior model [28].

2.4 Expectation-Maximization Algorithm

The expectation-maximization (EM) algorithm is a classical method for parameter estimation of mixture models. It is used when the data have missing values, or when optimizing the likelihood function is analytically intractable but the likelihood can be simplified by assuming the existence of additional latent values [4]. The EM algorithm enjoys wide applicability because of its stable convergence and convenient implementation.

Assume a $d$-dimensional random variable $X$ whose density function is a Gaussian mixture governed by a set of parameters $\Theta$. Given a data set $\mathcal{X}=\{x_1,\dots,x_N\}$, by the independence assumption the log-likelihood function of the parameters is

$$\mathcal{L}(\Theta\mid\mathcal{X})=\log\prod_{i=1}^{N}p(x_i\mid\Theta)=\sum_{i=1}^{N}\log\Big(\sum_{k=1}^{K}\alpha_k\,p_k(x_i\mid\theta_k)\Big),$$

which is difficult to optimize because it contains the log of a sum. However, if we consider $\mathcal{X}$ incomplete and assume the existence of unobserved data items $\mathcal{Y}=\{y_i\}_{i=1}^{N}$, where $y_i\in\{1,\dots,K\}$ informs us which Gaussian component the sample $x_i$ comes from, then the likelihood function can be simplified significantly [4].

Assume the existence of the joint density $p(x,y\mid\Theta)$ and that the data are independent and identically distributed. Given a current estimate $\Theta^{(t)}$, the maximization of $\mathcal{L}$ can be transformed into the maximization of

$$Q\big(\Theta,\Theta^{(t)}\big)=\sum_{i=1}^{N}\sum_{k=1}^{K}p\big(k\mid x_i,\Theta^{(t)}\big)\log\big(\alpha_k\,p_k(x_i\mid\theta_k)\big), \tag{2}$$

where

$$p\big(k\mid x_i,\Theta^{(t)}\big)=\frac{\alpha_k^{(t)}\,p_k\big(x_i\mid\theta_k^{(t)}\big)}{\sum_{l=1}^{K}\alpha_l^{(t)}\,p_l\big(x_i\mid\theta_l^{(t)}\big)}.$$

The derivation of (2) can be found in [22].

With the help of the EM algorithm, the function $Q$ can be optimized directly. Notice that $p(k\mid x_i,\Theta^{(t)})$ is the probability measuring how likely $x_i$ belongs to cluster $k$, and this is exactly what we want in a segmentation model. Furthermore, if the number of Gaussian components equals the number of points, this probability can be used to measure the similarity between the corresponding two points. In our model, we exploit this key point and construct a similarity metric from the functional itself.
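A minimal EM sweep for a 1-D Gaussian mixture is sketched below (ours, for illustration only); the E-step responsibilities are exactly the posterior probabilities $p(k\mid x_i,\Theta^{(t)})$ referred to above:

```python
import numpy as np

def em_step(x, alpha, mu, sigma):
    # E-step: responsibilities r[i, k] = p(k | x_i, Theta).
    comp = alpha * np.exp(-(x[:, None] - mu) ** 2 / (2 * sigma ** 2)) \
           / (np.sqrt(2 * np.pi) * sigma)
    r = comp / comp.sum(axis=1, keepdims=True)
    # M-step: closed-form updates of the weights, means, and standard deviations.
    Nk = r.sum(axis=0)
    alpha = Nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / Nk
    sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk)
    return alpha, mu, sigma
```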

3 The Proposed Method: Normalized Cut with Adaptive Similarity and Spatial Regularization

3.1 Statistical Methods

In this section, we propose normalized cut segmentation models with adaptive similarity and spatial regularization under a variational framework. To obtain a similarity-related energy, inspired by GMM parameter estimation and the EM algorithm, we model the image by its empirical distribution, which has the form of a GMM, and introduce an auxiliary variable that measures the similarity between points; the parameter estimation is then formulated as an alternating optimization. In our model, the similarity function is determined by the cost functional itself and is updated as the iteration goes on. To sum up, combining the normalized cut with common spatial regularizers, we establish a model that integrates the advantages of variational regularization and spectral clustering.

3.1.1 Empirical Distribution of the Image

Let $\Omega$ be a discrete set in $\mathbb{Z}^2$ and let $I:\Omega\to\mathbb{R}$ stand for the image. Assume that each intensity value $I(x)$ is a realization of a random variable $Y$; then $\{I(x):x\in\Omega\}$ are samples of $Y$, and the normalized histogram of $Y$ can be expressed as

$$h(y)=\frac{1}{|\Omega|}\sum_{x\in\Omega}\delta\big(y-I(x)\big),$$

where $\delta$ is the Dirac delta.

Notice that $h$ is not smooth, so we replace $\delta$ with a Gaussian function $g_\sigma$, since the limit of $g_\sigma$ is the delta function in the sense of distributions, where

$$g_\sigma(y)=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\Big(-\frac{y^2}{2\sigma^2}\Big).$$

When $\sigma$ is small, $g_\sigma$ approximates $\delta$ well.

The smooth approximation of the frequency histogram is then

$$p(y)=\frac{1}{|\Omega|}\sum_{x\in\Omega}g_\sigma\big(y-I(x)\big). \tag{3}$$

Obviously, $p$ is a Gaussian mixture distribution parameterized by $\sigma$, so we can treat it by the EM process mentioned earlier. Here we fully exploit the property of the posterior term discussed previously in order to obtain an adaptive similarity.
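A sketch of the smoothed histogram (3) (ours; it is simply a Parzen window estimate placing one Gaussian of width sigma at each observed intensity):

```python
import numpy as np

def smooth_histogram(I, sigma, grid):
    # p(y) = (1/|Omega|) * sum_x g_sigma(y - I(x)), evaluated on a grid of y values.
    v = I.ravel().astype(float)
    g = np.exp(-(grid[:, None] - v[None, :]) ** 2 / (2 * sigma ** 2)) \
        / (np.sqrt(2 * np.pi) * sigma)
    return g.mean(axis=1)   # average of |Omega| Gaussian bumps

# e.g. p = smooth_histogram(image, sigma=2.0, grid=np.linspace(0, 255, 256))
```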

3.1.2 Adaptive Similarity Functional

Assume that $\{I(y):y\in\Omega\}$ are samples of $Y$ with the density $p$ given in (3); then the log-likelihood function is

$$\mathcal{L}(\sigma)=\sum_{y\in\Omega}\log p\big(I(y)\big)=\sum_{y\in\Omega}\log\Big(\frac{1}{|\Omega|}\sum_{x\in\Omega}g_\sigma\big(I(y)-I(x)\big)\Big).$$

Let us introduce a hidden random variable $Z$, whose value indicates which component of the Gaussian mixture a sample comes from. Then we have the complete data $\{(I(y),z_y)\}$, in which $z_y=x$ means the sample $I(y)$ is produced by the $x$-th Gaussian distribution. Here the number of Gaussian components equals $|\Omega|$, i.e. one component per pixel, and we have

$$Q\big(\sigma,\sigma^{(t)}\big)=\sum_{y\in\Omega}\sum_{x\in\Omega}p^{(t)}(x\mid y)\,\log\Big(\frac{1}{|\Omega|}g_\sigma\big(I(y)-I(x)\big)\Big), \tag{4}$$

where

$$p^{(t)}(x\mid y)=\frac{g_{\sigma^{(t)}}\big(I(y)-I(x)\big)}{\sum_{z\in\Omega}g_{\sigma^{(t)}}\big(I(y)-I(z)\big)},$$

and $\sigma^{(t)}$ is the current estimate of $\sigma$.

For the completeness of the paper, we list the details of the derivation of (4) in Appendix A.

Using the fact

$$\log g_\sigma\big(I(y)-I(x)\big)=-\frac{\big(I(y)-I(x)\big)^2}{2\sigma^2}-\log\big(\sqrt{2\pi}\,\sigma\big),$$

we plug it into $Q$, and then

$$Q\big(\sigma,\sigma^{(t)}\big)=-\sum_{y\in\Omega}\sum_{x\in\Omega}p^{(t)}(x\mid y)\,\frac{\big(I(y)-I(x)\big)^2}{2\sigma^2}-|\Omega|\log\big(\sqrt{2\pi}\,\sigma\big)+C,$$

where $C=-|\Omega|\log|\Omega|$ is a constant independent of $\sigma$.

Here $p^{(t)}(x\mid y)$ can be used to measure the similarity between $I(x)$ and $I(y)$. For this reason, we introduce an auxiliary variable $w=\big(w(x,y)\big)_{x,y\in\Omega}$; then the parameter estimation (4) can be converted into the minimization process

$$\min_{\sigma,\,w}\;E_{EM}(\sigma,w)=\sum_{y\in\Omega}\sum_{x\in\Omega}w(x,y)\,\frac{\big(I(y)-I(x)\big)^2}{2\sigma^2}+\sum_{y\in\Omega}\sum_{x\in\Omega}w(x,y)\log w(x,y)+|\Omega|\log\big(\sqrt{2\pi}\,\sigma\big), \tag{5}$$

where $w$ satisfies $w(x,y)\geqslant 0$ and $\sum_{x\in\Omega}w(x,y)=1$ for each $y\in\Omega$.

The minimization problem (5) can be efficiently solved by the alternating scheme

$$w^{(t+1)}=\arg\min_{w}E_{EM}\big(\sigma^{(t)},w\big),\qquad \sigma^{(t+1)}=\arg\min_{\sigma}E_{EM}\big(\sigma,w^{(t+1)}\big). \tag{6}$$

As for iteration (6), our previous work [22] has proven that the corresponding energy is decreasing along the iterations, and the following holds.

Lemma 1 (Likelihood Function Ascent [22])

The sequence $\{\sigma^{(t)}\}$ produced by iteration (6) satisfies

$$\mathcal{L}\big(\sigma^{(t+1)}\big)\geqslant\mathcal{L}\big(\sigma^{(t)}\big).$$

This lemma allows us to employ $E_{EM}$ to get a maximum likelihood estimator, and $E_{EM}$ is easier to optimize than $-\mathcal{L}$.

Using the Lagrangian multiplier method, we can easily get the closed-form solution of the $w$-subproblem:

$$w(x,y)=\frac{\exp\Big(-\frac{\big(I(x)-I(y)\big)^2}{2\sigma^2}\Big)}{\sum_{z\in\Omega}\exp\Big(-\frac{\big(I(z)-I(y)\big)^2}{2\sigma^2}\Big)}, \tag{7}$$

where the denominator serves as a normalization factor. Obviously, $w$ has a form similar to the Gaussian similarity function used in the fully connected graph [36], so it can be used as a measure of similarity between image points. Since this similarity is model based, it can be updated adaptively, which solves the problem of choosing the parameters of the similarity.
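A sketch of the update (7) on a discrete image (ours; it flattens intensities into a vector and builds the dense matrix, so it is only meant for small images):

```python
import numpy as np

def adaptive_similarity(I, sigma):
    # w(x, y) = exp(-(I(x)-I(y))^2 / (2 sigma^2)), normalized over x as in (7).
    v = I.ravel().astype(float)
    K = np.exp(-(v[:, None] - v[None, :]) ** 2 / (2.0 * sigma ** 2))
    return K / K.sum(axis=0, keepdims=True)   # columns sum to one => asymmetric
```

No kernel bandwidth has to be fixed by hand here: sigma itself is re-estimated during the alternating minimization, which is the point of the adaptive similarity.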

Because of the normalization factor in the similarity function obtained by (7), the similarity is asymmetric. In our model, we apply a symmetrization step to this similarity: we project the asymmetric similarity matrix onto the convex set of symmetric matrices. For this projection, it is not difficult to obtain the following.

Proposition 1

Given the set $S=\{A\in\mathbb{R}^{N\times N}:A=A^{T}\}$, then $S$ is convex, and the (Frobenius) projection of a matrix $W$ onto $S$ is $\frac{W+W^{T}}{2}$.

To unify the normalized cut and the functional deduced from the EM process, we add this projection to our model, where it serves as a symmetric-matrix constraint in the optimization process.
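In code this projection is one line; a sketch:

```python
def symmetrize(W):
    # Frobenius projection onto the set of symmetric matrices (Proposition 1).
    return 0.5 * (W + W.T)
```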

With the normalized cut functional and the similarity functional established from the EM process and the analysis of the normalized cut, we naturally propose the adaptive similarity normalized cut model with spatial regularizers under a variational framework.

3.2 Variational Framework

3.2.1 Some Definitions and Notations

Let $\Omega\subset\mathbb{R}^2$ be an open bounded set, and let $I:\Omega\to\mathbb{R}$ be the image function. Meanwhile, $w(x,y)$ is a nonnegative smooth similarity function that is bounded almost everywhere. Let $K_\rho$ be a smooth weighting function with $\int_{\mathbb{R}^2}K_\rho\,dx=1$, where $\rho>0$ controls its width. Besides, let the symbol "$*$" stand for the convolution operator, i.e. $(K_\rho*f)(x)=\int K_\rho(x-y)f(y)\,dy$.

3.2.2 Normalized Cut with Adaptive Similarity and Spatial Regularization

Based on the analysis in the previous sections, we propose the variational normalized cut model with adaptive similarity and spatial regularization; schematically, it combines the EM energy (5), the normalized cut energy, and a spatial regularizer:

$$\min_{w,\sigma,f}\;E(w,\sigma,f)=E_{EM}(\sigma,w)+\alpha\,E_{Ncut}(f;w)+\beta\,R(f), \tag{8}$$

where $w$ is constrained to the symmetric set $S$, and $\alpha$ and $\beta$ are two positive parameters which control the balance of the terms in the cost functional.

Here we choose two popular regularizers from the field of computer vision: the $H^1$ regularizer $R(f)=\int_\Omega|\nabla f|^2\,dx$ and the TV regularizer $R(f)=\int_\Omega|\nabla f|\,dx$. The normalized cut with adaptive similarity and $H^1$ regularization model, called NCAS for short, can be optimized efficiently since it is essentially a linear system. The TV regularization model, denoted NCASTV for short, has an advantage in segmentation, as verified in many models [7, 8, 32, 15], and can be solved by operator splitting methods.

In fact, the first two terms play the role of the EM process, which endows the model with an adaptive similarity. More specifically, the first term can be seen as a smoothed clustering criterion, slightly different from the piecewise constant version of K-means; in addition, the variance $\sigma^2$ helps discriminate pixels better. The second term is a negative entropy regularizer, which forces the similarity metric to be smooth. The third term is the normalized cut energy, which performs the clustering, and the last term is the spatial regularization that makes our segmentation results smooth and robust to noise.

The proposed methods contain some convolution terms, which are beneficial for proving the existence of minimizers in a proper functional space [23]; the convolution of a smooth kernel with $f$ can also be seen as a spatial regularization that enhances segmentation performance. Meanwhile, if we set $\rho$ small enough, then $K_\rho$ approaches a delta function and the convolution operators disappear.

3.2.3 Existence of Minimizer for the Proposed Models

In this section, we theoretically prove the existence of minimizers for the proposed models. We only give the result for NCASTV, since both NCASTV and NCAS have minimizers by choosing proper function spaces (BV and $H^1$, respectively) and using a similar analysis.

Let us consider the following energy functional for the NCASTV model, in which the similarity $w$, the parameter $\sigma$, and the relaxation $f$ are the unknowns and $K_\rho$ is a mollifier (a smooth kernel) satisfying the normalization condition of Section 3.2.1. Without loss of generality, we impose standard boundedness assumptions on the image and on $\sigma$.

We will show the existence of one minimizer of the NCASTV model in an admissible space built on $BV(\Omega)$ for $f$.

Theorem 1

There exists at least one solution of the NCASTV model, i.e., the minimization problem

$$\min\;E_{NCASTV}(w,\sigma,f) \tag{9}$$

admits at least one minimizer in the admissible space.

Proof: Details of the proof can be found in Appendix B.

4 Algorithms

4.1 Algorithm for NCAS Model

The discrete NCAS model (8) can be directly minimized by an alternating minimization algorithm over $(w,\sigma,f)$, giving a $w$-subproblem, a $\sigma$-subproblem, and an $f$-subproblem, where $\eta$ is the Lagrange multiplier of the normalization constraint on $w$, and $\alpha,\beta$ are the parameters.

As for the $w$-subproblem, we solve the corresponding optimization problem and then calculate the projection of $w$ in $S$.

According to the optimality condition (from the Lagrangian of the constraint $\sum_{x}w(x,y)=1$), we have

$$\log w(x,y)+1+\frac{\big(I(x)-I(y)\big)^2}{2\sigma^2}+\eta(y)=0.$$

Then

$$w(x,y)=\exp\Big(-\frac{\big(I(x)-I(y)\big)^2}{2\sigma^2}\Big)\exp\big(-1-\eta(y)\big), \tag{10}$$

where $\eta(y)$ is the multiplier associated with the constraint at $y$.

Since $\sum_{x}w(x,y)=1$, we have

$$\exp\big(-1-\eta(y)\big)=\frac{1}{\sum_{z}\exp\Big(-\frac{(I(z)-I(y))^2}{2\sigma^2}\Big)}. \tag{11}$$

Plugging (11) into (10) recovers exactly the normalized form (7).

Since $w$ may be asymmetric, according to Proposition 1, the projection of the similarity matrix $W$ consisting of $w(x,y)$ is (the projection is also denoted as $W$)

$$W\leftarrow\frac{W+W^{T}}{2}.$$

According to the optimality condition of the $\sigma$-subproblem, $\sigma$ can be optimized by

$$\sigma^{2}=\frac{\sum_{x,y}w(x,y)\,\big(I(x)-I(y)\big)^2}{\sum_{x,y}w(x,y)}.$$

Since the normalized cut energy is a quadratic form in $f$ (in which the discretization of the convolution $K_\rho*f$ appears as a matrix product with the discrete kernel), the $f$-subproblem equals a constrained linear problem

$$\min_{f}\;f^{T}(D-W)f+\beta\,R_h(f),\qquad\text{s.t. }f^{T}Df=1, \tag{12}$$

where $W$ is the similarity matrix, $D$ is the degree matrix, $R_h$ is the discrete regularizer, and $\mathbf{f}$ is the discretization of $f$.

Denoting $z=D^{1/2}f$, the subproblem of $f$ can be converted into an eigenvalue-type problem with respect to $z$:

$$\min_{z}\;z^{T}Az,\qquad A=D^{-1/2}(D-W)D^{-1/2}+\beta\,\widetilde{R}_h,$$

where $\widetilde{R}_h$ denotes the regularizer expressed in the $z$ variable.

This is an eigenvalue-type problem once the constraint $z^{T}z=1$ (inherited from $f^{T}Df=1$) is imposed; we can use the Lagrangian method to address this constraint.

Define

$$\min_{z}\;z^{T}Az,\qquad\text{s.t. }z^{T}z=1.$$

Then by the Lagrangian multiplier method, we have

$$\min_{z}\;\mathcal{L}(z,\mu)=z^{T}Az-\mu\big(z^{T}z-1\big).$$

Please note that here $\mu$ is the standard Lagrangian multiplier, and thus the above problem is a minimization problem with respect to $z$; otherwise, it would be a saddle point problem with respect to $z$ and $\mu$.

According to the first-order optimality condition, we have

$$Az-\mu z=0.$$

By the projected gradient method, and inspired by the Dinkelbach method [29], [14], we give the following iteration scheme:

$$\tilde z^{(k+1)}=z^{(k)}-\tau\big(Az^{(k)}-\mu_k z^{(k)}\big),\qquad z^{(k+1)}=\frac{\mathcal{P}\big(\tilde z^{(k+1)}\big)}{\big\|\mathcal{P}\big(\tilde z^{(k+1)}\big)\big\|}, \tag{13}$$

where $\tau>0$ is a step size and $\mathcal{P}$ denotes the projection described below.
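Since the exact constants in (13) depend on the discretization, the following is only a schematic sketch of the scheme's spirit under our assumptions (the step size tau, the iteration count, and the plain spherical projection are our choices): a projected gradient step on $z^TAz$ with $\mu$ taken as the current Rayleigh quotient.

```python
import numpy as np

def min_eig_dinkelbach(A, z0, tau=0.1, iters=500):
    # For small enough tau, this descends the Rayleigh quotient of A.
    z = z0 / np.linalg.norm(z0)
    for _ in range(iters):
        mu = z @ A @ z                     # current Rayleigh quotient (||z|| = 1)
        z = z - tau * (A @ z - mu * z)     # gradient step on z^T A z - mu z^T z
        z = z / np.linalg.norm(z)          # project back onto the unit sphere
    return z
```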

Regarding the setting of $\mu_k$, we give a theorem showing that $\mu_k$ decreases as the iterations go on.

Theorem 2

Consider the optimization problem

$$(P1)\qquad\min_{z}\;z^{T}Az-\mu_k\,z^{T}z,$$

where $A$ is symmetric positive semidefinite and $\mu_k=\frac{(z^{(k)})^{T}Az^{(k)}}{(z^{(k)})^{T}z^{(k)}}$. If $z^{(k+1)}$ solves (P1), then $\mu_{k+1}\leqslant\mu_k$.

Proof: Since $z^{(k+1)}$ solves (P1), then

$$\big(z^{(k+1)}\big)^{T}Az^{(k+1)}-\mu_k\big(z^{(k+1)}\big)^{T}z^{(k+1)}\leqslant\big(z^{(k)}\big)^{T}Az^{(k)}-\mu_k\big(z^{(k)}\big)^{T}z^{(k)}=0. \tag{14}$$

For $z^{(k+1)}\neq 0$, (14) means $\big(z^{(k+1)}\big)^{T}Az^{(k+1)}\leqslant\mu_k\big(z^{(k+1)}\big)^{T}z^{(k+1)}$, that is, $\frac{(z^{(k+1)})^{T}Az^{(k+1)}}{(z^{(k+1)})^{T}z^{(k+1)}}\leqslant\mu_k$; by a simple transformation, $\mu_{k+1}\leqslant\mu_k$.

If we set $\mu_k=\frac{(z^{(k)})^{T}Az^{(k)}}{(z^{(k)})^{T}z^{(k)}}$, then we have $\mu_{k+1}\leqslant\mu_k$ in (13). Furthermore, since $A$ is positive semidefinite, $\mu_k$ is bounded below, which gives the convergence of $\{\mu_k\}$. Numerical experiments shown in Figure 4 demonstrate the convergence of $\mu_k$.

Here, we pay more attention to the projection onto the constraint set $\{z:\;b^{T}z=0\}$ with $b=D^{1/2}\mathbf{1}$ (coming from the constraint $f^{T}D\mathbf{1}=0$),

$$\min_{z}\;\|z-\tilde z\|^{2},\qquad\text{s.t. }b^{T}z=0,$$

which equals the optimization of the Lagrange function below,

$$\mathcal{L}(z,\lambda)=\|z-\tilde z\|^{2}+2\lambda\,b^{T}z,$$

where $\lambda$ is the Lagrange multiplier.

Then, according to the optimality condition, we have

$$z=\tilde z-\lambda b. \tag{15}$$

Transposing (15) and multiplying by $b$, we get

$$\lambda=\frac{b^{T}\tilde z}{b^{T}b}. \tag{16}$$

Plugging (16) into (15), then

$$z=\tilde z-\frac{b^{T}\tilde z}{b^{T}b}\,b.$$

To obtain the solution of (12), one can easily get

$$f=D^{-1/2}z,$$

since $D$ is diagonal and invertible.

To sum up, we give Algorithm 1 for NCAS Model.

1. Given the image $I$, initializations $\sigma^{(0)}$ and $f^{(0)}$, and a tolerance $\epsilon$; set the two balance parameters to 2 and 50, and let $k=0$.

2. Update the similarity matrix $W^{(k)}$ by (7).

3. Calculate the projection of $W^{(k)}$ in $S$: $W^{(k)}\leftarrow\big(W^{(k)}+(W^{(k)})^{T}\big)/2$.

4. Update $\sigma^{(k+1)}$ by the $\sigma$-subproblem formula.

5. Calculate $z^{(k+1)}$ by the scheme (13).

6. Reconstruct the segmentation vector $f^{(k+1)}=D^{-1/2}z^{(k+1)}$.

7. If $\|f^{(k+1)}-f^{(k)}\|<\epsilon$, stop; else set $k\leftarrow k+1$ and go to step 2.

Algorithm 1 NCAS Model
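For concreteness, here is a compact end-to-end sketch of Algorithm 1 under our assumptions about the steps above (the sigma update is the M-step-like formula derived for the sigma-subproblem; the eigen step solves the relaxed Ncut exactly by dense eigendecomposition instead of scheme (13); the regularizer is omitted, i.e. beta = 0, for brevity):

```python
import numpy as np
from scipy.linalg import eigh

def ncas(I, sigma0=10.0, iters=20, tol=1e-4):
    v = I.ravel().astype(float)
    sigma, f_old = sigma0, None
    for _ in range(iters):
        # Steps 2-3: adaptive similarity (7), then projection onto symmetric matrices.
        K = np.exp(-(v[:, None] - v[None, :]) ** 2 / (2.0 * sigma ** 2))
        W = K / K.sum(axis=0, keepdims=True)
        W = 0.5 * (W + W.T)
        # Step 4: sigma update (weighted mean of squared intensity differences).
        D2 = (v[:, None] - v[None, :]) ** 2
        sigma = np.sqrt((W * D2).sum() / W.sum())
        # Steps 5-6: relaxed Ncut on z = D^{1/2} f, then recover f = D^{-1/2} z.
        d = W.sum(axis=1)
        A = (np.diag(d) - W) / np.sqrt(np.outer(d, d))  # D^{-1/2}(D - W)D^{-1/2}
        _, vecs = eigh(A)
        f = vecs[:, 1] / np.sqrt(d)
        # Step 7: stop when the relaxation is stable (fix the eigenvector sign first).
        if f_old is not None:
            if f @ f_old < 0:
                f = -f
            if np.linalg.norm(f - f_old) < tol:
                break
        f_old = f
    return f > 0
```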

4.2 Algorithm for NCASTV Model

The discrete NCASTV model (8) can also be directly minimized by an alternating minimization algorithm, with subproblems in $w$, $\sigma$, and $f$, where $\eta$ is the Lagrange multiplier and $\alpha,\beta$ are the parameters.

For the subproblems of $w$ and $\sigma$, we adopt the same process as in the NCAS model. Here we mainly emphasize the subproblem of $f$,

$$\min_{f}\;f^{T}(D-W)f+\beta\,TV(f),\qquad\text{s.t. }f^{T}Df=1,$$

where $W$ is the similarity matrix, $D$ is the degree matrix, and $\mathbf{f}$ is the discretization of $f$.

To solve the $f$-subproblem, we adopt a splitting method and give the iteration scheme by a penalty method. We introduce an auxiliary variable $g$ satisfying $g=f$; then we have the following problem

$$\min_{f,g}\;f^{T}(D-W)f+\frac{r}{2}\|f-g\|^{2}+\beta\,TV(g),\qquad\text{s.t. }f^{T}Df=1, \tag{17}$$

where $r$ is a penalty parameter.

Furthermore, we have the two subproblems of problem (17):

$$f^{(k+1)}=\arg\min_{f}\;f^{T}(D-W)f+\frac{r}{2}\big\|f-g^{(k)}\big\|^{2},\qquad\text{s.t. }f^{T}Df=1, \tag{18a}$$

$$g^{(k+1)}=\arg\min_{g}\;\frac{r}{2}\big\|f^{(k+1)}-g\big\|^{2}+\beta\,TV(g). \tag{18b}$$

As we can see, problem (18b) is the ROF denoising model, while problem (18a) is a linear problem similar to problem (12); thus our model can be regarded as an alternation between a normalized cut step and a denoising of the clustering result, which explains its improved performance. Since the ROF model can be solved efficiently [10, 8, 38], we pay more attention to problem (18a).
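A sketch of one splitting sweep under our reading of (17)-(18) (the spherical constraint of (18a) is dropped here for brevity; in practice it is handled as in Section 4.1; skimage's denoise_tv_chambolle plays the role of the ROF solver, with the weight standing in for the penalty scaling beta/r):

```python
import numpy as np
from skimage.restoration import denoise_tv_chambolle

def ncastv_split_step(L, g, r, beta, shape):
    # (18a): stationarity of f^T L f + (r/2)||f - g||^2 gives (2L + r I) f = r g.
    n = L.shape[0]
    f = np.linalg.solve(2.0 * L + r * np.eye(n), r * g.ravel())
    # (18b): ROF/TV denoising of the current relaxation f.
    g_new = denoise_tv_chambolle(f.reshape(shape), weight=beta / r)
    return f.reshape(shape), g_new
```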

Here we make the transformation $z=D^{1/2}f$; therefore, the subproblem of $f$ can be converted into a problem concerning $z$.