Robust and Efficient Fuzzy C-Means Clustering Constrained on Flexible Sparsity


Jinglin Xu, Junwei Han, Feiping Nie, and Xuelong Li

Jinglin Xu and Junwei Han are with the School of Automation, Northwestern Polytechnical University, Xi'an 710072, Shaanxi, China. E-mail: {xujinglinlove, junweihan2010}@gmail.com
Feiping Nie and Xuelong Li are with the School of Computer Science and the Center for OPTical IMagery Analysis and Learning (OPTIMAL), Northwestern Polytechnical University, Xi'an 710072, Shaanxi, China. E-mail: feipingnie@gmail.com, xuelongli@nwpu.edu.cn (Corresponding author)
Abstract

Clustering is an effective technique in data mining for grouping a set of objects in terms of their attributes. Among various clustering approaches, the family of K-Means algorithms has gained popularity due to its simplicity and efficiency. However, most existing K-Means based clustering algorithms cannot deal well with outliers, and it is difficult for them to efficiently solve problems with an embedded $\ell_0$-norm constraint. To address these issues and significantly improve clustering performance, we propose a novel clustering algorithm, named REFCMFS, which develops an $\ell_{2,1}$-norm robust loss as the data-driven term and imposes an $\ell_0$-norm constraint on the membership matrix to make the model more robust and flexibly sparse. In particular, REFCMFS designs a new way to simplify and solve the $\ell_0$-norm constraint, without any approximate transformation, by absorbing the constraint into the objective function through a ranking function. These improvements not only allow REFCMFS to efficiently obtain more promising performance but also provide a new tractable and skillful optimization method for solving problems with embedded $\ell_0$-norm constraints. Theoretical analyses and extensive experiments on several public datasets demonstrate the effectiveness and rationality of our proposed REFCMFS method.

K-Means Clustering, Fuzzy C-Means Clustering, $\ell_{2,1}$-norm Loss, $\ell_0$-norm Constraint, Flexible Sparsity.

I Introduction

As a fundamental problem in machine learning, clustering is widely used in many fields, such as network data analysis (including protein-protein interaction networks [9], road networks [21], and geo-social networks [53]), medical diagnosis [37], biological data analysis [52], environmental chemistry [14], and so on. K-Means clustering is one of the most popular techniques because of its simplicity and effectiveness: it randomly initializes the cluster centroids, assigns each sample to its nearest cluster, and then updates the cluster centroids iteratively to partition a dataset into subsets.

Over the past years, many modified versions of K-Means algorithms have been proposed, such as Global K-Means [46] and its variants [33, 16, 17], MinMax K-Means clustering [47], K-Means based Consensus clustering [54], Optimized Cartesian K-Means [49], Group K-Means [50], Robust K-Means [20], I-K-Means-+ [25] and so on. Most importantly, researchers have pointed out that the objective function of K-Means clustering can be expressed as the Frobenius norm of the difference between the data matrix and the low rank approximation of that data matrix [3, 6]. Specifically, the problem of hard K-Means clustering is as follows:

$$\min_{G,\,M}\ \sum_{i=1}^{n}\sum_{j=1}^{c} g_{ij}\,\|x_i - m_j\|_2^2,\quad \text{s.t.}\ g_{ij}\in\{0,1\},\ \sum_{j=1}^{c} g_{ij}=1, \qquad (1)$$

where $X=[x_1,\ldots,x_n]\in\mathbb{R}^{d\times n}$ is a matrix of $n$ data vectors $x_i$; $M=[m_1,\ldots,m_c]\in\mathbb{R}^{d\times c}$ is a matrix of $c$ cluster centroids $m_j$; $G\in\{0,1\}^{n\times c}$ is a cluster indicator matrix of binary variables such that $g_{ij}=1$ if $x_i\in\mathcal{C}_j$, where $\mathcal{C}_j$ denotes the $j$-th cluster, and $g_{ij}=0$ otherwise.

Although the K-Means clustering algorithm has been used widely, it is sensitive to outliers, which easily deteriorate the clustering performance. Therefore, two main approaches have been proposed to deal with outliers in K-Means clustering: one based on outlier analysis (outlier detection or removal), and the other based on outlier suppression (robust models).

For the first one, much work has been done on outlier analysis. Several algorithms [28, 23, 22, 29, 58, 39, 27] perform clustering and outlier detection in stages, where a dataset is divided into different clusters and some measures are calculated for the data points based on the clusters to identify outliers. Besides, [43] defines outliers in terms of a noise distance, where data points that are about the noise distance or farther away from any cluster center get high membership degrees to the outlier cluster. [57] introduces a local distance-based outlier factor to measure the outlierness of objects in scattered datasets. [38] extends the facility location formulation to model the joint clustering and outlier detection problem and proposes a sub-gradient-based algorithm to solve the resulting optimization problem. [51] proposes a non-exhaustive overlapping K-Means algorithm to identify outliers during the clustering process. [18] provides data clustering and outlier detection simultaneously by introducing an additional “cluster” into the K-Means algorithm to hold all outliers. For the second one, the main strategy of outlier suppression is to modify the objective function to obtain a robust model, such as in [31].

Fuzzy C-Means (FCM) is a simple but powerful clustering method that introduces the concept of fuzzy sets and has been successfully used in many areas. There are, however, several well-known problems with FCM, such as sensitivity to initialization, sensitivity to outliers, and limitation to convex clusters. Therefore, many extensions and variants of FCM clustering have been advanced in recent years. Augmented FCM [26] revisits and augments the algorithm to make it applicable to spatiotemporal data. Suppressed FCM [45] is proposed to increase the difference between high and low membership grades, which gives more accurate partitions of the data with fewer iterations compared to FCM. Sparse FCM [42] reforms traditional FCM to deal with high-dimensional data clustering, based on Witten's sparse clustering framework. Kernel-based FCM [15] combines an improved genetic algorithm with the kernel technique, first optimizing the initial cluster centers and then guiding the categorization. Multivariate FCM [41] proposes two multivariate FCM algorithms with different weights, aiming to represent how important each variable is for each cluster and to improve the clustering quality. Robust-Learning FCM [56] is free of the fuzziness index and of initializations without parameter selection, and can also automatically find the best number of clusters.

Since the above extensions still perform weakly when dealing with outliers, several robust FCM algorithms have been proposed. Specifically, conditional spatial FCM [1] improves the robustness of FCM through the incorporation of conditioning effects imposed by an auxiliary variable corresponding to each pixel. Modified possibilistic FCM [2] jointly considers the typicality as well as the fuzzy membership measures to model the bias field and noise. Generalized entropy-based possibilistic FCM [5] utilizes functions of the distance instead of the distance itself in the fuzzy, possibilistic, and entropy terms of the clustering objective function to decrease noise contributions to the cluster centers. Fast and robust FCM [32] proposes a significantly faster and more robust algorithm based on morphological reconstruction and membership filtering.

Inspired by the above analysis, we develop a novel clustering method in this work, named Robust and Efficient Fuzzy C-Means Clustering constrained on Flexible Sparsity (REFCMFS), by introducing a flexible sparse constraint imposed on the membership matrix to improve the robustness of the proposed method and to provide a new idea for simplifying the solution of $\ell_0$-norm constrained sparse problems. The proposed REFCMFS method not only improves robustness through the $\ell_{2,1}$-norm data-driven term but also obtains a solution with proper sparsity and greatly reduces the computational complexity.

Note that we recently proposed a Robust and Sparse Fuzzy K-Means (RSFKM) clustering method [55]. However, our proposed REFCMFS method in this paper is quite different from RSFKM. Concretely, RSFKM accounts for the robustness of the data-driven term by utilizing the $\ell_{2,1}$-norm and the capped $\ell_1$-norm, and utilizes the Lagrangian multiplier method and Newton's method to solve for the membership matrix, whose sparseness is adjusted by the regularization parameter. In contrast, our proposed REFCMFS method maintains the robustness of the clustering model by using the $\ell_{2,1}$-norm loss and introduces an $\ell_0$-norm sparse constraint imposed on the membership matrix with a flexible sparsity $k$, where the $\ell_0$-norm counts the number of nonzero elements in each membership vector. It is well known that solving problems with an $\ell_0$-norm constraint is difficult. The proposed REFCMFS method absorbs this constraint into the objective function by designing a novel ranking function, which is an efficient way to calculate the optimal membership matrix and greatly reduces the computational complexity, especially for large datasets. The related theoretical analyses and comparison experiments are presented in Sections IV and V.

The contributions of our proposed REFCMFS method in this paper can be summarized as follows:

  1. REFCMFS develops the $\ell_{2,1}$-norm loss for the data-driven term and introduces the $\ell_0$-norm constraint on the membership matrix, which gives the model robustness, proper sparseness, and better interpretability. This not only avoids incorrect or invalid clustering partitions caused by outliers but also greatly reduces the computational complexity.

  2. REFCMFS designs a new way to simplify and solve the $\ell_0$-norm constraint directly without any approximation. For each instance, we absorb the constraint into the objective function through a ranking function which sorts the distances to the cluster centroids in ascending order, selects the first $k$ smallest elements together with their corresponding membership values, and sets the rest of the membership values to zero. This allows REFCMFS to be solved by a tractable and skillful optimization method and guarantees optimality and convergence.

  3. Theoretical analyses, including the complexity analysis and convergence analysis, are presented briefly, and extensive experiments on several public datasets demonstrate the effectiveness and rationality of the proposed REFCMFS method.

The rest of this paper is organized as follows. Related work is introduced in Section 2. In Section 3, we develop the novel REFCMFS method and provide a new idea for solving it skillfully. Some theoretical analyses of REFCMFS, i.e., complexity analysis and convergence analysis, are given in Section 4. Section 5 provides the experimental results on several public datasets, followed by convergence curves and parameter sensitivity analyses. The conclusion is given in Section 6.

II Preliminary Knowledge

In this section, we briefly review some typical literature on K-Means clustering, the Gaussian Mixture Model, spectral clustering, Fuzzy C-Means clustering, and Robust and Sparse Fuzzy K-Means clustering related to the proposed method.

II-A K-Means Clustering

The K-Means clustering problem has been shown in problem (1), where $G$ is the indicator (membership) matrix and each row of $G$ satisfies the 1-of-$c$ coding scheme (if a data point $x_i$ is assigned to the $j$-th cluster then $g_{ij}=1$ and $g_{il}=0$ for $l\neq j$). Although K-Means clustering is simple and can be solved efficiently, it is very sensitive to outliers.
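
For reference, a minimal sketch of the classical Lloyd iteration for problem (1) is given below; it is not the procedure used in this paper, and names such as `kmeans`, `X`, and `M` are illustrative only.

import numpy as np

def kmeans(X, n_clusters, n_iter=100, seed=0):
    """Minimal Lloyd iteration for problem (1); X has shape (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by sampling distinct data points.
    M = X[rng.choice(len(X), n_clusters, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: each sample joins its nearest centroid (1-of-c coding).
        dist = np.linalg.norm(X[:, None, :] - M[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned samples.
        for j in range(n_clusters):
            if np.any(labels == j):
                M[j] = X[labels == j].mean(axis=0)
    return labels, M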

II-B Fuzzy C-Means Clustering

As one of the most popular fuzzy clustering techniques, Fuzzy C-Means (FCM) clustering [8] minimizes the following objective function:

$$\min_{F,\,M}\ \sum_{i=1}^{n}\sum_{j=1}^{c} f_{ij}^{\,r}\,\|x_i - m_j\|_2^2,\quad \text{s.t.}\ f_{ij}\ge 0,\ \sum_{j=1}^{c} f_{ij}=1, \qquad (2)$$

where $F\in\mathbb{R}^{n\times c}$ is the membership matrix, whose elements are nonnegative and sum to one in each row. The parameter $r>1$ is a weighting exponent on each fuzzy membership and determines the amount of fuzziness of the resulting clustering.

The objective functions of K-Means and FCM are virtually identical; the only difference is that FCM introduces a vector (i.e., each row of $F$) which expresses the degree to which a given point belongs to each of the clusters. This vector is raised to a 'stiffness' exponent (i.e., $r$) aimed at giving more importance to the stronger connections (and, conversely, at minimizing the weight of weaker ones). When $r$ tends towards 1, the optimal membership vectors become binary, hence making the objective function of FCM identical to that of K-Means. Besides, FCM tends to run slower than K-Means, since each point is evaluated against each cluster and more operations are involved in each evaluation: K-Means only needs to do a distance calculation, whereas FCM needs to do a full inverse-distance weighting.
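
To make the role of the weighting exponent concrete, the closed-form FCM membership update for fixed centroids can be sketched as follows; the function name and the default `r = 2.0` are illustrative assumptions, not values taken from the paper.

import numpy as np

def fcm_memberships(X, M, r=2.0, eps=1e-10):
    """Closed-form FCM membership update of problem (2) for fixed centroids M.
    Returns F of shape (n_samples, n_clusters) with each row on the simplex."""
    # Squared distances between every sample and every centroid.
    d2 = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=2) + eps
    # Standard FCM rule: f_ij is proportional to (d_ij^2)^(1/(1-r)).
    w = d2 ** (1.0 / (1.0 - r))
    return w / w.sum(axis=1, keepdims=True)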

II-C Gaussian Mixture Model

Unlike K-Means clustering, which generates hard partitions of the data, the Gaussian Mixture Model (GMM) [10], as one of the most widely used mixture models for clustering, can generate soft partitions and is more flexible. Considering that each cluster can be mathematically represented by a parametric distribution, the entire dataset is modeled by a mixture of these distributions. In GMM, the mixture of Gaussians has $c$ component densities mixed together with mixing coefficients $\pi_j$:

$$p(x) \;=\; \sum_{j=1}^{c} \pi_j\, \mathcal{N}(x \mid \mu_j, \Sigma_j), \qquad (3)$$

where the $\pi_j$ are mixing coefficients such that $\sum_{j=1}^{c}\pi_j = 1$ and $\pi_j \ge 0$, and each component $\mathcal{N}(x \mid \mu_j, \Sigma_j)$ is a Gaussian density function parameterized by its mean $\mu_j$ and covariance $\Sigma_j$. GMM uses mixture distributions to fit the data, and the conditional probabilities of the data points, $p(j \mid x_i)$, are used to assign probabilistic labels. Although the Expectation-Maximization (EM) algorithm for GMM can achieve promising results, it has a high computational complexity.
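
As an illustration of soft partitioning with a GMM, the posterior responsibilities can be obtained with scikit-learn roughly as below; the data matrix `X` and the number of components are placeholders, not settings from the paper.

import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.rand(500, 16)          # placeholder data matrix (n_samples, n_features)
gmm = GaussianMixture(n_components=5, covariance_type="full", random_state=0)
labels = gmm.fit_predict(X)          # hard labels from the fitted mixture
resp = gmm.predict_proba(X)          # soft partition: posterior p(component | x_i)
print(resp.shape)                    # (500, 5); each row sums to one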

II-D Spectral Clustering

The general spectral clustering (SC) method [13] needs to construct an adjacency matrix and calculate the eigen-decomposition of the corresponding Laplacian matrix; both of these steps are computationally expensive. Given a data matrix $X$, spectral clustering first constructs an undirected graph with adjacency matrix $W$, each element $w_{il}$ of which denotes the similarity between $x_i$ and $x_l$. The graph Laplacian $L = D - W$ is then calculated, where $D$ denotes the degree matrix, a diagonal matrix whose entries are the row sums of $W$, i.e., $d_{ii} = \sum_{l} w_{il}$. Then spectral clustering uses the eigenvectors of $L$ corresponding to the $c$ smallest eigenvalues as the low-dimensional representations of the original data. Finally, traditional K-Means clustering is applied to obtain the clusters. Due to the high complexity of the graph construction and the eigen-decomposition, spectral clustering is not suitable for large-scale applications.
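
A compact sketch of this pipeline (affinity matrix, unnormalized Laplacian $L = D - W$, bottom eigenvectors, then K-Means) might look as follows; the Gaussian affinity with bandwidth `sigma` is an assumed choice, since the text does not specify the similarity function.

import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(X, n_clusters, sigma=1.0):
    """Unnormalized spectral clustering sketch: Gaussian affinity W,
    Laplacian L = D - W, bottom eigenvectors, then K-Means on the embedding."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    D = np.diag(W.sum(axis=1))
    L = D - W
    # Eigenvectors of L for the n_clusters smallest eigenvalues.
    _, U = eigh(L, subset_by_index=[0, n_clusters - 1])
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(U)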

II-E Robust and Sparse Fuzzy K-Means Clustering

Considering that the squared $\ell_2$-norm loss imposed in problems (1) and (2) lacks robustness, with the development of robust norms such as the $\ell_{2,1}$-norm and the capped $\ell_1$-norm [36, 30], a number of robust loss functions have been designed and have shown empirical success in various applications. For example, the recent work [55] provided a robust and sparse Fuzzy K-Means clustering by introducing two robust loss functions (i.e., the $\ell_{2,1}$-norm and the capped $\ell_1$-norm) and a penalized regularization on the membership matrix. Its objective function can be written as:

$$\min_{F,\,M}\ \sum_{i=1}^{n}\sum_{j=1}^{c} f_{ij}\, d(x_i, m_j) \;+\; \gamma\,\|F\|_F^2,\quad \text{s.t.}\ f_{ij}\ge 0,\ \sum_{j=1}^{c} f_{ij}=1, \qquad (4)$$

where

$$d(x_i, m_j) = \|x_i - m_j\|_2 \ \ (\ell_{2,1}\text{-norm loss}) \quad \text{or} \quad d(x_i, m_j) = \min\big(\|x_i - m_j\|_2,\ \epsilon\big) \ \ (\text{capped } \ell_1\text{-norm loss}), \qquad (5)$$

where $F$ is the membership matrix, $\gamma$ is the regularization parameter, and $\epsilon$ is a threshold. When $\gamma$ is zero, the membership vector of each sample becomes extremely sparse (only one element is nonzero and the others are zero); the membership matrix then equals the binary clustering indicator matrix, which corresponds to hard K-Means clustering. With the gradual increase of $\gamma$, each membership vector contains a growing number of nonzero elements. When $\gamma$ becomes large, all elements in the membership vectors are nonzero, which is equivalent to FCM clustering.

III Robust and Efficient FCM Clustering Constrained on Flexible Sparsity

In this section, we introduce our proposed REFCMFS method, which develops the $\ell_{2,1}$-norm loss for the data-driven term and imposes the $\ell_0$-norm constraint on the membership matrix to make the model more robust and flexibly sparse. We also design a new way to simplify and solve the $\ell_{2,1}$-norm loss with the $\ell_0$-norm constraint efficiently, without any approximation.

III-A Formulation

Based on the Fuzzy C-Means clustering algorithm, in order to make the model more robust, properly sparse, and efficient during clustering, we propose the following objective function:

$$\min_{F,\,M}\ \sum_{i=1}^{n}\sum_{j=1}^{c} f_{ij}^{\,r}\,\|x_i - m_j\|_2,\quad \text{s.t.}\ f_{ij}\ge 0,\ \sum_{j=1}^{c} f_{ij}=1,\ \|F\|_0 = k, \qquad (6)$$

where $F$ is the membership matrix constrained by the $\ell_0$-norm, $r$ is the hyper-parameter that controls how fuzzy the clusters will be (the higher, the fuzzier), and $\|F\|_0 = k$ denotes that the number of nonzero elements in $F$ is $k$, which constrains the sparseness of the membership matrix to be $k$.

We find that $\|F\|_0 = k$ constrains the number of nonzero elements of the whole matrix $F$ rather than the number of nonzero elements of each row $f_i$, where $f_i$ is the $i$-th row of the membership matrix and corresponds to the membership vector of the $i$-th sample. This easily leads to two extreme cases for $f_i$, i.e., $\|f_i\|_0 = 1$ and $\|f_i\|_0 = c$, where the former makes the soft partition degrade into the hard partition and the latter can result in an invalid partition for the $i$-th sample because all the membership values may be equal. Therefore, we further divide problem (6) into $n$ subproblems and impose the $\ell_0$-norm constraint on the membership vector of each sample. Thus, REFCMFS can be presented as follows:

$$\min_{F,\,M}\ \sum_{i=1}^{n}\sum_{j=1}^{c} f_{ij}^{\,r}\,\|x_i - m_j\|_2,\quad \text{s.t.}\ f_{ij}\ge 0,\ \sum_{j=1}^{c} f_{ij}=1,\ \|f_i\|_0 = k,\ i=1,\ldots,n, \qquad (7)$$

where $\|f_i\|_0$ denotes the number of nonzero elements in $f_i$, $1 \le k \le c$, and $r > 1$.

It is obvious that problem (7) achieves robustness by using the non-squared $\ell_2$-norm (i.e., the $\ell_{2,1}$-norm loss) on the distance between $x_i$ and $m_j$, and equips each membership vector $f_i$ with sparsity $k$, which not only avoids incorrect or invalid clustering partitions caused by outliers but also greatly reduces the computational complexity.

III-B Optimization

In this subsection, we provide an efficient iterative method to solve problem (7). More specifically, we alternately update one optimization variable while keeping the other fixed, as follows.

Step 1: Solving $F$ while fixing $M$

With the centroid matrix $M$ fixed, problem (7) becomes:

$$\min_{F}\ \sum_{i=1}^{n}\sum_{j=1}^{c} f_{ij}^{\,r}\,\|x_i - m_j\|_2,\quad \text{s.t.}\ f_{ij}\ge 0,\ \sum_{j=1}^{c} f_{ij}=1,\ \|f_i\|_0 = k. \qquad (8)$$

Since problem (8) is difficult to solve directly, we transform it as follows. Because problem (8) is independent across samples, it can be solved separately for each $x_i$:

$$\min_{f_i}\ \sum_{j=1}^{c} f_{ij}^{\,r}\, d_{ij},\quad \text{s.t.}\ f_{ij}\ge 0,\ \sum_{j=1}^{c} f_{ij}=1,\ \|f_i\|_0 = k, \qquad (9)$$

where $d_{ij} = \|x_i - m_j\|_2$ and $d_i = [d_{i1}, \ldots, d_{ic}]$ is a row vector containing the different $d_{ij}$. To efficiently minimize problem (9), we define a ranking function $\mathrm{rank}(\cdot)$ and apply it to $d_i$, obtaining:

$$\tilde{d}_i \;=\; \mathrm{rank}(d_i) \;=\; d_i P_i, \qquad (10)$$

where $\mathrm{rank}(\cdot)$ sorts the elements of $d_i$ in ascending order and $P_i$ is the corresponding permutation matrix, which permutes the columns of $d_i$ along that order. Based on equation (10), we select the first $k$ smallest elements of $\tilde{d}_i$ together with their corresponding membership values, and set the remaining membership values to zero, i.e., $f_{ij} = 0$ for the clusters that are not among the $k$ nearest. Intuitively, we present the above operations in Figure 1.

Fig. 1: Explanation of performing $\mathrm{rank}(\cdot)$ on $d_i$ and $f_i$. For instance, suppose that $k = 3$. According to the first $k$ elements of the sorted vector $\tilde{d}_i$, i.e., [0.6, 1.9, 2.4], we select their corresponding membership values and set the rest of the membership values to zero.

Therefore, problem (9) is equivalent to the following problem, obtained by absorbing the $\ell_0$-norm constraint into the objective function:

$$\min_{\tilde{f}_i}\ \sum_{j=1}^{k} \tilde{f}_{ij}^{\,r}\,\tilde{d}_{ij},\quad \text{s.t.}\ \tilde{f}_{ij}\ge 0,\ \sum_{j=1}^{k} \tilde{f}_{ij}=1, \qquad (11)$$

where $\tilde{d}_{i1} \le \cdots \le \tilde{d}_{ik}$ are the $k$ smallest entries of $d_i$ and the $\tilde{f}_{ij}$ are the corresponding membership values.

By using the Lagrangian multiplier method, the Lagrangian function of problem (11) is:

$$\mathcal{L}(\tilde{f}_i, \lambda) \;=\; \sum_{j=1}^{k} \tilde{f}_{ij}^{\,r}\,\tilde{d}_{ij} \;-\; \lambda\Big(\sum_{j=1}^{k} \tilde{f}_{ij} - 1\Big), \qquad (12)$$

where $\lambda$ is the Lagrangian multiplier. To solve the minimum of problem (12), we take the derivatives of $\mathcal{L}$ with respect to $\tilde{f}_{ij}$ and $\lambda$, respectively, and set them to zero. We obtain the optimal solution of problem (11):

$$\tilde{f}_{ij} \;=\; \frac{\tilde{d}_{ij}^{\frac{1}{1-r}}}{\sum_{l=1}^{k} \tilde{d}_{il}^{\frac{1}{1-r}}}, \qquad (13)$$

where $j = 1, \ldots, k$.

Substituting equation (13) into problem (11), its optimal value becomes:

$$\sum_{j=1}^{k} \tilde{f}_{ij}^{\,r}\,\tilde{d}_{ij} \;=\; \Big(\sum_{j=1}^{k} \tilde{d}_{ij}^{\frac{1}{1-r}}\Big)^{1-r}. \qquad (14)$$

Since $1-r<0$, this minimum decreases as the selected $\tilde{d}_{ij}$ decrease; that is, the smaller the $k$ selected distances, the better, which justifies keeping the $k$ smallest elements of $d_i$.

Therefore, the optimal solution of problem (9) is:

$$f_{ij} \;=\; \begin{cases} \dfrac{d_{ij}^{\frac{1}{1-r}}}{\sum_{l\in\Omega_i} d_{il}^{\frac{1}{1-r}}}, & j\in\Omega_i,\\[2mm] 0, & \text{otherwise}, \end{cases} \qquad (15)$$

where $\Omega_i$ denotes the index set of the $k$ smallest elements of $d_i$.
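
Putting Step 1 together, the per-sample update of equation (15) can be sketched as follows (a NumPy sketch under the notation above; the small constant `eps` is added only for numerical safety and is not part of the formulation).

import numpy as np

def update_membership_row(d_i, k, r, eps=1e-12):
    """Solve problem (9) for one sample: keep the k nearest centroids,
    give them the closed-form weights of equation (13), zero out the rest."""
    c = d_i.shape[0]
    f_i = np.zeros(c)
    nearest = np.argsort(d_i)[:k]                    # ranking function: k smallest distances
    w = (d_i[nearest] + eps) ** (1.0 / (1.0 - r))    # equation (13), unnormalized
    f_i[nearest] = w / w.sum()                       # normalize onto the simplex
    return f_i
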
Step 2: Solving $M$ while fixing $F$

With the membership matrix $F$ fixed, problem (7) becomes:

$$\min_{M}\ \sum_{i=1}^{n}\sum_{j=1}^{c} f_{ij}^{\,r}\,\|x_i - m_j\|_2, \qquad (16)$$

which can be solved by introducing a nonnegative auxiliary variable $s_{ij}$ and using the iterative re-weighted method. Thus, we rewrite problem (16) as:

$$\min_{M}\ \sum_{i=1}^{n}\sum_{j=1}^{c} f_{ij}^{\,r}\, s_{ij}\,\|x_i - m_j\|_2^2, \qquad (17)$$

where

$$s_{ij} \;=\; \frac{1}{2\,\|x_i - m_j\|_2}. \qquad (18)$$

The optimal solution of problem (17) can be reached by taking the derivative with respect to $m_j$ and setting it to zero. That is:

$$m_j \;=\; \frac{\sum_{i=1}^{n} f_{ij}^{\,r}\, s_{ij}\, x_i}{\sum_{i=1}^{n} f_{ij}^{\,r}\, s_{ij}}. \qquad (19)$$

Assuming that $F$ and $M$ have been computed at the current iteration, we can then update the nonnegative auxiliary variable $s_{ij}$ according to equation (18) using the current $M$. Intuitively, the above optimization is summarized in Algorithm 1.
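
Similarly, one re-weighted centroid update of equations (18) and (19) can be sketched as below; computing $s_{ij}$ from the previous centroids is an implementation assumption consistent with the iterative re-weighted scheme described above.

import numpy as np

def update_centroids(X, F, M_prev, r, eps=1e-12):
    """One re-weighted update of equations (18)-(19): s_ij = 1/(2||x_i - m_j||) is
    computed from the previous centroids, then each new centroid is the weighted
    mean of the samples with weights f_ij^r * s_ij."""
    dist = np.linalg.norm(X[:, None, :] - M_prev[None, :, :], axis=2) + eps
    S = 1.0 / (2.0 * dist)                       # equation (18)
    W = (F ** r) * S                             # combined nonnegative weights
    M_new = (W.T @ X) / W.sum(axis=0)[:, None]   # equation (19)
    return M_new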

Input: Data matrix $X$, the number of clusters $c$, parameters $k$ and $r$
Output: Membership matrix $F$, centroid matrix $M$
Initialize centroid matrix $M$;
while not converged do
       for each sample $x_i$, $i = 1, \ldots, n$ do
            Obtain the membership values $f_i$ of problem (8) by using equation (15)
       end for
      for each cluster $j$, $j = 1, \ldots, c$ do
            Calculate the centroid vector $m_j$ and update the auxiliary variable $s_{ij}$ via (19) and (18).
       end for
      
end while
Algorithm 1 Solving the problem (7)
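
For completeness, a possible outer loop corresponding to Algorithm 1, reusing the two update sketches above, is given below; the random initialization and the relative-change stopping rule are illustrative choices, since the convergence test and initialization are left unspecified here.

import numpy as np

def refcmfs(X, n_clusters, k, r, n_iter=100, tol=1e-6, seed=0):
    """Sketch of Algorithm 1: alternate the membership update (Step 1) and the
    re-weighted centroid update (Step 2) until the objective stops decreasing."""
    rng = np.random.default_rng(seed)
    M = X[rng.choice(len(X), n_clusters, replace=False)].astype(float)
    prev_obj = np.inf
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - M[None, :, :], axis=2)
        F = np.stack([update_membership_row(d_i, k, r) for d_i in dist])  # Step 1
        obj = ((F ** r) * dist).sum()            # robust objective of problem (7)
        M = update_centroids(X, F, M, r)         # Step 2
        if abs(prev_obj - obj) < tol * max(abs(obj), 1.0):
            break
        prev_obj = obj
    return F, M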

IV Theoretical Analysis

In this section, we provide computational complexity analysis and convergence analysis of our proposed REFCMFS method.

IV-A Computational Analysis

Suppose we have $n$ samples in $c$ clusters and each sample has $d$ dimensions. For each iteration, the computational complexity of REFCMFS involves two steps. The first step is to compute the membership matrix $F$, which requires $O(ndc + nc\log c)$ operations (computing the distances and ranking them). The second step is to calculate the centroid matrix $M$, which needs $O(ndc)$ operations. Since $d, c \ll n$ for the public datasets used in this paper, the computational complexity of REFCMFS for each iteration is $O(ndc)$. In addition, the computational complexities of other typical methods are listed in Table I, where the additional term in the complexity of RSFKM denotes the computational complexity of Newton's method used in each of its iterations. It can be seen that the complexity of REFCMFS is linear in $n$ and more suitable for handling big datasets compared to GMM-based and graph-based methods.

TABLE I: The computational complexity of different methods (K-Means, K-Means++, K-Medoids, FCM, GMM, SC, RSFKM, and REFCMFS).

IV-B Convergence Analysis

To prove the convergence of Algorithm 1, we need Lemma 1, proposed in [36], which is used in the proof of Theorem 1. It is listed as follows:

Lemma 1.

For any nonzero vectors $v^{(t+1)}$ and $v^{(t)}$, the following inequality holds:

$$\|v^{(t+1)}\|_2 - \frac{\|v^{(t+1)}\|_2^2}{2\,\|v^{(t)}\|_2} \;\le\; \|v^{(t)}\|_2 - \frac{\|v^{(t)}\|_2^2}{2\,\|v^{(t)}\|_2}, \qquad (20)$$

where $v^{(t+1)}$ and $v^{(t)}$ denote the results at the $(t+1)$-th and $t$-th iterations, respectively.

Theorem 1.

The Algorithm 1 monotonically decreases the objective of the problem (6) in each iteration and converges to the global optimum.

Proof.

We decompose problem (6) into two subproblems and utilize an alternating iterative optimization method to solve them.

According to [11], it is known that $x^{r}$ is convex on $\mathbb{R}_{++}$ when $r \ge 1$ or $r \le 0$, where $\mathbb{R}_{++}$ denotes the set of positive real numbers. For updating $F$, with the centroid matrix $M$ fixed, the objective function of problem (11) is $\sum_{j=1}^{k}\tilde{f}_{ij}^{\,r}\tilde{d}_{ij}$, where $\tilde{d}_{ij} \ge 0$ can be seen as a constant. Therefore, $\tilde{f}_{ij}^{\,r}$ is convex in $\tilde{f}_{ij}$ when $r > 1$, and then the objective of problem (11) is convex when $r > 1$.

For updating $M$, with the membership matrix $F$ fixed, we use Lemma 1 to analyze the lower bound. After $t$ iterations, we have $M^{(t)}$ and $s_{ij}^{(t)} = \frac{1}{2\|x_i - m_j^{(t)}\|_2}$. Supposing that the updated $M^{(t+1)}$ is the optimal solution of problem (17), then according to the definition of $s_{ij}$ in equation (18), we have:

$$\sum_{i=1}^{n}\sum_{j=1}^{c} f_{ij}^{\,r}\,\frac{\|x_i - m_j^{(t+1)}\|_2^2}{2\,\|x_i - m_j^{(t)}\|_2} \;\le\; \sum_{i=1}^{n}\sum_{j=1}^{c} f_{ij}^{\,r}\,\frac{\|x_i - m_j^{(t)}\|_2^2}{2\,\|x_i - m_j^{(t)}\|_2}. \qquad (21)$$

According to Lemma 1 (applied to the vectors $x_i - m_j^{(t+1)}$ and $x_i - m_j^{(t)}$, weighted by $f_{ij}^{\,r} \ge 0$ and summed), we can obtain:

$$\sum_{i,j} f_{ij}^{\,r}\Big(\|x_i - m_j^{(t+1)}\|_2 - \frac{\|x_i - m_j^{(t+1)}\|_2^2}{2\,\|x_i - m_j^{(t)}\|_2}\Big) \;\le\; \sum_{i,j} f_{ij}^{\,r}\Big(\|x_i - m_j^{(t)}\|_2 - \frac{\|x_i - m_j^{(t)}\|_2^2}{2\,\|x_i - m_j^{(t)}\|_2}\Big). \qquad (22)$$

Combining inequalities (21) and (22), we can obtain:

$$\sum_{i=1}^{n}\sum_{j=1}^{c} f_{ij}^{\,r}\,\|x_i - m_j^{(t+1)}\|_2 \;\le\; \sum_{i=1}^{n}\sum_{j=1}^{c} f_{ij}^{\,r}\,\|x_i - m_j^{(t)}\|_2, \qquad (23)$$

which means that the objective of problem (16) is non-increasing and bounded below. Thus, in each iteration, Algorithm 1 monotonically decreases the objective function value of problem (6) until the algorithm converges. ∎

V Experiments

In this section, extensive experiments on several public datasets are conducted to evaluate the effectiveness of our proposed REFCMFS method.

V-A Experimental Setting

V-A1 Datasets

Several public datasets are used in our experiments which are described as follows.

ORL. This dataset [44] consists of 40 different subjects, with 10 images per subject, and each image is resized to 32×32 pixels. The images are taken against a dark homogeneous background with the subjects in an upright, frontal position.

Yale. The dataset [7] contains 165 gray-scale images of 15 individuals. There are 11 images per subject, one per facial expression or configuration, and each image is resized to 32×32 pixels.

COIL20. The dataset [35] consists of 1440 gray-scale images of 20 objects (72 images per object). The size of each image is 32×32 pixels, with 256 grey levels per pixel. The objects are placed on a motorized turntable against a black background and their images are taken at pose intervals of 5 degrees.

USPS. The dataset [24] consists of 9298 gray-scale handwritten digit images, and each image has 16×16 gray-scale pixels. It was generated by an optical character recognition system that scans five-digit ZIP codes and converts them to digital form.

YaleB. This database [19] has 38 individuals and around 64 near-frontal images under different illuminations per individual. We simply use the cropped images and resize them to 32×32 pixels.

COIL100. This dataset [34] consists of 7200 color images of 100 objects. Similar to COIL20 dataset, the objects are placed on a motorized turntable against a black background and their images are taken at pose intervals of 5 degrees corresponding to 72 images per object.

V-A2 Compared methods

We make comparisons between REFCMFS and several recent methods, which are listed as follows. K-Means clustering (K-Means), Fuzzy C-Means clustering (FCM) [8], Spectral Clustering (SC) [48], and the Gaussian Mixture Model (GMM) [10] are the baselines in our experiments. K-Means++ [4] and K-Medoids [40] are variants of K-Means clustering, where K-Means++ uses a fast and simple sampling scheme to seed the initial centers for K-Means and K-Medoids replaces the mean with the medoid to minimize the sum of dissimilarities between the center of a cluster and the other cluster members. Landmark-based Spectral Clustering (LSC) [12] selects a few representative data points as landmarks and represents the remaining data points as linear combinations of these landmarks, so that the spectral embedding of the data can be efficiently computed from the landmark-based representation, which allows clustering of large-scale datasets. Robust and Sparse Fuzzy K-Means Clustering (RSFKM) [55] improves the membership matrix with proper sparsity balanced by a regularization parameter. Besides, we make a comparison between REFCMFS and its simplified version, sim-REFCMFS, which replaces the $\ell_{2,1}$-norm loss of REFCMFS with the least-squares loss.

V-A3 Evaluation Metrics

In our experiments, we adopt clustering accuracy (ACC) and normalized mutual information (NMI) as evaluation metrics. For these two metrics, the higher value indicates better clustering quality. Each metric penalizes or favors different properties in the clustering, and hence we report results on these two measures to perform a comprehensive evaluation.

Fig. 2: Sensitivity analyses of the parameters $k$ and $r$ for REFCMFS on the ORL, Yale, COIL20, USPS, YaleB, and COIL100 datasets according to clustering ACC.
Fig. 3: Sensitivity analyses of the parameters $k$ and $r$ for REFCMFS on the ORL, Yale, COIL20, USPS, YaleB, and COIL100 datasets according to clustering NMI.

ACC. Let $c_i$ be the clustering result and $y_i$ be the ground-truth label of $x_i$. ACC is defined as:

$$\mathrm{ACC} \;=\; \frac{\sum_{i=1}^{n}\delta\big(y_i, \mathrm{map}(c_i)\big)}{n}. \qquad (24)$$

Here $n$ is the total number of samples, $\delta(a,b)$ is the delta function that equals one if $a = b$ and equals zero otherwise, and $\mathrm{map}(\cdot)$ is the best mapping function that utilizes the Kuhn-Munkres algorithm to permute the clustering labels to match the ground-truth labels.
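
In practice the best mapping in equation (24) can be computed with the Hungarian algorithm; a sketch using SciPy's linear_sum_assignment follows (the function and variable names are illustrative, not from the paper).

import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """ACC as in equation (24): find the best one-to-one label permutation with
    the Kuhn-Munkres (Hungarian) algorithm, then count the matched samples."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    labels_true = np.unique(y_true)
    labels_pred = np.unique(y_pred)
    # Contingency table: how many samples share predicted label lp and true label lt.
    cost = np.zeros((labels_pred.size, labels_true.size), dtype=int)
    for i, lp in enumerate(labels_pred):
        for j, lt in enumerate(labels_true):
            cost[i, j] = np.sum((y_pred == lp) & (y_true == lt))
    row, col = linear_sum_assignment(-cost)   # maximize total agreement
    return cost[row, col].sum() / y_true.size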

NMI. Suppose $C$ indicates the set of clusters obtained from the ground truth and $C'$ indicates the set of clusters obtained from our algorithm. Their mutual information metric $\mathrm{MI}(C, C')$ is defined as:

$$\mathrm{MI}(C, C') \;=\; \sum_{c_i \in C,\, c'_j \in C'} p(c_i, c'_j)\,\log\frac{p(c_i, c'_j)}{p(c_i)\,p(c'_j)}. \qquad (25)$$

Here, $p(c_i)$ and $p(c'_j)$ are the probabilities that an arbitrary sample belongs to the clusters $c_i$ and $c'_j$, respectively, and $p(c_i, c'_j)$ is the joint probability that an arbitrarily selected sample belongs to both clusters $c_i$ and $c'_j$. The following normalized mutual information (NMI) is adopted:

$$\mathrm{NMI}(C, C') \;=\; \frac{\mathrm{MI}(C, C')}{\max\big(H(C), H(C')\big)}, \qquad (26)$$

where $H(C)$ and $H(C')$ are the entropies of $C$ and $C'$, respectively. Note that $\mathrm{NMI}(C, C')$ ranges from 0 to 1: it equals 1 when the two sets of clusters are identical and 0 when they are independent.
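
For reference, NMI with max-normalization, matching equation (26), is available in scikit-learn; a minimal usage sketch follows, assuming label arrays y_true and y_pred as in the ACC sketch above.

from sklearn.metrics import normalized_mutual_info_score

# NMI as in equation (26); "max" selects max(H(C), H(C')) as the normalizer.
nmi = normalized_mutual_info_score(y_true, y_pred, average_method="max")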

Methods ORL Yale
 ACC NMI ACC NMI
K-Means 48.60±1.40 71.28±1.37 42.91±6.18 49.66±3.41
K-Means++ 50.70±3.80 73.57±2.17 37.45±4.97 45.59±3.33
K-Medoids 42.10±2.90 63.03±1.84 37.58±3.64 43.51±3.46
GMM 53.85±5.40 75.53±2.48 40.61±4.24 48.27±3.80
SC 37.55±1.55 66.75±0.86 24.79±0.48 35.79±0.50
LSC 53.70±5.45 75.08±2.40 42.18±5.21 48.50±3.53
FCM 19.30±2.30 44.50±2.26 24.36±3.52 30.36±3.18
RSFKM 53.94±4.31 75.84±1.41 39.15±5.82 46.05±1.97
sim-REFCMFS 57.85±4.35 77.41±1.77 46.18±2.54 53.55±1.68
REFCMFS 60.50±2.95 78.41±2.06 47.88±2.43 54.16±1.57
TABLE II: Comparison results on ORL and Yale datasets in terms of ACC and NMI.
Methods COIL20 USPS
 ACC NMI ACC NMI
K-Means 52.99±6.53 73.41±1.92 64.30±3.08 61.03±0.62
K-Means++ 57.74±5.60 75.79±2.52 64.05±2.35 60.79±0.54
K-Medoids 50.15±6.19 65.34±1.77 51.05±8.79 43.70±7.01
GMM 59.83±3.50 75.51±0.71 68.83±2.49 72.40±1.33
SC 57.83±3.25 75.66±1.18 25.98±0.06 10.18±0.07
LSC 59.76±4.21 73.43±3.29 62.42±4.23 58.39±2.16
FCM 23.85±4.62 41.31±3.88 37.79±2.45 29.71±2.68
RSFKM 65.76±7.99 76.12±2.53 67.38±0.01 61.68±0.01
sim-REFCMFS 68.56±5.13 76.36±2.39 67.56±6.94 61.37±1.62
REFCMFS 69.51±3.40 77.60±1.66 70.02±8.58 66.79±2.93
TABLE III: Comparison results on COIL20 and USPS datasets in terms of ACC and NMI.
Methods YaleB COIL100
 ACC NMI ACC NMI
K-Means 9.36±0.00 12.34±0.00 48.21±2.62 75.69±0.67
K-Means++ 9.55±0.76 13.04±1.29 46.26±0.68 75.58±0.27
K-Medoids 6.68±0.37 8.25±0.20 31.59±0.68 63.10±0.50
GMM 9.65±0.30 13.57±0.39 43.54±4.63 75.92±1.30
SC 7.71±0.21 10.03±0.39 8.91±0.19 26.72±0.13
LSC 9.56±0.82 12.46±1.32 48.05±1.84 75.71±0.91
FCM 7.49±0.65 9.91±1.28 10.43±1.56 42.16±2.83
RSFKM 9.63±0.60 12.55±0.68 51.76±1.48 76.08±0.35
sim-REFCMFS 9.88±1.30 12.79±0.73 52.60±1.46 76.45±0.32
REFCMFS 10.04±0.47 13.61±0.61 53.15±1.84 77.82±0.74
TABLE IV: Comparison results on YaleB and COIL100 datasets in terms of ACC and NMI.
Methods Runtime (s)
ORL Yale COIL20 USPS YaleB COIL100
K-Means 0.0917 0.0544 0.2501 0.3435 0.3245 1.6119
K-Means++ 0.2262 0.0509 0.6452 1.3849 1.5643 12.7211
K-Medoids 0.0362 0.0290 0.1386 2.2490 0.3049 2.3005
GMM 6.7570 0.8921 178.5853 946.8664 298.7445 770.7133
SC 1.6823 0.5310 14.2997 150.3779 39.5985 351.5918
FCM 0.3669 0.1516 1.4901 0.9839 32.7830 18.4595
RSFKM 0.5985 0.2768 2.6506 17.1141 6.9165 25.0760
REFCMFS 0.3500 0.2395 1.6598 2.3491 6.3403 12.0973
TABLE V: Running times (in seconds) of different methods on the ORL, Yale, COIL20, USPS, YaleB, and COIL100 datasets, respectively.

V-A4 Parameter Setup

There are two parameters, $k$ and $r$, in our proposed REFCMFS method. The first one, $k$ in problem (7), is utilized to adjust the number of nonzero elements in each membership vector $f_i$. We search for the optimal $k$ over a range of values with different steps corresponding to different datasets. The second one, $r$ in problem (7), controls how fuzzy the clusters will be (the higher, the fuzzier) and is tuned by a grid-search strategy. We report the results for different values of the parameters $k$ and $r$ in Figures 2 and 3 to intuitively describe their sensitivity for REFCMFS on the different datasets, and record the best clustering results obtained with the optimal parameters.

It can be seen that each parameter plays an important role in the performance. The optimal settings of $k$ and $r$ differ across the ORL, Yale, COIL20, USPS, YaleB, and COIL100 datasets. Taking the YaleB dataset as an example, the 3D bars of ACC and NMI simultaneously achieve their highest values at the selected $k$ and $r$.

Fig. 4: The membership values of each sample for K-Means, FCM, and REFCMFS on ORL, Yale, COIL20, USPS, and YaleB datasets, respectively.

V-B Experimental Results

In this section, we report the clustering performance comparisons of REFCMFS in Tables II-IV and make the following observations.

Compared to the four baselines K-Means, FCM, SC, and GMM, our proposed REFCMFS method and its simple version sim-REFCMFS generally achieve better performance on all the datasets. For instance, on the Yale dataset, REFCMFS obtains 4.735%, 23.66%, 20.73%, and 6.58% average improvements over the four baselines, respectively (for simplicity, the average improvement here is defined as the improvement averaged over the two clustering evaluation metrics ACC and NMI). Similarly, sim-REFCMFS gains 3.58%, 22.505%, 19.575%, and 5.425% average improvements, respectively. This observation indicates that it is beneficial to combine the advantages of hard and soft partitions, to introduce the $\ell_{2,1}$-norm robust loss, and to equip the membership matrix with proper sparsity. This conclusion can also be demonstrated on the other five datasets. Moreover, to intuitively compare the flexible sparse membership values of REFCMFS with those of the hard and soft partitions (i.e., K-Means and FCM), we show them in Figure 4, from which it can be seen that flexible sparsity is more beneficial to clustering.

Besides, compared to K-Means++ and K-Medoids (two variants of K-Means), REFCMFS and sim-REFCMFS obtain better results on all the datasets. Specifically, for the COIL20 dataset, REFCMFS achieves 6.79% and 15.81% average improvements and sim-REFCMFS gets 5.695% and 14.715% average improvements. It is obvious that, although K-Means++ and K-Medoids improve the initialization of K-Means, they are not good at handling outliers because of the poor robustness of the least-squares criterion. This conclusion can also be verified on the other five datasets. Concretely, compared with K-Means++, REFCMFS achieves 7.32%, 9.5%, 5.985%, 0.53%, and 4.565% average improvements on the ORL, Yale, USPS, YaleB, and COIL100 datasets, respectively, and sim-REFCMFS obtains 5.495%, 8.345%, 2.045%, 0.07%, and 3.605% average improvements. Compared to K-Medoids, REFCMFS achieves 16.89%, 10.475%, 21.03%, 4.36%, and 18.14% average improvements on the ORL, Yale, USPS, YaleB, and COIL100 datasets, respectively, and sim-REFCMFS obtains 15.065%, 9.32%, 17.09%, 3.87%, and 13.35% average improvements.

In addition, REFCMFS outperforms two recent works, LSC and RSFKM, on all the datasets. Considering that LSC needs to select a few representative data points as landmarks and represents the remaining data points as linear combinations of these landmarks, how the representative information is selected directly affects this method. Compared to LSC, REFCMFS achieves 5.065%, 5.68%, 6.96%, 8%, 0.815%, and 3.605% average improvements on the ORL, Yale, COIL20, USPS, YaleB, and COIL100 datasets, respectively. RSFKM introduces a penalized regularization on the membership matrix and controls the sparsity of the membership matrix by the regularization parameter, whereas REFCMFS efficiently adjusts the sparsity of the membership matrix through its $\ell_0$-norm constraint. Compared to RSFKM, REFCMFS achieves 4.565%, 8.42%, 2.615%, 3.875%, 0.735%, and 1.365% average improvements on the ORL, Yale, COIL20, USPS, YaleB, and COIL100 datasets, respectively.

Furthermore, sim-REFCMFS, the simple version of REFCMFS, achieves the second-best performance on almost all datasets. Both sim-REFCMFS and REFCMFS show that introducing the $\ell_0$-norm constraint with flexible sparsity on the membership matrix leads to better performance than the other comparison methods. However, the loss function of sim-REFCMFS is based on the least-squares criterion rather than a robust $\ell_{2,1}$-norm loss, which may make it sensitive to outliers. Concretely, compared with sim-REFCMFS, REFCMFS achieves 1.825%, 1.155%, 1.095%, 3.94%, 0.49%, and 0.96% average improvements on the ORL, Yale, COIL20, USPS, YaleB, and COIL100 datasets, respectively.

Finally, Figure 5 shows the convergence curves, which support the above convergence analysis of REFCMFS. Moreover, complementing the computational complexity analysis in subsection IV-A, we measure the running times of the different methods on all the datasets and report them in Table V. It is obvious that REFCMFS is faster than the RSFKM, GMM, and SC methods on all the datasets.

Fig. 5: The convergence curves of our proposed REFCMFS method on ORL, Yale, COIL20, USPS, YaleB, and COIL100 datasets, respectively.

VI Conclusion

In this paper, we have proposed a novel clustering algorithm, named REFCMFS, which develops an $\ell_{2,1}$-norm robust loss for the data-driven term and imposes an $\ell_0$-norm constraint on the membership matrix to make the model more robust and flexibly sparse. This not only avoids incorrect or invalid clustering partitions caused by outliers but also greatly reduces the computational complexity. Concretely, REFCMFS designs a new way to simplify and solve the $\ell_0$-norm constraint directly, without any approximate transformation, by absorbing the constraint into the objective function through a ranking function. This allows REFCMFS to be solved by a tractable and skillful optimization method and guarantees optimality and convergence. Theoretical analyses and extensive experiments on several public datasets demonstrate the effectiveness and rationality of our proposed method.

References

  • [1] S. K. Adhikari, J. K. Sing, D. K. Basu, and M. Nasipuri (2015) Conditional spatial fuzzy c-means clustering algorithm for segmentation of mri images. ASC 34, pp. 758–769. Cited by: §I.
  • [2] J. Aparajeeta, P. K. Nanda, and N. Das (2016) Modified possibilistic fuzzy c-means algorithms for segmentation of magnetic resonance image. ASC 41, pp. 104–119. Cited by: §I.
  • [3] R. Arora, M. R. Gupta, A. Kapila, and M. Fazel (2013) Similarity-based clustering by left-stochastic matrix factorization. JMLR 14 (1), pp. 1715–1746. Cited by: §I.
  • [4] D. Arthur and S. Vassilvitskii (2007) K-means++: the advantages of careful seeding. In ACM-SIAM, Cited by: §V-A2.
  • [5] S. Askari, N. Montazerin, M. F. Zarandi, and E. Hakimi (2017) Generalized entropy based possibilistic fuzzy c-means for clustering noisy data and its convergence proof. Neurocomputing 219, pp. 186–202. Cited by: §I.
  • [6] C. Bauckhage (2015) K-means clustering is matrix factorization. arXiv preprint arXiv:1512.07548. Cited by: §I.
  • [7] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman (1997) Eigenfaces vs. fisherfaces: recognition using class specific linear projection. Technical report Yale University New Haven United States. Cited by: §V-A1.
  • [8] J. C. Bezdek (1980) A convergence theorem for the fuzzy isodata clustering algorithms. TPAMI (1), pp. 1–8. Cited by: §II-B, §V-A2.
  • [9] S. S. Bhowmick and B. S. Seah (2016) Clustering and summarizing protein-protein interaction networks: a survey. TKDE 28 (3), pp. 638–658. Cited by: §I.
  • [10] C. M. Bishop (2006) Pattern recognition and machine learning. springer. Cited by: §II-C, §V-A2.
  • [11] S. Boyd and L. Vandenberghe (2004) Convex optimization. Cambridge university press. Cited by: §IV-B.
  • [12] X. Chen and D. Cai (2011) Large scale spectral clustering with landmark-based representation. In AAAI, Cited by: §V-A2.
  • [13] F. R. Chung and F. C. Graham (1997) Spectral graph theory. American Mathematical Soc.. Cited by: §II-D.
  • [14] P. D’Urso, L. De Giovanni, and R. Massari (2015) Time series clustering by a robust autoregressive metric with application to air pollution. CILS 141, pp. 107–124. Cited by: §I.
  • [15] Y. Ding and X. Fu (2016) Kernel-based fuzzy c-means clustering algorithm based on genetic algorithm. Neurocomputing 188, pp. 233–238. Cited by: §I.
  • [16] (2010) Fast global k-means clustering using cluster membership and inequality. Pattern Recognition 43 (5), pp. 1954–1963. Cited by: §I.
  • [17] (2011) Fast modified global k-means algorithm for incremental cluster construction. Pattern Recognition 44 (4), pp. 866–876. Cited by: §I.
  • [18] G. Gan and M. K. Ng (2017) K-means clustering with outlier removal. PRL 90, pp. 8–14. Cited by: §I.
  • [19] A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman (2001) From few to many: illumination cone models for face recognition under variable lighting and pose. TPAMI 23 (6), pp. 643–660. Cited by: §V-A1.
  • [20] A. Georgogiannis Robust k-means: a theoretical revisit. In Advances in Neural Information Processing Systems, Cited by: §I.
  • [21] B. Han, L. Liu, and E. Omiecinski (2017) A systematic approach to clustering whole trajectories of mobile objects in road networks. TKDE 29 (5), pp. 936–949. Cited by: §I.
  • [22] V. Hautamäki, S. Cherednichenko, I. Kärkkäinen, T. Kinnunen, and P. Fränti (2005) Improving k-means by outlier removal. In SCIA, Cited by: §I.
  • [23] Z. He, X. Xu, and S. Deng (2003) Discovering cluster-based local outliers. PRL 24 (9-10), pp. 1641–1650. Cited by: §I.
  • [24] J. J. Hull (1994) A database for handwritten text recognition research. TPAMI 16 (5), pp. 550–554. Cited by: §V-A1.
  • [25] H. Ismkhan (2018) I-k-means-+: an iterative clustering algorithm based on an enhanced version of the k-means. Pattern Recognition 79, pp. 402–413. Cited by: §I.
  • [26] H. Izakian, W. Pedrycz, and I. Jamal (2013) Clustering spatiotemporal data: an augmented fuzzy c-means. TFS 21 (5), pp. 855–868. Cited by: §I.
  • [27] F. Jiang, G. Liu, J. Du, and Y. Sui (2016) Initialization of k-modes clustering using outlier detection techniques. IS 332, pp. 167–183. Cited by: §I.
  • [28] M. Jiang, S. Tseng, and C. Su (2001) Two-phase clustering process for outliers detection. PRL 22 (6-7), pp. 691–700. Cited by: §I.
  • [29] S. Jiang and Q. An (2008) Clustering-based outlier detection method. In FSKD, Vol. 2, pp. 429–433. Cited by: §I.
  • [30] W. Jiang, F. Nie, and H. Huang (2015) Robust dictionary learning with capped $\ell_1$-norm. In IJCAI, Cited by: §II-E.
  • [31] B. S. Y. Lam and Y. Hong (2004) Robust clustering algorithm for suppression of outliers [data classification applications]. In International Symposium on Intelligent Multimedia, Video and Speech Processing, pp. 691–694. Cited by: §I.
  • [32] T. Lei, X. Jia, Y. Zhang, L. He, H. Meng, and A. K. Nandi (2018) Significantly fast and robust fuzzy c-means clustering algorithm based on morphological reconstruction and membership filtering. TFS. Cited by: §I.
  • [33] (2008) Modified global k-means algorithm for minimum sum-of-squares clustering problems. Pattern Recognition 41 (10), pp. 3192–3199. Cited by: §I.
  • [34] S. Nayar, S. A. Nene, and H. Murase (1996) Columbia object image library (coil 100). department of comp. Science, Columbia University, Tech. Rep. CUCS-006-96. Cited by: §V-A1.
  • [35] S. A. Nene, S. K. Nayar, H. Murase, et al. (1996) Columbia object image library (coil-20). Cited by: §V-A1.
  • [36] F. Nie, H. Huang, X. Cai, and C. H. Ding (2010) Efficient and robust feature selection via joint $\ell_{2,1}$-norms minimization. In NIPS, Cited by: §II-E, §IV-B.
  • [37] N. Nithya, K. Duraiswamy, and P. Gomathy (2013) A survey on clustering techniques in medical diagnosis. IJCST 1 (2), pp. 17–23. Cited by: §I.
  • [38] L. Ott, L. Pang, F. T. Ramos, and S. Chawla (2014) On integrated clustering and outlier detection. In NIPS, Cited by: §I.
  • [39] R. Pamula, J. K. Deka, and S. Nandi (2011) An outlier detection method based on clustering. In EAIT, Cited by: §I.
  • [40] H. Park and C. Jun (2009) A simple and fast algorithm for k-medoids clustering. Expert systems with applications 36 (2), pp. 3336–3341. Cited by: §V-A2.
  • [41] B. A. Pimentel and R. M. de Souza (2016) Multivariate fuzzy c-means algorithms with weighting. Neurocomputing 174, pp. 946–965. Cited by: §I.
  • [42] X. Qiu, Y. Qiu, G. Feng, and P. Li (2015) A sparse fuzzy c-means algorithm based on sparse clustering framework. Neurocomputing 157, pp. 290–295. Cited by: §I.
  • [43] F. Rehm, F. Klawonn, and R. Kruse (2007) A novel approach to noise clustering for outlier detection. SC 11 (5), pp. 489–494. Cited by: §I.
  • [44] F. S. Samaria and A. C. Harter (1994) Parameterisation of a stochastic model for human face identification. In ACV, pp. 138–142. Cited by: §V-A1.
  • [45] L. Szilágyi and S. M. Szilágyi (2014) Generalization rules for the suppressed fuzzy c-means clustering algorithm. Neurocomputing 139, pp. 298–309. Cited by: §I.
  • [46] (2003) The global k-means clustering algorithm. Pattern Recognition 36 (2), pp. 451–461. Cited by: §I.
  • [47] G. Tzortzis and A. Likas (2014) The minmax k-means clustering algorithm. Pattern Recognition 47 (7), pp. 2505–2516. Cited by: §I.
  • [48] U. Von Luxburg (2007) A tutorial on spectral clustering. SC 17 (4), pp. 395–416. Cited by: §V-A2.
  • [49] J. Wang, J. Wang, J. Song, X. Xu, H. T. Shen, and S. Li (2015) Optimized cartesian k-means. TKDE 27 (1), pp. 180–192. Cited by: §I.
  • [50] J. Wang, S. Yan, Y. Yang, M. S. Kankanhalli, S. Li, and J. Wang (2015) Group k-means. arXiv preprint arXiv:1501.00825. Cited by: §I.
  • [51] J. J. Whang, I. S. Dhillon, and D. F. Gleich (2015) Non-exhaustive, overlapping k-means. In ICDM, Cited by: §I.
  • [52] C. Wiwie, J. Baumbach, and R. Röttger (2015) Comparing the performance of biomedical clustering methods. NM 12 (11), pp. 1033. Cited by: §I.
  • [53] D. Wu, J. Shi, and N. Mamoulis (2018) Density-based place clustering using geo-social network data. TKDE 30 (5), pp. 838–851. Cited by: §I.
  • [54] J. Wu, H. Liu, H. Xiong, J. Cao, and J. Chen (2015) K-means-based consensus clustering: a unified view. TKDE 27 (1), pp. 155–169. Cited by: §I.
  • [55] J. Xu, J. Han, K. Xiong, and F. Nie (2016) Robust and sparse fuzzy k-means clustering.. In IJCAI, Cited by: §I, §II-E, §V-A2.
  • [56] M. S. Yang and Y. Nataliani (2017) Robust-learning fuzzy c-means clustering algorithm with unknown number of clusters. Pattern Recognition 71, pp. 45–59. Cited by: §I.
  • [57] K. Zhang, M. Hutter, and H. Jin (2009) A new local distance-based outlier detection approach for scattered real-world data. In PAKDD, Cited by: §I.
  • [58] Y. Zhou, H. Yu, and X. Cai (2009) A novel k-means algorithm for clustering and outlier detection. In FITME, Cited by: §I.