Robust and Efficient Fuzzy C-Means Clustering Constrained on Flexible Sparsity
Abstract
Clustering is an effective technique in data mining to group a set of objects in terms of some attributes. Among various clustering approaches, the family of K-Means algorithms is popular due to its simplicity and efficiency. However, most existing K-Means based clustering algorithms cannot deal well with outliers and have difficulty efficiently solving problems that embed an ℓ0-norm constraint. To address these issues and significantly improve clustering performance, we propose a novel clustering algorithm, named REFCMFS, which develops an ℓ2,1-norm robust loss as the data-driven term and imposes an ℓ0-norm constraint on the membership matrix to make the model robust and flexibly sparse. In particular, REFCMFS designs a new way to simplify and solve the ℓ0-norm constraint without any approximate transformation by absorbing the constraint into the objective function through a ranking function. These improvements not only allow REFCMFS to efficiently obtain more promising performance but also provide a new tractable and skillful optimization method for problems embedding an ℓ0-norm constraint. Theoretical analyses and extensive experiments on several public datasets demonstrate the effectiveness and rationality of the proposed REFCMFS method.
I Introduction
As a fundamental problem in machine learning, clustering is widely used in many fields, such as network data (including protein-protein interaction networks [9], road networks [21], geo-social networks [53]), medical diagnosis [37], biological data analysis [52], environmental chemistry [14], and so on. K-Means clustering is one of the most popular techniques because of its simplicity and effectiveness: it randomly initializes the cluster centroids, assigns each sample to its nearest cluster, and then updates the cluster centroids iteratively to partition a dataset into subsets.
Over the past years, many modified versions of the K-Means algorithm have been proposed, such as Global K-Means [46] and its variants [33, 16, 17], MinMax K-Means clustering [47], K-Means based consensus clustering [54], Optimized Cartesian K-Means [49], Group K-Means [50], Robust K-Means [20], IK-Means+ [25], and so on. Most importantly, researchers have pointed out that the objective function of K-Means clustering can be expressed via the Frobenius norm of the difference between the data matrix and a low-rank approximation of that data matrix [3, 6]. Specifically, the problem of hard K-Means clustering is as follows:
(1)  min_{F,G} ‖X − FG^T‖_F^2,  s.t.  G ∈ {0,1}^{n×c},  Σ_{j=1}^{c} g_{ij} = 1, i = 1, …, n
where X = [x_1, …, x_n] ∈ ℝ^{d×n} is a matrix of data vectors x_i; F = [f_1, …, f_c] ∈ ℝ^{d×c} is a matrix of cluster centroids f_j; G = [g_{ij}] ∈ {0,1}^{n×c} is a cluster indicator matrix of binary variables such that g_{ij} = 1 if x_i ∈ C_j, where C_j denotes the j-th cluster, and g_{ij} = 0 otherwise.
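To make the matrix-factorization view concrete, the following sketch (plain NumPy; the function and variable names are ours, not from the paper) checks numerically that ‖X − FG^T‖_F^2 equals the usual sum of squared distances of points to their assigned centroids:

```python
import numpy as np

def kmeans_objective(X, F, G):
    """Hard K-Means objective in matrix form: ||X - F G^T||_F^2.
    X: d x n data, F: d x c centroids, G: n x c one-hot indicator."""
    return np.linalg.norm(X - F @ G.T, "fro") ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 5))          # 5 points in R^2
F = rng.normal(size=(2, 3))          # 3 centroids
labels = np.array([0, 1, 2, 0, 1])
G = np.eye(3)[labels]                # one-hot rows, shape 5 x 3

# Same value as summing ||x_i - f_{label(i)}||^2 over all points.
direct = sum(np.sum((X[:, i] - F[:, labels[i]]) ** 2) for i in range(5))
```

Minimizing over G with F fixed recovers the nearest-centroid assignment; minimizing over F with G fixed recovers the cluster means.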
Although the K-Means clustering algorithm has been widely used, it is sensitive to outliers, which easily deteriorate the clustering performance. Therefore, two main approaches have been proposed to deal with outliers in K-Means clustering: one based on outlier analysis (outlier detection or removal), and the other based on outlier suppression (robust models).
For the first approach, much work has been done on outlier analysis. Several algorithms [28, 23, 22, 29, 58, 39, 27] perform clustering and outlier detection in stages, where a dataset is divided into different clusters and some measures are calculated for the data points based on the clusters to identify outliers. Besides, [43] defines outliers in terms of a noise distance, where data points that are about the noise distance or farther away from any cluster center get high membership degrees to the outlier cluster. [57] introduces a local distance-based outlier factor to measure the outlierness of objects in scattered datasets. [38] extends the facility location formulation to model the joint clustering and outlier detection problem and proposes a subgradient-based algorithm to solve the resulting optimization problem. [51] proposes a non-exhaustive overlapping K-Means algorithm to identify outliers during the clustering process. [18] performs data clustering and outlier detection simultaneously by introducing an additional "cluster" into the K-Means algorithm to hold all outliers. For the second approach, the main strategy of outlier suppression is to modify the objective function into a robust model, such as [31].
Fuzzy C-Means (FCM) is a simple but powerful clustering method, built on the concept of fuzzy sets, that has been successfully used in many areas. There are, however, several well-known problems with FCM, such as sensitivity to initialization, sensitivity to outliers, and limitation to convex clusters. Therefore, many extensions and variants of FCM clustering have been advanced in recent years. Augmented FCM [26] revisits and augments the algorithm to make it applicable to spatiotemporal data. Suppressed FCM [45] increases the difference between high and low membership grades, giving more accurate partitions of the data with fewer iterations than FCM. Sparse FCM [42] reforms traditional FCM to deal with high-dimensional data clustering, based on Witten's sparse clustering framework. Kernel-based FCM [15] optimizes FCM with a genetic algorithm, combining an improved genetic algorithm and the kernel technique to first optimize the initial cluster centers and then guide the categorization. Multivariate FCM [41] proposes two multivariate FCM algorithms with different weights, aiming to represent how important each variable is for each cluster and to improve clustering quality. Robust-Learning FCM [56] is free of the fuzziness index, needs no initialization or parameter selection, and can also automatically find the best number of clusters.
Since the above extensions still perform weakly when dealing with outliers, several robust FCM algorithms have come out. Specifically, conditional spatial FCM [1] improves the robustness of FCM through the incorporation of conditioning effects imposed by an auxiliary variable corresponding to each pixel. Modified possibilistic FCM [2] jointly considers typicality as well as fuzzy membership measures to model the bias field and noise. Generalized entropy-based possibilistic FCM [5] utilizes functions of the distance instead of the distance itself in the fuzzy, possibilistic, and entropy terms of the clustering objective function to decrease noise contributions to the cluster centers. Fast and robust FCM [32] proposes a significantly faster and more robust algorithm based on morphological reconstruction and membership filtering.
Inspired by the above analysis, we develop a novel clustering method in this work, named Robust and Efficient Fuzzy C-Means Clustering constrained on Flexible Sparsity (REFCMFS), by introducing a flexible sparse constraint on the membership matrix to improve robustness and to provide a new idea for simplifying the ℓ0-norm sparse constraint. The proposed REFCMFS not only improves robustness through the ℓ2,1-norm data-driven term but also obtains solutions with proper sparsity and greatly reduces the computational complexity.
Note that we recently proposed a Robust and Sparse Fuzzy K-Means (RSFKM) clustering method [55]. However, the REFCMFS method proposed in this paper is quite different from RSFKM. Concretely, RSFKM accounts for the robustness of the data-driven term by utilizing the ℓ2,1-norm and the capped ℓ1-norm, and it applies the Lagrangian multiplier method and Newton's method to solve for the membership matrix, whose sparseness is adjusted by a regularization parameter. In contrast, our proposed REFCMFS method maintains the robustness of the clustering model by using the ℓ2,1-norm loss and imposes an ℓ0-norm sparse constraint on the membership matrix with a flexible sparsity, i.e., ‖u^i‖_0 = k, where ‖·‖_0 denotes the number of nonzero elements in u^i. It is well known that solving a problem with an ℓ0-norm constraint is difficult. The proposed REFCMFS method absorbs this constraint into the objective function by designing a novel ranking function Φ(·), which is an efficient way to calculate the optimal membership matrix and greatly reduces the computational complexity, especially for large datasets. The related theoretical analyses and comparison experiments are presented in Sections IV and V.
The contributions of our proposed REFCMFS method in this paper can be summarized as follows:

REFCMFS develops the ℓ2,1-norm loss for the data-driven term and introduces the ℓ0-norm constraint on the membership matrix, which gives the model robustness, proper sparseness, and better interpretability. This not only avoids incorrect or invalid clustering partitions caused by outliers but also greatly reduces the computational complexity.

REFCMFS designs a new way to simplify and solve the ℓ0-norm constraint directly without any approximation. For each instance, we absorb the constraint into the objective function through a ranking function Φ(·), which sorts the distances in ascending order, selects the first k smallest elements together with their corresponding membership values, and sets the remaining membership values to zero. As a result, REFCMFS can be solved by a tractable and skillful optimization method with guaranteed optimality and convergence.

Theoretical analyses, including the complexity analysis and convergence analysis, are presented briefly, and extensive experiments on several public datasets demonstrate the effectiveness and rationality of the proposed REFCMFS method.
The rest of this paper is organized as follows. Related work is introduced in Section II. In Section III, we develop the novel REFCMFS method and provide a new idea to solve it skillfully. Theoretical analyses of REFCMFS, i.e., complexity analysis and convergence analysis, are given in Section IV. Section V provides the experimental results on several public datasets, followed by convergence curves and parameter sensitivity analyses. The conclusion is given in Section VI.
II Preliminary Knowledge
In this section, we briefly review some typical literature on K-Means clustering, the Gaussian Mixture Model, Spectral Clustering, Fuzzy C-Means clustering, and Robust and Sparse Fuzzy K-Means clustering related to the proposed method.
II-A K-Means Clustering
K-Means clustering has been shown in problem (1), where G is the membership matrix and each row of G satisfies the 1-of-c coding scheme (if a data point x_i is assigned to the j-th cluster then g_{ij} = 1, and g_{ij} = 0 otherwise). Although K-Means clustering is simple and can be solved efficiently, it is very sensitive to outliers.
II-B Fuzzy C-Means Clustering
As one of the most popular fuzzy clustering techniques, Fuzzy C-Means clustering [8] minimizes the following objective function:
(2)  min_{U,F} Σ_{i=1}^{n} Σ_{j=1}^{c} u_{ij}^m ‖x_i − f_j‖_2^2,  s.t.  u_{ij} ≥ 0,  Σ_{j=1}^{c} u_{ij} = 1
where U = [u_{ij}] ∈ ℝ^{n×c} is the membership matrix, whose elements are nonnegative and sum to one on each row. The parameter m > 1 is a weighting exponent on each fuzzy membership and determines the amount of fuzziness of the resulting clustering.
The objective functions of K-Means and FCM are virtually identical; the only difference is that FCM introduces a vector u^i (i.e., each row of U) which expresses the percentage of belonging of a given point to each of the clusters. This vector is submitted to a 'stiffness' exponent (i.e., m) aimed at giving more importance to the stronger connections (and, conversely, at minimizing the weight of weaker ones). When m tends towards 1, the resulting vector becomes a binary vector, making the objective function of FCM identical to that of K-Means. Besides, FCM tends to run slower than K-Means, since each point is evaluated against each cluster and more operations are involved in each evaluation: K-Means only needs a distance calculation, whereas FCM needs a full inverse-distance weighting.
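As a sketch of the trade-off described above (our own minimal NumPy illustration, not the paper's code), the standard FCM membership update weights each cluster by an inverse power of its distance, and driving m toward 1 recovers near-binary assignments:

```python
import numpy as np

def fcm_memberships(X, F, m=2.0):
    """Standard FCM membership update: u_ij proportional to d_ij^(1/(1-m)),
    with rows summing to one. X: n x d data, F: c x d centroids, m > 1."""
    d2 = np.sum((X[:, None, :] - F[None, :, :]) ** 2, axis=2)
    d2 = np.maximum(d2, 1e-12)               # guard exact centroid hits
    w = d2 ** (1.0 / (1.0 - m))              # inverse-distance weighting
    return w / w.sum(axis=1, keepdims=True)

X = np.array([[1.0, 0.0], [9.0, 0.0], [5.0, 0.0]])
F = np.array([[0.0, 0.0], [10.0, 0.0]])
U_soft = fcm_memberships(X, F, m=2.0)        # graded memberships
U_near_hard = fcm_memberships(X, F, m=1.05)  # m -> 1: almost binary rows
```

The full-pairwise weighting in `fcm_memberships` is exactly the extra cost relative to K-Means mentioned in the text.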
II-C Gaussian Mixture Model
Unlike K-Means clustering, which generates hard partitions of the data, the Gaussian Mixture Model (GMM) [10], one of the most widely used mixture models for clustering, generates soft partitions and is more flexible. Considering that each cluster can be mathematically represented by a parametric distribution, the entire dataset is modeled by a mixture of these distributions. In GMM, the mixture of Gaussians has c component densities mixed together with mixing coefficients π_j:
(3)  p(x) = Σ_{j=1}^{c} π_j 𝒩(x | μ_j, Σ_j)
where the π_j are mixing parameters such that π_j ≥ 0 and Σ_{j=1}^{c} π_j = 1, and each 𝒩(x | μ_j, Σ_j) is a Gaussian density function parameterized by (μ_j, Σ_j). GMM uses mixture distributions to fit the data, and the conditional probabilities of the data points, p(j | x_i), are used to assign probabilistic labels. Although the Expectation-Maximization (EM) algorithm for GMM can achieve promising results, it has a high computational complexity.
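The soft assignment in GMM comes from the E-step posterior p(j | x_i); a minimal sketch for isotropic components (a simplifying assumption of ours, chosen for brevity) is:

```python
import numpy as np

def gmm_responsibilities(X, pis, mus, var):
    """E-step posteriors p(j | x_i) for a GMM whose j-th covariance is
    var[j] * I (an isotropic simplification for illustration only)."""
    n, d = X.shape
    logp = np.empty((n, len(pis)))
    for j in range(len(pis)):
        diff = X - mus[j]
        logp[:, j] = (np.log(pis[j])
                      - 0.5 * d * np.log(2 * np.pi * var[j])
                      - 0.5 * np.sum(diff ** 2, axis=1) / var[j])
    logp -= logp.max(axis=1, keepdims=True)  # stabilize before exponentiating
    p = np.exp(logp)
    return p / p.sum(axis=1, keepdims=True)  # normalize rows to posteriors

X = np.array([[0.1, 0.0], [9.9, 0.0]])
R = gmm_responsibilities(X,
                         pis=np.array([0.5, 0.5]),
                         mus=np.array([[0.0, 0.0], [10.0, 0.0]]),
                         var=np.array([1.0, 1.0]))
```

The M-step would then re-estimate π_j, μ_j, Σ_j from these responsibilities, which is where the cost cited above accumulates.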
II-D Spectral Clustering
The general Spectral Clustering (SC) method [13] needs to construct an adjacency matrix and calculate the eigendecomposition of the corresponding Laplacian matrix; both of these steps are computationally expensive. Given a data matrix X, spectral clustering first constructs an undirected graph via its adjacency matrix W, each element w_{ij} of which denotes the similarity between x_i and x_j. The graph Laplacian L = D − W is then calculated, where D denotes the degree matrix, a diagonal matrix whose entries are the row sums of W, i.e., d_{ii} = Σ_j w_{ij}. Then spectral clustering uses the c eigenvectors of L corresponding to the smallest eigenvalues as the low-dimensional representations of the original data. Finally, traditional K-Means clustering is applied to obtain the clusters. Due to the high complexity of the graph construction and the eigendecomposition, spectral clustering is not suitable for large-scale applications.
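The pipeline just described (affinity → Laplacian → eigenvectors → K-Means) can be sketched as follows; the Gaussian affinity and its `gamma` parameter are common choices we assume here, not prescribed by the text:

```python
import numpy as np

def spectral_embedding(X, c, gamma=1.0):
    """Unnormalized spectral embedding: Gaussian affinity W, Laplacian
    L = D - W, then the c eigenvectors with the smallest eigenvalues."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    W = np.exp(-gamma * sq)                  # similarity between x_i and x_j
    np.fill_diagonal(W, 0.0)
    D = np.diag(W.sum(axis=1))               # degree matrix: row sums of W
    L = D - W
    _, vecs = np.linalg.eigh(L)              # eigenvalues in ascending order
    return vecs[:, :c]                       # n x c low-dimensional embedding

X = np.vstack([np.zeros((5, 2)), np.ones((5, 2)) * 10])
E = spectral_embedding(X, c=2)               # then run K-Means on rows of E
```

Both the O(n^2) affinity and the O(n^3) eigendecomposition in this sketch make the scaling problem noted above visible.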
II-E Robust and Sparse Fuzzy K-Means Clustering
Considering that the squared ℓ2-norm loss imposed in problems (1) and (2) lacks robustness, with the development of ℓ2,1-norm [36, 30] technologies, a number of robust loss functions have been designed and have shown empirical success in various applications. For example, the recent work [55] provided a robust and sparse Fuzzy K-Means clustering by introducing two robust loss functions (i.e., the ℓ2,1-norm and the capped ℓ1-norm) and a penalized regularization on the membership matrix. Its objective function can be written as:
(4)  min_{U,F} Σ_{i=1}^{n} Σ_{j=1}^{c} u_{ij} d(x_i, f_j) + γ‖U‖_F^2,  s.t.  u_{ij} ≥ 0,  Σ_{j=1}^{c} u_{ij} = 1
where
(5)  d(x_i, f_j) = ‖x_i − f_j‖_2  (ℓ2,1-norm loss)   or   d(x_i, f_j) = min(‖x_i − f_j‖_2, ε)  (capped ℓ1-norm loss)
where U is the membership matrix, γ is the regularization parameter, and ε is a threshold. When γ is zero, the membership vector of each sample becomes extremely sparse (only one element is nonzero and the others are zero); the membership matrix then equals the binary clustering indicator matrix, which corresponds to hard K-Means clustering. With the gradual increase of γ, the membership vector contains a growing number of nonzero elements. When γ becomes large, all elements of the membership vectors are nonzero, which is equivalent to FCM clustering.
III Robust and Efficient FCM Clustering Constrained on Flexible Sparsity
In this section, we introduce the proposed REFCMFS method, which develops the ℓ2,1-norm loss for the data-driven term and imposes the ℓ0-norm constraint on the membership matrix to make the model robust and flexibly sparse. We also design a new way to simplify and solve the ℓ2,1-norm loss with the ℓ0-norm constraint efficiently, without any approximation.
III-A Formulation
Based on the Fuzzy C-Means clustering algorithm, in order to make the model more robust, properly sparse, and efficient during clustering, we propose the following objective function:
(6)  min_{U,F} Σ_{i=1}^{n} Σ_{j=1}^{c} u_{ij}^m ‖x_i − f_j‖_2,  s.t.  u_{ij} ≥ 0,  Σ_{j=1}^{c} u_{ij} = 1,  ‖U‖_0 = s
where U is the membership matrix constrained by the ℓ0-norm, m is the hyperparameter that controls how fuzzy the clusters will be (the higher, the fuzzier), and ‖U‖_0 = s denotes the number of nonzero elements in U, which constrains the sparseness of the membership matrix to be s.
We find that ‖U‖_0 = s constrains the number of nonzero elements of the whole matrix U, not the number of nonzero elements of each row u^i, where u^i is the i-th row of the membership matrix and corresponds to the membership vector of the i-th sample. This easily leads to two extreme cases for u^i, i.e., ‖u^i‖_0 = 1 and ‖u^i‖_0 = 0, where the former makes the soft partition degrade into a hard partition and the latter results in an invalid partition for the i-th sample because all its membership values are equal. Therefore, we further divide problem (6) into n subproblems and impose the ℓ0-norm constraint on the membership vector of each sample. REFCMFS can thus be presented as follows:
(7)  min_{U,F} Σ_{i=1}^{n} Σ_{j=1}^{c} u_{ij}^m ‖x_i − f_j‖_2,  s.t.  u_{ij} ≥ 0,  Σ_{j=1}^{c} u_{ij} = 1,  ‖u^i‖_0 = k, i = 1, …, n
where ‖u^i‖_0 denotes the number of nonzero elements in u^i, u^i ∈ ℝ^{1×c} is the i-th row of U, and 1 ≤ k ≤ c.
It is obvious that REFCMFS achieves robustness by using the ℓ2,1-norm on the dissimilarity between x_i and f_j, and it gives each membership vector sparsity k, which not only avoids incorrect or invalid clustering partitions caused by outliers but also greatly reduces the computational complexity.
III-B Optimization
In this subsection, we provide an efficient iterative method to solve problem (7). More specifically, we alternately update one optimization variable while keeping the others fixed, as follows.
Step 1: Solving U while fixing F
With the centroid matrix F fixed, problem (7) becomes:
(8)  min_{U} Σ_{i=1}^{n} Σ_{j=1}^{c} u_{ij}^m d_{ij},  s.t.  u_{ij} ≥ 0,  Σ_{j=1}^{c} u_{ij} = 1,  ‖u^i‖_0 = k,  where d_{ij} = ‖x_i − f_j‖_2
Since problem (8) is difficult to solve directly, we make some transformations. Because the rows of U are independent in problem (8), it can be decomposed into n subproblems, one per sample:
(9)  min_{u^i} Σ_{j=1}^{c} u_{ij}^m d_{ij},  s.t.  u_{ij} ≥ 0,  Σ_{j=1}^{c} u_{ij} = 1,  ‖u^i‖_0 = k
where d_{ij} = ‖x_i − f_j‖_2 and d^i = [d_{i1}, …, d_{ic}] is a row vector containing the different d_{ij}. To efficiently minimize problem (9), we define a ranking function Φ(·) and apply it to d^i, obtaining:
(10)  d̃^i = Φ(d^i) = d^i P
where Φ(·) sorts the elements of d^i in ascending order and P is the corresponding permutation matrix, which permutes the columns of d^i along the sorted order. Based on equation (10), we select the first k smallest elements of d̃^i together with their corresponding membership values, and set the remaining membership values to zero, i.e., ũ_{ij} = 0 for j > k. Intuitively, we present the above operations in Figure 1.
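In implementation terms, the ranking step for one sample reduces to an argsort of its distance vector; a small sketch (the function name is ours) of selecting the support of the k smallest distances:

```python
import numpy as np

def support_of_k_smallest(d_row, k):
    """Ranking step for one sample: sort distances ascending and keep the
    indices of the k smallest. Memberships outside this support are set to
    zero, which enforces ||u^i||_0 = k exactly."""
    order = np.argsort(d_row)    # permutation that sorts d_row ascending
    return order[:k]

d_i = np.array([4.0, 0.5, 3.0, 1.2])         # distances to c = 4 centroids
idx = support_of_k_smallest(d_i, k=2)        # clusters 1 and 3 survive
```

The argsort plays the role of the permutation matrix P: instead of materializing P, we only keep the first k positions of the sorted order.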
Therefore, problem (9) is equivalent to the following problem, which absorbs the ℓ0-norm constraint into the objective function:
(11)  min_{ũ^i} Σ_{j=1}^{k} ũ_{ij}^m d̃_{ij},  s.t.  ũ_{ij} ≥ 0,  Σ_{j=1}^{k} ũ_{ij} = 1
Using the Lagrangian multiplier method, the Lagrangian function of problem (11) is:
(12)  𝓛(ũ^i, λ) = Σ_{j=1}^{k} ũ_{ij}^m d̃_{ij} − λ(Σ_{j=1}^{k} ũ_{ij} − 1)
where λ is the Lagrangian multiplier. To find the minimum of problem (12), we take the derivatives of 𝓛 with respect to ũ_{ij} and λ, respectively, and set them to zero. We obtain the optimal solution of problem (11):
(13)  ũ_{ij} = d̃_{ij}^{1/(1−m)} / Σ_{l=1}^{k} d̃_{il}^{1/(1−m)}
where j = 1, …, k.
Substituting equation (13) into problem (11), its optimal value becomes:
(14)  Σ_{j=1}^{k} ũ_{ij}^m d̃_{ij} = ( Σ_{j=1}^{k} d̃_{ij}^{1/(1−m)} )^{1−m}
It is obvious that this minimum depends on the selected d̃_{ij}: the smaller they are, the smaller the objective value, which justifies keeping the k smallest distances.
Therefore, the optimal solution of problem (9) is:
(15)  u_{ij} = d_{ij}^{1/(1−m)} / Σ_{l∈Ω_i} d_{il}^{1/(1−m)} if j ∈ Ω_i, and u_{ij} = 0 otherwise, where Ω_i denotes the index set of the k smallest entries of d^i
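Combining the closed form (13) with the zero pattern (15), the whole membership update for one sample can be sketched as follows (our own minimal implementation of the formulas above; the epsilon floor is an implementation detail we add to avoid division by zero):

```python
import numpy as np

def sparse_membership_row(d_row, k, m=2.0):
    """u_ij = d_ij^(1/(1-m)) normalized over the k nearest clusters, zero
    elsewhere: each row ends with exactly k nonzero memberships summing to 1."""
    d_row = np.maximum(np.asarray(d_row, dtype=float), 1e-12)
    support = np.argsort(d_row)[:k]          # indices of k smallest distances
    w = d_row[support] ** (1.0 / (1.0 - m))  # inverse-power weights
    u = np.zeros_like(d_row)
    u[support] = w / w.sum()
    return u

u = sparse_membership_row([4.0, 0.5, 3.0, 1.2], k=2, m=2.0)
```

Note how the nearer of the two surviving clusters receives the larger membership, as the inverse-power weighting requires for m > 1.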
Step 2: Solving F while fixing U
With the membership matrix U fixed, problem (7) becomes:
(16)  min_{F} Σ_{i=1}^{n} Σ_{j=1}^{c} u_{ij}^m ‖x_i − f_j‖_2
which can be solved by introducing a nonnegative auxiliary variable s_{ij} and using the iteratively reweighted method. Thus, we rewrite problem (16) as:
(17)  min_{F} Σ_{i=1}^{n} Σ_{j=1}^{c} u_{ij}^m s_{ij} ‖x_i − f_j‖_2^2
where
(18)  s_{ij} = 1 / (2‖x_i − f_j‖_2)
The optimal solution of problem (17) can be reached by taking the derivative with respect to f_j and setting it to zero. That is:
(19)  f_j = Σ_{i=1}^{n} u_{ij}^m s_{ij} x_i / Σ_{i=1}^{n} u_{ij}^m s_{ij}
Assuming that U and F have been computed at the t-th iteration, we update the nonnegative auxiliary variable s_{ij} according to equation (18) using the current F. The above optimization is summarized in Algorithm 1.
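A sketch of Step 2 under formulas (18)-(19) reads as follows (names are ours; the `eps` guard on divisions is an implementation detail the derivation leaves implicit, and the fuzzy-mean seed stands in for the previous iterate's centroids):

```python
import numpy as np

def update_centroids(X, U, m=2.0, eps=1e-12):
    """One reweighted centroid update: s_ij = 1 / (2 ||x_i - f_j||_2) is
    computed at the current centroids, then each f_j is the weighted
    mean of equation (19). X: n x d, U: n x c; returns F: c x d."""
    Um = U ** m
    # current centroids: plain fuzzy means seed the reweighting step
    F = (Um.T @ X) / np.maximum(Um.sum(axis=0)[:, None], eps)
    for j in range(U.shape[1]):
        dist = np.maximum(np.linalg.norm(X - F[j], axis=1), eps)
        w = Um[:, j] / (2.0 * dist)          # u_ij^m * s_ij
        F[j] = (w @ X) / max(w.sum(), eps)
    return F

X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [12.0, 0.0]])
U = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
F = update_centroids(X, U)                   # one centroid near each pair
```

Because s_{ij} down-weights points far from f_j, the ℓ2,1-style update drags centroids toward the bulk of a cluster rather than toward its outliers.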
IV Theoretical Analysis
In this section, we provide computational complexity analysis and convergence analysis of our proposed REFCMFS method.
IV-A Computational Analysis
Suppose we have n samples in c clusters and each sample has d dimensions. For each iteration, the computational cost of REFCMFS involves two steps. The first step is to compute the membership matrix U, which requires computing all sample-centroid distances and sorting each row, i.e., O(ncd + nc log c) operations. The second step is to calculate the centroid matrix F, which needs O(ncd) operations. Since d typically dominates log c for the public datasets considered here, the computational complexity of REFCMFS for each iteration is O(ncd). In addition, the computational complexities of other typical methods are listed in Table I, where the entry for RSFKM includes the cost of Newton's method used in each of its iterations. It can be seen that the complexity of REFCMFS is linear in n, making it more suitable for handling big datasets than GMM-based and graph-based methods.
TABLE I: Per-iteration computational complexities of different methods.

Methods  Complexity  Methods  Complexity
K-Means    GMM
K-Means++    SC
K-Medoids    RSFKM
FCM    REFCMFS
IV-B Convergence Analysis
To prove the convergence of Algorithm 1, we need Lemma 1, proposed in [36], which is used in the proof of Theorem 1. It is listed as follows:
Lemma 1.
For any nonzero vectors v^t and v^{t+1}, the following inequality holds:
(20)  ‖v^{t+1}‖_2 − ‖v^{t+1}‖_2^2 / (2‖v^t‖_2) ≤ ‖v^t‖_2 − ‖v^t‖_2^2 / (2‖v^t‖_2)
where v^t and v^{t+1} denote the results at the t-th and (t+1)-th iterations, respectively.
Theorem 1. In each iteration, Algorithm 1 monotonically decreases the objective function value of problem (6) until convergence.
Proof.
We decompose problem (6) into two subproblems and utilize an alternating iterative optimization method to solve them.
According to [11], it is known that x^m is convex on ℝ_+ when m ≥ 1 or m ≤ 0, where ℝ_+ denotes the set of positive real numbers. For updating U with the centroid matrix F fixed, the objective function of problem (11) is Σ_{j=1}^{k} ũ_{ij}^m d̃_{ij}, where each d̃_{ij} ≥ 0 can be seen as a constant. Therefore, ũ_{ij}^m is convex when m ≥ 1, and the objective of problem (11) is then convex when m ≥ 1.
For updating F with the membership matrix U fixed, we use Lemma 1 to analyze the lower bound. After t iterations, we have F^t and s_{ij}^t. Supposing that the updated F^{t+1} is the optimal solution of problem (17), then according to the definition of s_{ij} in equation (18), we have:
(21)  Σ_{i=1}^{n} Σ_{j=1}^{c} u_{ij}^m ‖x_i − f_j^{t+1}‖_2^2 / (2‖x_i − f_j^t‖_2) ≤ Σ_{i=1}^{n} Σ_{j=1}^{c} u_{ij}^m ‖x_i − f_j^t‖_2^2 / (2‖x_i − f_j^t‖_2)
According to Lemma 1, we can obtain:
(22)  Σ_{i,j} u_{ij}^m ( ‖x_i − f_j^{t+1}‖_2 − ‖x_i − f_j^{t+1}‖_2^2 / (2‖x_i − f_j^t‖_2) ) ≤ Σ_{i,j} u_{ij}^m ( ‖x_i − f_j^t‖_2 − ‖x_i − f_j^t‖_2^2 / (2‖x_i − f_j^t‖_2) )
Combining inequalities (21) and (22), we can obtain:
(23)  Σ_{i=1}^{n} Σ_{j=1}^{c} u_{ij}^m ‖x_i − f_j^{t+1}‖_2 ≤ Σ_{i=1}^{n} Σ_{j=1}^{c} u_{ij}^m ‖x_i − f_j^t‖_2
which means that the objective of problem (16) monotonically decreases and is bounded below by zero. Thus, in each iteration, Algorithm 1 monotonically decreases the objective function value of problem (6) until the algorithm converges. ∎
V Experiments
In this section, extensive experiments on several public datasets are conducted to evaluate the effectiveness of our proposed REFCMFS method.
V-A Experimental Setting
V-A1 Datasets
Several public datasets are used in our experiments which are described as follows.
ORL. This dataset [44] consists of 40 different subjects with 10 images per subject, and each image is resized to 32×32 pixels. The images were taken against a dark homogeneous background with the subjects in an upright, frontal position.
Yale. This dataset [7] contains 165 grayscale images of 15 individuals. There are 11 images per subject, one per facial expression or configuration, and each image is resized to 32×32 pixels.
COIL20. This dataset [35] consists of 1440 grayscale images of 20 objects (72 images per object). The size of each image is 32×32 pixels, with 256 grey levels per pixel. The objects were placed on a motorized turntable against a black background and their images were taken at pose intervals of 5 degrees.
USPS. This dataset [24] consists of 9298 grayscale handwritten digit images, each of 16×16 grayscale pixels. It was generated by an optical character recognition pipeline used to scan five-digit ZIP codes and convert them to digital digits.
YaleB. This database [19] has 38 individuals and around 64 near-frontal images per individual under different illuminations. We simply use the cropped images and resize them to 32×32 pixels.
COIL100. This dataset [34] consists of 7200 color images of 100 objects. Similar to the COIL20 dataset, the objects were placed on a motorized turntable against a black background and their images were taken at pose intervals of 5 degrees, corresponding to 72 images per object.
V-A2 Compared Methods
We compare REFCMFS with several recent methods, listed as follows. K-Means clustering, Fuzzy C-Means clustering (FCM) [8], Spectral Clustering (SC) [48], and the Gaussian Mixture Model (GMM) [10] are the baselines in our experiments. K-Means++ [4] and K-Medoids [40] are variants of K-Means clustering: K-Means++ uses a fast and simple sampling scheme to seed the initial centers for K-Means, and K-Medoids replaces the mean with the medoid to minimize the sum of dissimilarities between the center of a cluster and the other cluster members. Landmark-based Spectral Clustering (LSC) [12] selects a few representative data points as landmarks and represents the remaining data points as linear combinations of these landmarks; the spectral embedding of the data can then be efficiently computed with the landmark-based representation, which allows clustering large-scale datasets. Robust and Sparse Fuzzy K-Means clustering (RSFKM) [55] improves the membership matrix with proper sparsity balanced by a regularization parameter. Besides, we compare REFCMFS with its simplified version, simREFCMFS, which replaces the ℓ2,1-norm loss of REFCMFS with the least-squares loss.
V-A3 Evaluation Metrics
In our experiments, we adopt clustering accuracy (ACC) and normalized mutual information (NMI) as evaluation metrics. For these two metrics, the higher value indicates better clustering quality. Each metric penalizes or favors different properties in the clustering, and hence we report results on these two measures to perform a comprehensive evaluation.
ACC. Let c_i be the clustering result for sample i and y_i be its ground-truth label. ACC is defined as:
(24)  ACC = Σ_{i=1}^{n} δ(y_i, map(c_i)) / n
Here n is the total number of samples, δ(x, y) is the delta function that equals one if x = y and zero otherwise, and map(·) is the best mapping function, computed with the Kuhn-Munkres algorithm, that permutes the clustering labels to match the ground-truth labels.
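A common way to realize map(·) is to run the Hungarian algorithm on the cluster/label co-occurrence matrix; a sketch using SciPy (our implementation, not the paper's code):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """ACC: find the best one-to-one mapping from predicted cluster ids to
    ground-truth labels (Kuhn-Munkres), then count the matched samples."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    c = int(max(y_true.max(), y_pred.max())) + 1
    cost = np.zeros((c, c), dtype=int)
    for t, p in zip(y_true, y_pred):
        cost[p, t] += 1                      # co-occurrence counts
    row, col = linear_sum_assignment(-cost)  # negate to maximize matches
    return cost[row, col].sum() / len(y_true)

acc = clustering_accuracy([0, 0, 1, 1, 2], [1, 1, 0, 0, 0])  # 4/5 matched
```

Relabeling-invariance is the point: swapping all predicted cluster ids leaves ACC unchanged, because the assignment absorbs the permutation.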
NMI. Suppose C indicates the set of clusters obtained from the ground truth and C′ indicates the set of clusters obtained from our algorithm. Their mutual information metric MI(C, C′) is defined as:
(25)  MI(C, C′) = Σ_{c_i∈C, c′_j∈C′} p(c_i, c′_j) log [ p(c_i, c′_j) / (p(c_i) p(c′_j)) ]
Here, p(c_i) and p(c′_j) are the probabilities that an arbitrary sample belongs to clusters c_i and c′_j, respectively, and p(c_i, c′_j) is the joint probability that an arbitrarily selected sample belongs to both c_i and c′_j. The following normalized mutual information (NMI) is adopted:
(26)  NMI(C, C′) = MI(C, C′) / max(H(C), H(C′))
where H(C) and H(C′) are the entropies of C and C′, respectively. Note that NMI(C, C′) ranges from 0 to 1: NMI(C, C′) = 1 when the two sets of clusters are identical, and NMI(C, C′) = 0 when they are independent.
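Equations (25)-(26) can be computed directly from co-occurrence counts; a minimal sketch (our own, using the max-normalization defined above):

```python
import numpy as np

def nmi(y_true, y_pred):
    """NMI = MI(C, C') / max(H(C), H(C')), with probabilities estimated
    from label co-occurrence counts."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    joint = np.zeros((y_true.max() + 1, y_pred.max() + 1))
    for t, p in zip(y_true, y_pred):
        joint[t, p] += 1
    joint /= len(y_true)                     # joint probabilities p(c_i, c'_j)
    pt, pp = joint.sum(axis=1), joint.sum(axis=0)  # marginals
    nz = joint > 0
    mi = np.sum(joint[nz] * np.log(joint[nz] / np.outer(pt, pp)[nz]))
    ent = lambda p: -np.sum(p[p > 0] * np.log(p[p > 0]))
    return mi / max(ent(pt), ent(pp))

score = nmi([0, 0, 1, 1], [1, 1, 0, 0])  # same partition up to relabeling
```

Like ACC, NMI is invariant to relabeling of cluster ids, so identical partitions score 1.0 regardless of which integer names the clusters carry.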
TABLE II: Clustering performance (ACC % ± std and NMI % ± std) on the ORL and Yale datasets.

Methods  ORL ACC  ORL NMI  Yale ACC  Yale NMI
K-Means  48.60±1.40  71.28±1.37  42.91±6.18  49.66±3.41
K-Means++  50.70±3.80  73.57±2.17  37.45±4.97  45.59±3.33
K-Medoids  42.10±2.90  63.03±1.84  37.58±3.64  43.51±3.46
GMM  53.85±5.40  75.53±2.48  40.61±4.24  48.27±3.80
SC  37.55±1.55  66.75±0.86  24.79±0.48  35.79±0.50
LSC  53.70±5.45  75.08±2.40  42.18±5.21  48.50±3.53
FCM  19.30±2.30  44.50±2.26  24.36±3.52  30.36±3.18
RSFKM  53.94±4.31  75.84±1.41  39.15±5.82  46.05±1.97
simREFCMFS  57.85±4.35  77.41±1.77  46.18±2.54  53.55±1.68
REFCMFS  60.50±2.95  78.41±2.06  47.88±2.43  54.16±1.57
TABLE III: Clustering performance (ACC % ± std and NMI % ± std) on the COIL20 and USPS datasets.

Methods  COIL20 ACC  COIL20 NMI  USPS ACC  USPS NMI
K-Means  52.99±6.53  73.41±1.92  64.30±3.08  61.03±0.62
K-Means++  57.74±5.60  75.79±2.52  64.05±2.35  60.79±0.54
K-Medoids  50.15±6.19  65.34±1.77  51.05±8.79  43.70±7.01
GMM  59.83±3.50  75.51±0.71  68.83±2.49  72.40±1.33
SC  57.83±3.25  75.66±1.18  25.98±0.06  10.18±0.07
LSC  59.76±4.21  73.43±3.29  62.42±4.23  58.39±2.16
FCM  23.85±4.62  41.31±3.88  37.79±2.45  29.71±2.68
RSFKM  65.76±7.99  76.12±2.53  67.38±0.01  61.68±0.01
simREFCMFS  68.56±5.13  76.36±2.39  67.56±6.94  61.37±1.62
REFCMFS  69.51±3.40  77.60±1.66  70.02±8.58  66.79±2.93
TABLE IV: Clustering performance (ACC % ± std and NMI % ± std) on the YaleB and COIL100 datasets.

Methods  YaleB ACC  YaleB NMI  COIL100 ACC  COIL100 NMI
K-Means  9.36±0.00  12.34±0.00  48.21±2.62  75.69±0.67
K-Means++  9.55±0.76  13.04±1.29  46.26±0.68  75.58±0.27
K-Medoids  6.68±0.37  8.25±0.20  31.59±0.68  63.10±0.50
GMM  9.65±0.30  13.57±0.39  43.54±4.63  75.92±1.30
SC  7.71±0.21  10.03±0.39  8.91±0.19  26.72±0.13
LSC  9.56±0.82  12.46±1.32  48.05±1.84  75.71±0.91
FCM  7.49±0.65  9.91±1.28  10.43±1.56  42.16±2.83
RSFKM  9.63±0.60  12.55±0.68  51.76±1.48  76.08±0.35
simREFCMFS  9.88±1.30  12.79±0.73  52.60±1.46  76.45±0.32
REFCMFS  10.04±0.47  13.61±0.61  53.15±1.84  77.82±0.74
TABLE V: Runtime (s) of the compared methods on all datasets.

Methods  ORL  Yale  COIL20  USPS  YaleB  COIL100
K-Means  0.0917  0.0544  0.2501  0.3435  0.3245  1.6119
K-Means++  0.2262  0.0509  0.6452  1.3849  1.5643  12.7211
K-Medoids  0.0362  0.0290  0.1386  2.2490  0.3049  2.3005
GMM  6.7570  0.8921  178.5853  946.8664  298.7445  770.7133
SC  1.6823  0.5310  14.2997  150.3779  39.5985  351.5918
FCM  0.3669  0.1516  1.4901  0.9839  32.7830  18.4595
RSFKM  0.5985  0.2768  2.6506  17.1141  6.9165  25.0760
REFCMFS  0.3500  0.2395  1.6598  2.3491  6.3403  12.0973
V-A4 Parameter Setup
There are two parameters, k and m, in our proposed REFCMFS method. The first, k in problem (7), adjusts the number of nonzero elements in each membership vector u^i; we search for the optimal k over ranges with different steps corresponding to the different datasets. The second, m in problem (7), controls how fuzzy the clusters will be (the higher, the fuzzier) and is tuned by a grid-search strategy. We report the performance under different values of k and m in Figures 2 and 3 to intuitively describe the parameter sensitivity of REFCMFS on the different datasets, and we record the best clustering results under the optimal parameters.
It can be seen that each parameter plays an important role in the performance. Specifically, we set the parameters k and m for each dataset to the values selected by the grid search. Taking the YaleB dataset as an example, the 3D bars of ACC and NMI simultaneously achieve their highest values at the selected k and m.
V-B Experimental Results
In this section, we report the clustering performance comparisons of REFCMFS in Tables II–IV and make the following observations.
Compared to the four baselines K-Means, FCM, SC, and GMM, our proposed REFCMFS method and its simple version simREFCMFS generally achieve better performance on all the datasets. For instance, on the Yale dataset, REFCMFS obtains 4.735%, 23.66%, 20.73%, and 6.58% average improvements over the four baselines, respectively (for simplicity, the average improvement here is defined as the improvement averaged over the two clustering evaluation metrics, ACC and NMI). Similarly, simREFCMFS gains 3.58%, 22.505%, 19.575%, and 5.425% average improvements, respectively. This observation indicates that it is beneficial to combine the advantages of hard and soft partitions, to introduce the ℓ2,1-norm robust loss, and to give the membership matrix proper sparsity. This conclusion can also be confirmed on the other five datasets. Moreover, to intuitively present the flexible sparse membership values of REFCMFS with respect to those of the hard and soft partitions (i.e., K-Means and FCM), we show them in Figure 4, from which it can be seen that flexible sparsity is more beneficial to clustering.
Besides, compared to K-Means++ and K-Medoids (two variants of K-Means), REFCMFS and simREFCMFS obtain better results on all the datasets. Specifically, on the COIL20 dataset, REFCMFS achieves 6.79% and 15.81% average improvements and simREFCMFS achieves 5.695% and 14.715% average improvements. It is obvious that although K-Means++ and K-Medoids improve the initialization of K-Means, they are not good at handling outliers because of the poor robustness of the least-squares criterion. This conclusion can also be verified on the other five datasets. Concretely, compared with K-Means++, REFCMFS achieves 7.32%, 9.5%, 5.985%, 0.53%, and 4.565% average improvements on the ORL, Yale, USPS, YaleB, and COIL100 datasets, respectively, and simREFCMFS obtains 5.495%, 8.345%, 2.045%, 0.07%, and 3.605% average improvements. Compared to K-Medoids, REFCMFS achieves 16.89%, 10.475%, 21.03%, 4.36%, and 18.14% average improvements on the ORL, Yale, USPS, YaleB, and COIL100 datasets, respectively, and simREFCMFS obtains 15.065%, 9.32%, 17.09%, 3.87%, and 13.35% average improvements.
In addition, REFCMFS outperforms the two recent works LSC and RSFKM on all the datasets. Considering that LSC needs to select a few representative data points as landmarks and represents the remaining data points as linear combinations of these landmarks, how the representative information is selected directly affects this method. Compared to LSC, REFCMFS achieves 5.065%, 5.68%, 6.96%, 8%, 0.815%, and 3.605% average improvements on the ORL, Yale, COIL20, USPS, YaleB, and COIL100 datasets, respectively. RSFKM introduces a penalized regularization on the membership matrix and controls its sparsity through the regularization parameter, whereas REFCMFS efficiently adjusts the sparsity of the membership matrix through its ℓ0-norm constraint. Compared to RSFKM, REFCMFS achieves 4.565%, 8.42%, 2.615%, 3.875%, 0.735%, and 1.365% average improvements on the ORL, Yale, COIL20, USPS, YaleB, and COIL100 datasets, respectively.
Furthermore, simREFCMFS, the simple version of REFCMFS, achieves the second-best performance on almost all datasets. Both simREFCMFS and REFCMFS show that introducing the ℓ0-norm constraint with flexible sparsity on the membership matrix yields better performance than the other comparison methods. However, the loss function of simREFCMFS is based on the least-squares criterion rather than the robust ℓ2,1-norm loss, which may be sensitive to outliers. Concretely, compared with simREFCMFS, REFCMFS achieves 1.825%, 1.155%, 1.095%, 3.94%, 0.49%, and 0.96% average improvements on the ORL, Yale, COIL20, USPS, YaleB, and COIL100 datasets, respectively.
Finally, Figure 5 shows the convergence curves, which corroborate the convergence analysis of REFCMFS given above. Moreover, combining the computational complexity analysis in Subsection I, we calculate the time complexities of the different methods on all the datasets and report them in Table V. It is obvious that REFCMFS is faster than the RSFKM, GMM, and SC methods on all the datasets.
VI Conclusion
In this paper, we have proposed a novel clustering algorithm, named REFCMFS, which develops a norm robust loss as the data-driven term and imposes a norm constraint on the membership matrix to make the model more robust and flexibly sparse. This not only avoids the incorrect or invalid clustering partitions caused by outliers but also greatly reduces the computational complexity. Concretely, REFCMFS designs a new way to simplify and solve the norm constraint directly, without any approximate transformation, by absorbing it into the objective function through a ranking function. As a result, REFCMFS can be solved by a tractable and skillful optimization method with guaranteed optimality and convergence. Theoretical analyses and extensive experiments on several public datasets demonstrate the effectiveness and rationality of our proposed method.
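The ranking-function idea above can be illustrated with a minimal sketch. Assuming the flexible-sparsity constraint caps each sample's membership row at k nonzero entries, ranking the clusters by distance and keeping only the k nearest absorbs the constraint directly; the function name, the value of k, and the inverse-distance weighting below are our illustrative assumptions, not the paper's exact update rule.

```python
import numpy as np

def sparse_membership_row(dists, k):
    """Illustrative sketch (not the paper's exact update): enforce a
    row-sparsity cap of k nonzero memberships by ranking clusters by
    distance and keeping only the k nearest."""
    order = np.argsort(dists)              # rank clusters, nearest first
    keep = order[:k]                       # indices of the k nearest clusters
    u = np.zeros_like(dists, dtype=float)
    w = 1.0 / (dists[keep] + 1e-12)        # inverse-distance weights (assumed)
    u[keep] = w / w.sum()                  # normalize so the row sums to 1
    return u

# A sample with distances to four cluster centroids, capped at k = 2:
row = sparse_membership_row(np.array([0.9, 0.1, 0.4, 2.0]), k=2)
print(np.count_nonzero(row))  # prints 2: only the two nearest clusters get mass
```

Because the ranking (sorting) step selects the support set in closed form, no approximate relaxation of the constraint is needed, which matches the tractability argument made in the conclusion.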
References
 [1] (2015) Conditional spatial fuzzy c-means clustering algorithm for segmentation of MRI images. ASC 34, pp. 758–769.
 [2] (2016) Modified possibilistic fuzzy c-means algorithms for segmentation of magnetic resonance image. ASC 41, pp. 104–119.
 [3] (2013) Similarity-based clustering by left-stochastic matrix factorization. JMLR 14 (1), pp. 1715–1746.
 [4] (2007) K-means++: the advantages of careful seeding. In ACM-SIAM.
 [5] (2017) Generalized entropy based possibilistic fuzzy c-means for clustering noisy data and its convergence proof. Neurocomputing 219, pp. 186–202.
 [6] (2015) K-means clustering is matrix factorization. arXiv preprint arXiv:1512.07548.
 [7] (1997) Eigenfaces vs. Fisherfaces: recognition using class specific linear projection. Technical report, Yale University, New Haven, United States.
 [8] (1980) A convergence theorem for the fuzzy ISODATA clustering algorithms. TPAMI (1), pp. 1–8.
 [9] (2016) Clustering and summarizing protein-protein interaction networks: a survey. TKDE 28 (3), pp. 638–658.
 [10] (2006) Pattern recognition and machine learning. Springer.
 [11] (2004) Convex optimization. Cambridge University Press.
 [12] (2011) Large scale spectral clustering with landmark-based representation. In AAAI.
 [13] (1997) Spectral graph theory. American Mathematical Soc.
 [14] (2015) Time series clustering by a robust autoregressive metric with application to air pollution. CILS 141, pp. 107–124.
 [15] (2016) Kernel-based fuzzy c-means clustering algorithm based on genetic algorithm. Neurocomputing 188, pp. 233–238.
 [16] (2010) Fast global k-means clustering using cluster membership and inequality. Pattern Recognition 43 (5), pp. 1954–1963.
 [17] (2011) Fast modified global k-means algorithm for incremental cluster construction. Pattern Recognition 44 (4), pp. 866–876.
 [18] (2017) K-means clustering with outlier removal. PRL 90, pp. 8–14.
 [19] (2001) From few to many: illumination cone models for face recognition under variable lighting and pose. TPAMI 23 (6), pp. 643–660.
 [20] Robust k-means: a theoretical revisit. In Advances in Neural Information Processing Systems.
 [21] (2017) A systematic approach to clustering whole trajectories of mobile objects in road networks. TKDE 29 (5), pp. 936–949.
 [22] (2005) Improving k-means by outlier removal. In SCIA.
 [23] (2003) Discovering cluster-based local outliers. PRL 24 (9–10), pp. 1641–1650.
 [24] (1994) A database for handwritten text recognition research. TPAMI 16 (5), pp. 550–554.
 [25] (2018) I-k-means+: an iterative clustering algorithm based on an enhanced version of the k-means. Pattern Recognition 79, pp. 402–413.
 [26] (2013) Clustering spatiotemporal data: an augmented fuzzy c-means. TFS 21 (5), pp. 855–868.
 [27] (2016) Initialization of k-modes clustering using outlier detection techniques. IS 332, pp. 167–183.
 [28] (2001) Two-phase clustering process for outliers detection. PRL 22 (6–7), pp. 691–700.
 [29] (2008) Clustering-based outlier detection method. In FSKD, Vol. 2, pp. 429–433.
 [30] (2015) Robust dictionary learning with capped norm. In IJCAI.
 [31] (2004) Robust clustering algorithm for suppression of outliers [data classification applications]. In International Symposium on Intelligent Multimedia, Video and Speech Processing, pp. 691–694.
 [32] (2018) Significantly fast and robust fuzzy c-means clustering algorithm based on morphological reconstruction and membership filtering. TFS.
 [33] (2008) Modified global k-means algorithm for minimum sum-of-squares clustering problems. Pattern Recognition 41 (10), pp. 3192–3199.
 [34] (1996) Columbia object image library (COIL-100). Department of Comp. Science, Columbia University, Tech. Rep. CUCS00696.
 [35] (1996) Columbia object image library (COIL-20).
 [36] (2010) Efficient and robust feature selection via joint norms minimization. In NIPS.
 [37] (2013) A survey on clustering techniques in medical diagnosis. IJCST 1 (2), pp. 17–23.
 [38] (2014) On integrated clustering and outlier detection. In NIPS.
 [39] (2011) An outlier detection method based on clustering. In EAIT.
 [40] (2009) A simple and fast algorithm for k-medoids clustering. Expert Systems with Applications 36 (2), pp. 3336–3341.
 [41] (2016) Multivariate fuzzy c-means algorithms with weighting. Neurocomputing 174, pp. 946–965.
 [42] (2015) A sparse fuzzy c-means algorithm based on sparse clustering framework. Neurocomputing 157, pp. 290–295.
 [43] (2007) A novel approach to noise clustering for outlier detection. SC 11 (5), pp. 489–494.
 [44] (1994) Parameterisation of a stochastic model for human face identification. In ACV, pp. 138–142.
 [45] (2014) Generalization rules for the suppressed fuzzy c-means clustering algorithm. Neurocomputing 139, pp. 298–309.
 [46] (2003) The global k-means clustering algorithm. Pattern Recognition 36 (2), pp. 451–461.
 [47] (2014) The MinMax k-means clustering algorithm. Pattern Recognition 47 (7), pp. 2505–2516.
 [48] (2007) A tutorial on spectral clustering. SC 17 (4), pp. 395–416.
 [49] (2015) Optimized Cartesian k-means. TKDE 27 (1), pp. 180–192.
 [50] (2015) Group k-means. arXiv preprint arXiv:1501.00825.
 [51] (2015) Non-exhaustive, overlapping k-means. In ICDM.
 [52] (2015) Comparing the performance of biomedical clustering methods. NM 12 (11), pp. 1033.
 [53] (2018) Density-based place clustering using geosocial network data. TKDE 30 (5), pp. 838–851.
 [54] (2015) K-means-based consensus clustering: a unified view. TKDE 27 (1), pp. 155–169.
 [55] (2016) Robust and sparse fuzzy k-means clustering. In IJCAI.
 [56] (2017) Robust-learning fuzzy c-means clustering algorithm with unknown number of clusters. Pattern Recognition 71, pp. 45–59.
 [57] (2009) A new local distance-based outlier detection approach for scattered real-world data. In PAKDD.
 [58] (2009) A novel k-means algorithm for clustering and outlier detection. In FITME.