Toward Multi-Diversified Ensemble Clustering of High-Dimensional Data
Abstract
The emergence of high-dimensional data in various areas has brought new challenges to ensemble clustering research. To deal with the curse of dimensionality, considerable efforts in ensemble clustering have been made by incorporating various subspace-based techniques. Besides the emphasis on subspaces, rather limited attention has been paid to the potential diversity in similarity/dissimilarity metrics. It remains a surprisingly open problem in ensemble clustering how to create and aggregate a large number of diversified metrics, and furthermore, how to jointly exploit the multi-level diversity in the large numbers of metrics, subspaces, and clusters in a unified framework. To tackle this problem, this paper proposes a novel multi-diversified ensemble clustering approach. In particular, we create a large number of diversified metrics by randomizing a scaled exponential similarity kernel, which are then coupled with random subspaces to form a large set of metric-subspace pairs. Based on the similarity matrices derived from these metric-subspace pairs, an ensemble of diversified base clusterings can thereby be constructed. Further, an entropy-based criterion is adopted to explore the cluster-wise diversity in ensembles, based on which the consensus function is presented. Experimental results on twenty high-dimensional datasets have confirmed the superiority of our approach over the state-of-the-art.
1 Introduction
The last decade has witnessed significant progress in the development of ensemble clustering techniques [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14], which are typically featured by the ability to combine multiple base clusterings into a probably better and more robust consensus clustering, and have recently shown promising advantages in discovering clusters of arbitrary shapes, dealing with noisy data, coping with data from multiple sources, and producing robust clustering results [6].
In recent years, with high-dimensional data widely appearing in various areas, new challenges have been brought to conventional ensemble clustering algorithms, which often lack the ability to address high-dimensional issues well. Owing to the curse of dimensionality, it is highly desired yet very difficult to find the inherent cluster structure hidden in the huge number of dimensions, especially when high dimensionality is frequently coupled with a very small sample size. Recently, some efforts have been devoted to ensemble clustering of high-dimensional data, which typically exploit different subspace-based (or feature-based) techniques, such as random subspace sampling [15, 9, 16, 17], stratified subspace sampling [7], and subspace projection [18], to explore the diversity in high dimensionality. Inherently, these subspace-based techniques select or linearly combine data features into different subsets (i.e., subspaces) through a variety of strategies to seek more perspectives for finding cluster structures.
Besides the issue of subspaces (or features), the choice of similarity/dissimilarity metrics is another crucial factor in dealing with high-dimensional data [19, 20]. The existing ensemble clustering methods typically adopt one or a few pre-selected metrics, which are often chosen implicitly based on the expert's knowledge or some prior assumptions. However, few, if any, of them have considered the potentially huge benefits and issues hidden in randomized metric spaces. On the one hand, it is very difficult to select or learn an optimal metric for a given dataset without human supervision or implicit assumptions. On the other hand, with different metrics capable of reflecting different perspectives on the data, the joint use of a large number of randomized/diversified metrics may reveal huge opportunities hidden in high dimensionality. However, it surprisingly remains an open problem in ensemble clustering how to produce and aggregate a large number of diversified metrics to enhance the consensus performance. Furthermore, starting from the metric diversification problem, another crucial challenge arises as to how to jointly exploit multiple levels of diversity in the large numbers of metrics, subspaces, and clusters in a unified ensemble clustering framework.
To tackle the above-mentioned problem, we propose a novel ensemble clustering approach termed multi-diversified ensemble clustering (MDEC), which jointly reconciles large populations of diversified metrics, random subspaces, and weighted clusters. Specifically, we exploit a scaled exponential similarity kernel as the seed kernel, which has advantages in parameter flexibility and neighborhood adaptivity and is randomized to breed a large set of diversified metrics. The set of diversified metrics is coupled with random subspaces to form a large number of metric-subspace pairs, which then contribute to the jointly randomized ensemble generation process, where a set of diversified base clusterings is produced with the help of the spectral clustering algorithm. With the clustering ensemble generated, to exploit the cluster-wise diversity in the multiple base clusterings, an entropy-based cluster validity strategy is adopted to evaluate and weight each base cluster by considering the distribution of clusters in the entire ensemble, based on which a new multi-diversified consensus function is proposed (see Section 3 and Fig. 1 for more details). In this paper, we conduct experiments on 20 high-dimensional datasets, including 15 cancer gene expression datasets and 5 image datasets. Extensive experimental results have shown the superiority of our approach over the state-of-the-art ensemble clustering approaches for clustering high-dimensional data.
For clarity, the main contributions of this work are summarized as follows:

This paper, for the first time to the best of our knowledge, shows that the joint use of a large population of randomly diversified metrics can significantly benefit the ensemble clustering of high-dimensional data in an unsupervised manner.

A new metric diversification strategy is proposed by randomizing the scaled exponential similarity kernel with both parameter flexibility and neighborhood adaptivity considered, which is further coupled with random subspace sampling for the jointly randomized generation of base clusterings.

A new ensemble clustering approach termed MDEC is presented, which has the ability to simultaneously exploit a large population of diversified metrics, random subspaces, and weighted clusters in a unified framework.

Extensive experiments have been conducted on a variety of high-dimensional datasets, which demonstrate the significant advantages of our approach over the state-of-the-art ensemble clustering approaches.
2 Related Work
Due to its ability to combine multiple base clusterings into a probably better and more robust consensus clustering, the ensemble clustering technique has been receiving increasing attention in recent years. Many ensemble clustering algorithms have been developed from different technical perspectives [2, 6, 10, 8, 11, 12, 13, 14, 21, 1, 5, 3, 4, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33], which can be classified into three main categories, namely, the pairwise co-occurrence based methods, the graph partitioning based methods, and the median partition based methods.
The pairwise co-occurrence based methods [21, 28, 29] typically construct a co-association matrix by considering the frequency with which two data samples occur in the same cluster among the multiple base clusterings. The co-association matrix is then used as the similarity matrix for the data samples, upon which some clustering algorithm can thereby be performed to obtain the final clustering result. Fred and Jain [21] first introduced the concept of the co-association matrix and proposed the evidence accumulation clustering (EAC) method, which applies a hierarchical agglomerative clustering algorithm [34] on the co-association matrix to build the consensus clustering. To extend the EAC method, Wang et al. [28] took the cluster sizes into consideration and proposed the probability accumulation method. Yi et al. [29] dealt with the uncertain entries in the co-association matrix by first labeling them as unobserved, and then recovering the unobserved entries via the matrix completion technique. Liu et al. [13] proved that spectral clustering of the co-association matrix is equivalent to a weighted version of k-means, and proposed the spectral ensemble clustering (SEC) method to effectively and efficiently obtain the consensus result.
The graph partitioning based methods [8, 30, 31] generally construct a graph model for the ensemble of multiple base clusterings, and then partition the graph into several disjoint subsets to obtain the final clustering result. Strehl and Ghosh [30] solved the ensemble clustering problem by using three graph partitioning based algorithms, namely, the cluster-based similarity partitioning algorithm (CSPA), the hypergraph partitioning algorithm (HGPA), and the meta-clustering algorithm (MCLA). Fern and Brodley [31] formulated a bipartite graph model by treating both clusters and data samples as nodes, and partitioned the graph by the METIS algorithm [35] to obtain the consensus result. Huang et al. [8] dealt with the ensemble clustering problem by sparse graph representation and random walk trajectory analysis, and presented the probability trajectory based graph partitioning (PTGP) method.
The median partition based methods [10, 32, 33] typically formulate the ensemble clustering problem as an optimization problem which aims to find the median partition such that the similarity between the base partitions (i.e., base clusterings) and the median partition is maximized. The median partition problem is NP-hard [32]. To find an approximate solution, Topchy et al. [32] cast the median partition problem as a maximum likelihood problem and solved it by the EM algorithm. Franek and Jiang [33] reduced the ensemble clustering problem to a Euclidean median problem and solved it by the Weiszfeld algorithm [36]. Huang et al. [10] formulated the ensemble clustering problem as a binary linear programming problem and obtained an approximate solution based on the factor graph model and max-product belief propagation [37].
Although significant advances have been made in ensemble clustering research in recent years [2, 6, 10, 8, 11, 12, 13, 14, 21, 1, 3, 4, 5, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33], the existing methods are mostly devised for general-purpose scenarios and lack the desirable ability to appropriately address the clustering of high-dimensional data. More recently, some efforts have been made to deal with the curse of dimensionality, where subspace-based (or feature-based) techniques are often exploited. Jing et al. [7] adopted stratified feature sampling to generate a set of subspaces, which are further incorporated into several ensemble clustering algorithms to build the consensus clustering for high-dimensional data. Yu et al. [9] proposed a novel subspace-based ensemble clustering framework termed APCE, which integrates random subspaces, affinity propagation, normalized cut, and five candidate distance metrics. Further, Yu et al. [16] proposed a semi-supervised subspace-based ensemble clustering framework by incorporating random subspaces, constraint propagation, incremental ensemble selection, and normalized cut. Fern and Brodley [18] exploited random subspace projection to build a set of subspaces, which in fact are obtained by (randomly) linearly combining features (or feature sets). These methods [7, 9, 16, 18] typically exploit the diversity in high dimensionality via various subspace-based techniques, but few of them have fully considered the potentially huge diversity in metric spaces. The existing methods [7, 9, 16, 18] generally use one or a few pre-selected similarity/dissimilarity metrics, which are selected implicitly based on the expert's knowledge or some prior assumptions.
Although the method in [9] randomly selects a metric out of its five candidate metrics each time, it still fails to go beyond a few metrics to explore the huge potential hidden in a large number of diversified metrics, which may play a crucial role in clustering high-dimensional data. The key challenge here lies in how to create such a large number of highly diversified metrics, and further how to jointly exploit the diversity in the large number of metrics, together with the subspace-wise and cluster-wise diversity, to achieve a unified ensemble clustering framework for high-dimensional data.
3 Proposed Framework
This section describes the overall algorithm of the proposed ensemble clustering approach. A brief overview is provided in Section 3.1. The metric diversification process is presented in Section 3.2. The jointly randomized ensemble generation is introduced in Section 3.3. Finally, the consensus function is given in Section 3.4.
3.1 Brief Overview
In this paper, we propose a novel multi-diversified ensemble clustering (MDEC) approach (see Fig. 1). First, we create a large number of diversified metrics by randomizing a scaled exponential similarity kernel, and combine the diversified metrics with random subspaces to form a large set of random metric-subspace pairs. Second, with each random metric-subspace pair, we construct a similarity matrix for the data samples. The spectral clustering algorithm is then performed on these similarity matrices derived from the metric-subspace pairs to obtain an ensemble of base clusterings. Third, to exploit the cluster-wise diversity in the ensemble of multiple base clusterings, we adopt an entropy-based criterion to evaluate and weight the clusters by considering the distribution of cluster labels in the entire ensemble. With the weighted clusters, the locally weighted co-association matrix is further constructed to serve as a summary of the ensemble. Finally, the spectral clustering algorithm is performed on the locally weighted co-association matrix to obtain the consensus clustering result. It is noteworthy that our approach simultaneously incorporates three levels of diversity, i.e., metric-wise diversity, subspace-wise diversity, and cluster-wise diversity, in a unified framework, which has shown significant advantages in dealing with high-dimensional data when compared to the state-of-the-art ensemble clustering approaches. In the following sections, we will introduce each step of the proposed approach in detail.
3.2 Diversification of Metrics
The choice of similarity/dissimilarity metrics plays a crucial role in machine learning and pattern recognition [38, 39, 40, 41, 42]. In particular, unlike supervised or semi-supervised learning, where metric learning techniques can be performed to learn the metrics with the help of human supervision or prior assumptions [38, 39, 40, 41, 42], in unsupervised learning it is generally very difficult to choose a proper metric for a given task without prior knowledge.
Instead of relying on one or a few manually selected (or learned) metrics, this paper proposes to jointly use a large number of randomly diversified metrics in a unified ensemble clustering framework. Toward this end, we first need to tackle two subproblems, i.e., how to create a large number of diversified metrics, and how to collectively exploit them in ensemble clustering.
To create the diversified metrics, we take advantage of the kernel trick with randomization incorporated. Kernel similarity metrics have proven to be a powerful tool for clustering complex data [38, 42], which, however, suffer from the difficulty of selecting proper kernel parameters. The kernel parameters can be learned by some metric learning techniques [38, 42] with supervision or semi-supervision, but without human supervision it is often extremely difficult to decide proper kernel parameters. This is a critical disadvantage of kernel methods in conventional (unsupervised) applications, which nevertheless becomes an important advantage in our situation, where what is highly desired is not the selection of a single good kernel similarity metric, but the flexibility to create a large number of diversified ones.
Specifically, in this paper, we adopt the scaled exponential similarity (SES) kernel [43] as the seed kernel, which will then be randomized to breed a large population of diversified metrics. Given a set of data samples $\mathcal{X} = \{x_1, x_2, \dots, x_N\}$, where $x_i \in \mathbb{R}^d$ is the $i$-th sample and $d$ is the number of features, the SES kernel function for samples $x_i$ and $x_j$ is defined as:

$$\kappa(x_i, x_j) = \exp\left(-\frac{\rho^2(x_i, x_j)}{\mu\,\varepsilon_{ij}}\right) \qquad (1)$$

where $\mu$ is a hyperparameter, $\varepsilon_{ij}$ is a scaling term, and $\rho(x_i, x_j)$ is the Euclidean distance between $x_i$ and $x_j$. Let $N_i$ denote the set of the $K$ nearest neighbors of $x_i$. The average distance between $x_i$ and its $K$ nearest neighbors can be computed as

$$\bar{\rho}(x_i) = \frac{1}{K} \sum_{x_k \in N_i} \rho(x_i, x_k) \qquad (2)$$

Then, as suggested in [43], to simultaneously take into consideration the neighborhood of $x_i$, the neighborhood of $x_j$, and their distance, the scaling term $\varepsilon_{ij}$ is defined as the average of $\bar{\rho}(x_i)$, $\bar{\rho}(x_j)$, and $\rho(x_i, x_j)$. That is

$$\varepsilon_{ij} = \frac{\bar{\rho}(x_i) + \bar{\rho}(x_j) + \rho(x_i, x_j)}{3} \qquad (3)$$
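As an illustrative sketch (not the authors' implementation), the SES similarity of Eqs. (1)-(3) can be written in a few lines of plain Python; the helper names here are our own:

```python
import math

def euclidean(x, y):
    # rho(x_i, x_j): Euclidean distance between two feature vectors
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def avg_knn_distance(i, data, K):
    # Eq. (2): mean distance from sample i to its K nearest neighbors
    dists = sorted(euclidean(data[i], data[j])
                   for j in range(len(data)) if j != i)
    return sum(dists[:K]) / K

def ses_similarity(i, j, data, mu, K):
    # Eq. (3): the scaling term averages both neighborhood radii
    # and the pairwise distance itself
    d = euclidean(data[i], data[j])
    eps = (avg_knn_distance(i, data, K) + avg_knn_distance(j, data, K) + d) / 3.0
    # Eq. (1): scaled exponential similarity kernel
    return math.exp(-d ** 2 / (mu * eps))
```

By construction the similarity is symmetric and lies in $(0, 1]$, reaching 1 when the two samples coincide.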
The SES kernel is a variant of the Gaussian kernel. It has two free parameters, i.e., the hyperparameter $\mu$ and the number of nearest neighbors $K$. The motivation to adopt the SES kernel as the seed kernel in our approach is twofold. First, with the influence of the scaling term $\varepsilon_{ij}$, where the nearest neighbors' information is incorporated, the SES kernel is adaptive to the neighborhood structure among data samples. Moreover, with each $K$ value corresponding to a specific neighborhood size, randomizing the parameter $K$ allows multi-scale neighborhood information to be explored to enhance the diversity. Second, the two free parameters $\mu$ and $K$ in the SES kernel provide high flexibility for adjusting the influence of the kernel and can contribute to the high diversity of the generated metrics when the two parameters are randomly perturbed.
Specifically, we propose to randomly select the two parameters $\mu$ and $K$, respectively, as follows:

$$\mu = u_1, \quad u_1 \sim U(\mu_{\min}, \mu_{\max}) \qquad (4)$$

$$K = \lfloor u_2 \rfloor, \quad u_2 \sim U(K_{\min}, K_{\max}) \qquad (5)$$

where $u_1$ and $u_2$ are two uniform random variables, and $\lfloor \cdot \rfloor$ outputs the floor of a real number.

Note that our objective is not to find a good pair of parameters $\mu$ and $K$, but to randomize them to yield a large population of diversified metrics. The parameters $\mu$ and $K$ are suggested to be randomly selected in a wide range to enhance the diversity. By performing the random selection $m$ times, a set of $m$ pairs of $\mu$ and $K$ are obtained, which correspond to $m$ randomized kernel similarity metrics for the dataset $\mathcal{X}$, denoted as

$$\mathcal{K} = \{\kappa^{(1)}, \kappa^{(2)}, \dots, \kappa^{(m)}\} \qquad (6)$$

where $\kappa^{(t)}$ is the SES kernel built with $\mu_t$ and $K_t$, the $t$-th pair of randomized parameters.
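The randomized parameter selection of Eqs. (4)-(5) can be sketched as below; the range bounds used as defaults are illustrative assumptions, not values prescribed by the paper:

```python
import random

def random_metric_parameters(m, mu_range=(0.3, 0.8), k_range=(5, 30), seed=None):
    # Draw m pairs (mu_t, K_t): mu is a uniform real random variable,
    # and K is the floor of a uniform real, as in Eqs. (4)-(5).
    rng = random.Random(seed)
    pairs = []
    for _ in range(m):
        mu = rng.uniform(*mu_range)
        K = int(rng.uniform(*k_range))  # floor of a uniform random variable
        pairs.append((mu, K))
    return pairs
```

Each pair defines one randomized SES metric, so drawing `m` pairs yields the set of Eq. (6).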
3.3 Ensemble Generation by Joint Randomization
In this section, with the set of diversified metrics generated, we proceed to couple them with random subspaces for jointly randomized ensemble generation.
Let $\mathcal{F} = \{f_1, f_2, \dots, f_d\}$ be the set of features in the dataset $\mathcal{X}$, where $f_i$ denotes the $i$-th feature. A random subspace is a set of a certain number of features that are randomly sampled from the original feature set. The cluster structure of high-dimensional data may be hidden in different feature subspaces as well as in different metric spaces. In this paper, we propose to jointly exploit large populations of diversified metrics and random subspaces. Specifically, we perform random subspace sampling $m$ times to obtain $m$ random subspaces, denoted as $\mathcal{F}^{(1)}, \mathcal{F}^{(2)}, \dots, \mathcal{F}^{(m)}$, which lead to $m$ component datasets, denoted as $\mathcal{X}^{(1)}, \mathcal{X}^{(2)}, \dots, \mathcal{X}^{(m)}$. Note that each component dataset $\mathcal{X}^{(t)}$ has the same number of data samples as the original dataset $\mathcal{X}$, but its feature set only consists of attributes that are randomly sampled from $\mathcal{F}$ with a sampling ratio $\tau \in (0, 1]$. Obviously, if $\tau = 1$, then every subspace is in fact the original feature space, i.e., no subsampling actually happens. Here, with the $m$ random subspaces generated, we can couple each of them with a randomly diversified metric (as described in Section 3.2), and thus obtain $m$ random metric-subspace pairs, denoted as
$$\mathcal{P} = \{(\kappa^{(1)}, \mathcal{F}^{(1)}), (\kappa^{(2)}, \mathcal{F}^{(2)}), \dots, (\kappa^{(m)}, \mathcal{F}^{(m)})\} \qquad (7)$$
In terms of the $t$-th metric-subspace pair $(\kappa^{(t)}, \mathcal{F}^{(t)})$, the similarity between samples $x_i$ and $x_j$ is computed by first mapping $x_i$ and $x_j$ onto the subspace associated with the component dataset $\mathcal{X}^{(t)}$ and then computing their SES kernel similarity with the randomly selected parameters $\mu_t$ and $K_t$. Thus, we can obtain $m$ similarity matrices in terms of the $m$ metric-subspace pairs as follows:

$$\{S^{(1)}, S^{(2)}, \dots, S^{(m)}\} \qquad (8)$$

where the $t$-th similarity matrix (i.e., $S^{(t)}$) is constructed in terms of the $t$-th metric-subspace pair $(\kappa^{(t)}, \mathcal{F}^{(t)})$, denoted as

$$S^{(t)} = \{s^{(t)}_{ij}\}_{N \times N} \qquad (9)$$

where

$$s^{(t)}_{ij} = \kappa^{(t)}(x^{(t)}_i, x^{(t)}_j) \qquad (10)$$

is the $(i, j)$-th entry in $S^{(t)}$, with $x^{(t)}_i$ denoting the sample $x_i$ mapped onto the subspace $\mathcal{F}^{(t)}$. Obviously, according to the definition of the SES kernel, it holds that $0 < s^{(t)}_{ij} \le 1$ for any $i$ and $j$. If samples $x_i$ and $x_j$ have the same feature values in the subspace associated with $\mathcal{X}^{(t)}$, then their similarity reaches its maximum 1.
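A minimal sketch of how one metric-subspace pair yields a similarity matrix (Eqs. (8)-(10)); the function names and the inlined SES kernel are our own illustrative choices, not the authors' code:

```python
import math, random

def ses_matrix(data, mu, K):
    # Pairwise SES kernel similarities (Eqs. (1)-(3)) over a sample list.
    n = len(data)
    dist = [[math.dist(a, b) for b in data] for a in data]
    # Mean distance to the K nearest neighbors (self excluded), Eq. (2)
    avg_knn = [sum(sorted(row[:i] + row[i + 1:])[:K]) / K
               for i, row in enumerate(dist)]
    return [[math.exp(-dist[i][j] ** 2 /
                      (mu * ((avg_knn[i] + avg_knn[j] + dist[i][j]) / 3)))
             for j in range(n)] for i in range(n)]

def sample_subspace(d, ratio, rng):
    # Random subspace: keep a random subset of distinct feature indices
    return rng.sample(range(d), max(1, int(ratio * d)))

def metric_subspace_matrix(data, features, mu, K):
    # Eq. (10): project each sample onto the sampled subspace, then apply
    # the randomized SES kernel to the projected samples.
    projected = [[x[f] for f in features] for x in data]
    return ses_matrix(projected, mu, K)
```

Repeating this for each of the $m$ metric-subspace pairs would produce the $m$ similarity matrices of Eq. (8).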
Having constructed the $m$ similarity matrices with diversified metric-subspace pairs, we then exploit the spectral clustering algorithm [44] to construct the ensemble of base clusterings. Spectral clustering is a widely used graph partitioning algorithm, which is capable of capturing the global structure of a graph [44].
Specifically, for the $t$-th similarity matrix $S^{(t)}$, we treat each data sample as a graph node and build a similarity graph as follows:

$$G^{(t)} = (V, E^{(t)}) \qquad (11)$$

where $V$ is the node set, and $E^{(t)}$ is the edge set. The edge weights are decided by the similarity matrix $S^{(t)}$, i.e., for any $x_i$ and $x_j$, the weight of the edge between them is $s^{(t)}_{ij}$. Let $k$ denote the number of clusters in the base clustering. The objective of spectral clustering is to partition the graph into $k$ disjoint subsets. To this end, we construct the normalized graph Laplacian as follows:

$$L^{(t)} = (D^{(t)})^{-\frac{1}{2}} (D^{(t)} - S^{(t)}) (D^{(t)})^{-\frac{1}{2}} \qquad (12)$$

where the degree matrix $D^{(t)}$ is a diagonal matrix with its $i$-th diagonal entry defined as the sum of the $i$-th row of $S^{(t)}$. The eigenvectors corresponding to the first $k$ eigenvalues of $L^{(t)}$ are computed and then stacked to form a new matrix $U \in \mathbb{R}^{N \times k}$, where the $i$-th column of $U$ is the eigenvector corresponding to the $i$-th eigenvalue of $L^{(t)}$. Thereafter, the matrix $\tilde{U}$ can be obtained from $U$ by normalizing the rows to norm 1.

By treating each row of $\tilde{U}$ as a data point in $\mathbb{R}^k$, we can cluster the rows into $k$ clusters by $k$-means and thereby obtain the $t$-th base clustering based on the similarity matrix $S^{(t)}$. Formally, the $t$-th base clustering is denoted as

$$\pi^{(t)} = \{C^{(t)}_1, C^{(t)}_2, \dots, C^{(t)}_k\} \qquad (13)$$

where $C^{(t)}_i$ is the $i$-th cluster in $\pi^{(t)}$. It is obvious that the clusters in a base clustering cover the entire dataset, i.e., $\bigcup_{i=1}^{k} C^{(t)}_i = \mathcal{X}$, and two clusters in the same base clustering will not overlap with each other, i.e., $C^{(t)}_i \cap C^{(t)}_j = \emptyset$ for $i \neq j$.
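The spectral clustering step (Eqs. (11)-(13)) can be sketched with NumPy as below; the deterministic farthest-first seeding of the k-means stage is our own stand-in for reproducibility, not part of the described method:

```python
import numpy as np

def spectral_clustering(S, k, n_iter=50):
    # Treat S as the weighted adjacency matrix of the similarity graph (Eq. (11))
    S = np.asarray(S, dtype=float)
    d = S.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Normalized graph Laplacian, Eq. (12): D^{-1/2}(D - S)D^{-1/2} = I - D^{-1/2} S D^{-1/2}
    L = np.eye(len(S)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    # Stack the eigenvectors of the k smallest eigenvalues, then row-normalize
    _, vecs = np.linalg.eigh(L)  # eigh returns eigenvalues in ascending order
    U = vecs[:, :k]
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    # Tiny Lloyd-style k-means on the embedded rows (farthest-first seeding)
    centers = [U[0]]
    for _ in range(1, k):
        gaps = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[int(np.argmax(gaps))])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((U[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = U[labels == c].mean(axis=0)
    return labels
```

Applying this routine to each of the $m$ similarity matrices yields the $m$ base clusterings of the ensemble.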
Finally, based on the $m$ diversified similarity matrices, we can construct an ensemble of $m$ base clusterings, denoted as

$$\Pi = \{\pi^{(1)}, \pi^{(2)}, \dots, \pi^{(m)}\} \qquad (14)$$

where $\pi^{(t)}$ is the $t$-th base clustering in the ensemble $\Pi$.
3.4 Consensus Function
With the ensemble $\Pi$ generated, the objective of the consensus function is to combine the set of $m$ base clusterings into a probably better and more robust final clustering.
As each base clustering consists of a certain number of clusters, the entire ensemble can also be viewed as a large set of clusters from different base clusterings. To exploit the different reliability of different clusters and incorporate the cluster-wise diversity in the consensus function, here we adopt a local weighting strategy [11] to evaluate and weight the base clusters by jointly considering the distribution of cluster labels in the entire ensemble using an entropic criterion. Formally, we denote the ensemble of clusters as

$$\mathcal{C} = \{C_1, C_2, \dots, C_{n_c}\} \qquad (15)$$

where $C_i$ is the $i$-th cluster and $n_c$ is the total number of clusters in the ensemble $\Pi$. Note that $n_c = \sum_{t=1}^{m} |\pi^{(t)}|$.
Each cluster is a set of data samples. To estimate the uncertainty of different clusters, the concept of entropy is utilized here [11]. Given a cluster $C_i \in \mathcal{C}$ and a base clustering $\pi^{(t)} \in \Pi$, the uncertainty (or entropy) of $C_i$ w.r.t. $\pi^{(t)}$ can be computed as

$$H^{(t)}(C_i) = -\sum_{j=1}^{|\pi^{(t)}|} p(C_i, C^{(t)}_j) \log_2 p(C_i, C^{(t)}_j) \qquad (16)$$

where

$$p(C_i, C^{(t)}_j) = \frac{|C_i \cap C^{(t)}_j|}{|C_i|} \qquad (17)$$

is the proportion of data samples in $C_i$ that also appear in $C^{(t)}_j$. It is obvious that $0 \le p(C_i, C^{(t)}_j) \le 1$, which leads to $H^{(t)}(C_i) \ge 0$. If and only if all data samples in $C_i$ also occur in the same cluster in $\pi^{(t)}$, the uncertainty of $C_i$ w.r.t. $\pi^{(t)}$ reaches its minimum 0.
With the uncertainty of a cluster w.r.t. a base clustering given in Eq. (16) and the general assumption that the set of base clusterings are independent of each other, we can obtain the uncertainty (or entropy) of $C_i$ w.r.t. the entire ensemble $\Pi$ as follows:

$$H^{\Pi}(C_i) = \sum_{t=1}^{m} H^{(t)}(C_i) \qquad (18)$$
Intuitively, higher uncertainty indicates lower reliability for a cluster, which implies that the ensemble of base clusterings tend to disagree with the cluster and accordingly a smaller weight can be associated with it [11]. In particular, we proceed to compute a reliability index from the above-mentioned uncertainty measure, and exploit it as a cluster weighting term in our consensus function. The experimental analysis of the efficacy of the cluster weighting term will also be provided in Section 4.6. Specifically, the ensemble-driven cluster index (ECI) is computed as an indication for the reliability of each cluster in the ensemble, which is defined as follows:

$$\mathrm{ECI}(C_i) = e^{-\frac{H^{\Pi}(C_i)}{\theta \cdot m}} \qquad (19)$$

where $\theta > 0$ is a parameter to adjust the influence of the entropy. Obviously, for any $C_i \in \mathcal{C}$, it holds that $H^{(t)}(C_i) \ge 0$; then we have $H^{\Pi}(C_i) \ge 0$ and thereby $0 < \mathrm{ECI}(C_i) \le 1$. Note that a larger value of ECI is associated with a cluster of lower uncertainty (i.e., higher reliability). If and only if the data samples in $C_i$ appear in the same cluster in all of the $m$ base clusterings (i.e., all base clusterings agree that the data samples in $C_i$ should belong to the same cluster), the uncertainty of $C_i$ w.r.t. $\Pi$ reaches its minimum 0 and the ECI of $C_i$ reaches its maximum 1.
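A compact sketch of the entropy-based cluster evaluation (Eqs. (16)-(19)), with clusters represented as sets of sample indices; the parameter `theta` follows the local weighting strategy of [11], and its default value here is purely illustrative:

```python
import math

def cluster_entropy(C, partition):
    # Eqs. (16)-(17): uncertainty of cluster C w.r.t. one base clustering,
    # where p is the fraction of C's samples falling in each base cluster
    H = 0.0
    for Cj in partition:
        p = len(C & Cj) / len(C)
        if p > 0:  # 0 * log2(0) is taken as 0 by convention
            H -= p * math.log2(p)
    return H

def eci(C, ensemble, theta=0.4):
    # Eq. (18): sum the entropy over all m base clusterings
    # (under the independence assumption), then
    # Eq. (19): map the total uncertainty to a reliability index in (0, 1]
    m = len(ensemble)
    H_total = sum(cluster_entropy(C, pi) for pi in ensemble)
    return math.exp(-H_total / (theta * m))
```

A cluster that every base clustering keeps intact gets the maximum ECI of 1, while clusters that the ensemble splits apart are down-weighted.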
The ECI measure serves as a reliability index for different clusters in the ensemble $\Pi$. By using ECI as a cluster-weighting term, the locally weighted co-association matrix can be obtained as follows:

$$A = \{a_{ij}\}_{N \times N} \qquad (20)$$

with

$$a_{ij} = \frac{1}{m} \sum_{t=1}^{m} w^{(t)}_i \cdot \delta^{(t)}_{ij} \qquad (21)$$

$$w^{(t)}_i = \mathrm{ECI}\big(Cls^{(t)}(x_i)\big) \qquad (22)$$

$$\delta^{(t)}_{ij} = \begin{cases} 1, & \text{if } Cls^{(t)}(x_i) = Cls^{(t)}(x_j), \\ 0, & \text{otherwise}, \end{cases} \qquad (23)$$

where $Cls^{(t)}(x_i)$ denotes the cluster in $\pi^{(t)}$ that $x_i$ belongs to. Note that $w^{(t)}_i$ is the cluster weighting term which weights each cluster according to its ECI value, while $\delta^{(t)}_{ij}$ is the pairwise co-occurrence term that indicates whether two samples occur in the same cluster in a base clustering $\pi^{(t)}$.
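The locally weighted co-association matrix of Eqs. (20)-(23) can be sketched as follows; `cluster_weight` is a hypothetical callback standing in for the ECI computation:

```python
def lwca_matrix(ensemble, n_samples, cluster_weight):
    # Eqs. (20)-(23): each base clustering contributes, for every pair of
    # samples co-occurring in a cluster (the delta term), the ECI-based
    # weight of that cluster (the w term), averaged over the m clusterings.
    m = len(ensemble)
    A = [[0.0] * n_samples for _ in range(n_samples)]
    for partition in ensemble:
        for cluster in partition:
            w = cluster_weight(cluster)  # weight of the cluster x_i belongs to
            for i in cluster:
                for j in cluster:  # delta = 1 only for pairs inside this cluster
                    A[i][j] += w / m
    return A
```

With unit weights this reduces to the plain co-association matrix of EAC [21]; the ECI weighting down-weights entries contributed by unreliable clusters.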
Then, with the data samples treated as graph nodes and the locally weighted co-association matrix $A$ used as the similarity matrix, the similarity graph for the consensus function can be constructed as follows:

$$\tilde{G} = (V, \tilde{E}) \qquad (24)$$

where $V$ is the node set, and $\tilde{E}$ is the edge set with the weight $a_{ij}$ for any samples $x_i$ and $x_j$. Thereafter, graph $\tilde{G}$ is partitioned into $k$ disjoint subsets by performing the spectral clustering algorithm [44]. By treating each subset of graph nodes as a final cluster, the consensus clustering result can thus be obtained.
For clarity, the overall algorithm of the proposed ensemble clustering approach is summarized in Algorithm 1.
4 Experiments
In this section, we conduct experiments on a variety of real-world high-dimensional datasets to compare the proposed MDEC approach against several state-of-the-art ensemble clustering approaches.
4.1 Datasets and Experimental Setting
We use twenty high-dimensional datasets in the experiments, including fifteen cancer gene expression datasets and five image datasets (see Tables I and II). The fifteen cancer gene expression datasets are from [45], while the five image datasets (i.e., UMist, Multiple Features, Flowers17, COIL20, and Binary Alphadigits) are from [46], [47], [48], [49], and [50], respectively. For clarity, in the following, the fifteen cancer gene expression datasets will be abbreviated as GD1 to GD15, while the five image datasets will be abbreviated as ID1 to ID5 (as shown in Tables I and II).
TABLE I: Description of the fifteen cancer gene expression datasets.
Dataset  Abbr.  #Sample  Dimension  #Class 

Bhattacharjee2001  GD1  203  1,543  5 
Chowdary2006  GD2  104  182  2 
Dyrskjot2003  GD3  40  1,203  3 
Golub1999v1  GD4  72  1,877  2 
Golub1999v2  GD5  72  1,877  3 
Gordon2002  GD6  181  1,626  2 
Nutt2003v1  GD7  50  1,377  4 
Pomeroy2002v2  GD8  42  1,379  5 
Su2001  GD9  174  1,571  10 
Tomlins2006v2  GD10  92  1,288  4 
Alizadeh2000v2  GD11  62  2,093  3 
Armstrong2002v2  GD12  72  2,194  3 
Lapointe2004v1  GD13  69  1,625  3 
Lapointe2004v2  GD14  110  2,496  4 
Ramaswamy2001  GD15  190  1,363  14 
TABLE II: Description of the five image datasets.
Dataset  Abbr.  #Sample  Dimension  #Class 

UMist  ID1  575  10,304  20 
Multiple Features  ID2  2,000  649  10 
Flowers17  ID3  1,360  30,000  17 
COIL20  ID4  1,440  1,024  20 
Binary Alphadigits  ID5  1,404  320  36 
To produce a large set of diversified metrics, the two kernel parameters $\mu$ and $K$ of the SES kernel are suggested to be randomized in wide ranges, and are randomly selected accordingly in the experiments. To generate the ensemble of base clusterings, a fixed ensemble size $m$ and a fixed sampling ratio $\tau$ are used, while the number of clusters in each base clustering is randomly selected within a predefined range. In Sections 4.5 and 4.6, we will further evaluate the ensemble clustering performance of our approach with different ensemble sizes $m$ and different sampling ratios $\tau$.
4.2 Evaluation Measures
To evaluate the quality of the clustering result, two widely used evaluation measures are adopted, namely, normalized mutual information (NMI) [30] and adjusted Rand index (ARI) [51]. Note that greater values of NMI and ARI indicate better clusterings.
The NMI serves as a sound indication of the shared information between two clusterings. Given the test clustering $\pi'$ and the ground-truth clustering $\pi^G$, the NMI between $\pi'$ and $\pi^G$ is defined as follows [30]:

$$\mathrm{NMI}(\pi', \pi^G) = \frac{\sum_{i=1}^{n'} \sum_{j=1}^{n^G} n_{ij} \log \frac{n_{ij} N}{n'_i\, n^G_j}}{\sqrt{\left(\sum_{i=1}^{n'} n'_i \log \frac{n'_i}{N}\right) \left(\sum_{j=1}^{n^G} n^G_j \log \frac{n^G_j}{N}\right)}} \qquad (25)$$

where $n'$ and $n^G$ denote the number of clusters in $\pi'$ and $\pi^G$, respectively, $n'_i$ denotes the number of samples in the $i$-th cluster of $\pi'$, $n^G_j$ denotes the number of samples in the $j$-th cluster of $\pi^G$, and $n_{ij}$ denotes the number of common samples between cluster $i$ in $\pi'$ and cluster $j$ in $\pi^G$.
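The NMI of Eq. (25) can be computed directly from the contingency counts; this plain-Python sketch uses our own function names:

```python
import math

def nmi(labels_a, labels_b):
    # Eq. (25): normalized mutual information between two labelings,
    # computed from cluster sizes and pairwise overlap counts.
    N = len(labels_a)
    A, B = set(labels_a), set(labels_b)
    n_a = {i: labels_a.count(i) for i in A}  # cluster sizes in the first clustering
    n_b = {j: labels_b.count(j) for j in B}  # cluster sizes in the second
    num = 0.0
    for i in A:
        for j in B:
            # n_ij: number of samples shared by cluster i and cluster j
            n_ij = sum(1 for x, y in zip(labels_a, labels_b) if x == i and y == j)
            if n_ij > 0:
                num += n_ij * math.log(n_ij * N / (n_a[i] * n_b[j]))
    den_a = sum(n_a[i] * math.log(n_a[i] / N) for i in A)
    den_b = sum(n_b[j] * math.log(n_b[j] / N) for j in B)
    return num / math.sqrt(den_a * den_b)
```

NMI equals 1 for two clusterings that are identical up to a relabeling of clusters, and 0 for statistically independent ones.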
TABLE III: The NMI scores of different ensemble clustering methods on the benchmark datasets.
Dataset  SSCSPA  SSHGPA  SSMCLA  KCC  PTA  PTGP  LWEA  LWGP  SEC  ECC  MDEC 
GD1  0.206  0.191  0.267  0.269  0.389  0.399  0.388  0.421  0.276  0.359  0.496 
GD2  0.603  0.736  0.480  0.601  0.249  0.081  0.081  0.081  0.503  0.612  0.843 
GD3  0.490  0.437  0.560  0.456  0.403  0.484  0.475  0.426  0.469  0.528  0.638 
GD4  0.424  0.469  0.625  0.556  0.245  0.571  0.515  0.564  0.494  0.535  0.737 
GD5  0.483  0.517  0.423  0.537  0.520  0.673  0.613  0.660  0.593  0.675  0.739 
GD6  0.182  0.179  0.176  0.127  0.375  0.117  0.094  0.125  0.293  0.117  0.451 
GD7  0.406  0.372  0.367  0.463  0.469  0.484  0.449  0.448  0.465  0.432  0.489 
GD8  0.503  0.563  0.408  0.576  0.587  0.587  0.599  0.555  0.529  0.566  0.693 
GD9  0.520  0.527  0.447  0.572  0.580  0.570  0.546  0.531  0.569  0.569  0.677 
GD10  0.238  0.258  0.253  0.223  0.246  0.257  0.253  0.276  0.256  0.255  0.323 
GD11  0.571  0.489  0.567  0.654  0.973  0.976  0.625  0.719  0.595  0.703  0.913 
GD12  0.668  0.491  0.667  0.503  0.607  0.750  0.654  0.580  0.418  0.721  0.742 
GD13  0.135  0.113  0.150  0.092  0.084  0.160  0.186  0.208  0.061  0.121  0.247 
GD14  0.128  0.084  0.123  0.146  0.105  0.076  0.143  0.134  0.127  0.075  0.209 
GD15  0.504  0.542  0.187  0.451  0.425  0.384  0.292  0.292  0.380  0.436  0.639 
ID1  0.579  0.655  0.587  0.609  0.630  0.627  0.637  0.625  0.605  0.614  0.697 
ID2  0.813  0.699  0.863  0.731  0.812  0.835  0.846  0.870  0.641  0.752  0.895 
ID3  0.218  0.187  0.237  0.242  0.251  0.250  0.227  0.216  0.242  0.242  0.261 
ID4  0.784  0.768  0.754  0.723  0.664  0.670  0.756  0.802  0.721  0.743  0.828 
ID5  0.553  0.470  0.544  0.542  0.559  0.551  0.521  0.517  0.564  0.548  0.584 
Average score  0.450  0.437  0.434  0.454  0.459  0.475  0.445  0.453  0.440  0.480  0.605 
Average rank  7.05  7.60  7.40  6.55  6.05  4.85  6.20  6.10  7.00  6.05  1.15 
TABLE IV: The ARI scores of different ensemble clustering methods on the benchmark datasets.
Dataset  SSCSPA  SSHGPA  SSMCLA  KCC  PTA  PTGP  LWEA  LWGP  SEC  ECC  MDEC 
GD1  0.083  0.070  0.123  0.137  0.228  0.240  0.223  0.308  0.123  0.221  0.441 
GD2  0.652  0.810  0.525  0.694  0.253  0.066  0.066  0.066  0.587  0.711  0.912 
GD3  0.432  0.353  0.610  0.501  0.314  0.528  0.524  0.479  0.495  0.557  0.645 
GD4  0.447  0.452  0.738  0.640  0.199  0.655  0.596  0.650  0.574  0.617  0.836 
GD5  0.461  0.496  0.398  0.519  0.460  0.683  0.632  0.651  0.600  0.726  0.790 
GD6  0.106  0.081  0.067  0.055  0.493  0.064  0.101  0.055  0.401  0.064  0.601 
GD7  0.245  0.232  0.205  0.342  0.333  0.364  0.356  0.356  0.350  0.262  0.395 
GD8  0.361  0.450  0.246  0.474  0.466  0.473  0.495  0.456  0.422  0.462  0.627 
GD9  0.362  0.354  0.283  0.426  0.402  0.414  0.385  0.386  0.420  0.401  0.529 
GD10  0.173  0.204  0.170  0.145  0.141  0.137  0.153  0.172  0.162  0.177  0.208 
GD11  0.430  0.393  0.434  0.554  0.964  0.977  0.578  0.769  0.480  0.619  0.947 
GD12  0.685  0.487  0.651  0.460  0.583  0.768  0.632  0.565  0.348  0.725  0.769 
GD13  0.116  0.073  0.144  0.068  0.045  0.134  0.159  0.180  0.061  0.111  0.218 
GD14  0.110  0.036  0.102  0.137  0.076  0.048  0.109  0.089  0.114  0.055  0.174 
GD15  0.290  0.360  0.074  0.177  0.127  0.062  0.032  0.031  0.105  0.169  0.490 
ID1  0.280  0.358  0.276  0.308  0.315  0.312  0.341  0.335  0.299  0.312  0.392 
ID2  0.789  0.616  0.848  0.627  0.757  0.796  0.818  0.857  0.485  0.652  0.890 
ID3  0.080  0.069  0.090  0.094  0.093  0.093  0.098  0.097  0.092  0.097  0.095 
ID4  0.642  0.591  0.565  0.525  0.371  0.378  0.558  0.643  0.519  0.544  0.693 
ID5  0.251  0.156  0.248  0.250  0.277  0.270  0.259  0.249  0.272  0.255  0.289 
Average score  0.350  0.332  0.340  0.357  0.345  0.373  0.356  0.370  0.345  0.387  0.547 
Average rank  7.10  7.75  7.30  6.45  6.90  5.50  5.35  5.70  7.10  5.60  1.25 
The ARI is computed by considering the number of pairs of samples on which two clusterings agree or disagree. Given two clusterings $\pi^a$ and $\pi^b$, the ARI between them is defined as follows [51]:
$$\mathrm{ARI}(\pi^a,\pi^b)=\frac{2(N_{11}N_{00}-N_{10}N_{01})}{(N_{11}+N_{10})(N_{10}+N_{00})+(N_{11}+N_{01})(N_{01}+N_{00})},\qquad(26)$$
where $N_{11}$ is the number of sample pairs that occur in the same cluster in both $\pi^a$ and $\pi^b$, $N_{00}$ is the number of sample pairs that occur in different clusters in both $\pi^a$ and $\pi^b$, $N_{10}$ is the number of sample pairs that occur in the same cluster in $\pi^a$ but in different clusters in $\pi^b$, and $N_{01}$ is the number of sample pairs that occur in different clusters in $\pi^a$ but in the same cluster in $\pi^b$.
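To make the pair-counting computation of Eq. (26) concrete, the following is a minimal, illustrative Python sketch (not the authors' code) that accumulates the four pair counts by enumerating all sample pairs and then applies the formula:

```python
from itertools import combinations

def ari_from_pair_counts(labels_a, labels_b):
    """Compute the pair-counting ARI of Eq. (26) by enumerating sample pairs."""
    n11 = n00 = n10 = n01 = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            n11 += 1          # same cluster in both clusterings
        elif not same_a and not same_b:
            n00 += 1          # different clusters in both clusterings
        elif same_a:
            n10 += 1          # same in the first, different in the second
        else:
            n01 += 1          # different in the first, same in the second
    num = 2.0 * (n11 * n00 - n10 * n01)
    den = (n11 + n10) * (n10 + n00) + (n11 + n01) * (n01 + n00)
    return num / den if den else 0.0

# Two clusterings that induce the same partition (labels merely renamed)
# agree on every pair, so the ARI is 1.
print(ari_from_pair_counts([0, 0, 1, 1], [1, 1, 0, 0]))  # -> 1.0
```

The $O(n^2)$ pair enumeration is only for clarity; contingency-table formulations compute the same quantity far more efficiently.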
4.3 Comparison Against Base Clusterings
In ensemble clustering, it is generally expected that the base clusterings be generated with high diversity in the ensemble generation phase, and that the consensus clustering be constructed with improved stability and quality in the consensus phase by fusing the base clusterings.
In this section, we evaluate the performance of the generated base clusterings and the final consensus clusterings of the proposed MDEC approach. As illustrated in Fig. 2, on the one hand, the ensemble of base clusterings shows high diversity (typically, with high standard deviations w.r.t. both NMI and ARI) on the benchmark datasets. On the other hand, the consensus clustering results consistently outperform the base clusterings in terms of both overall stability and quality (see Fig. 2). In particular, on the GD2, GD4, and GD6 datasets, the average NMI and ARI scores (over 100 runs) of the consensus clusterings of our approach are more than twice as high as those of the base clusterings.
4.4 Comparison Against Other Ensemble Clustering Methods
In this section, we compare the proposed MDEC approach with ten state-of-the-art ensemble clustering approaches, namely, the stratified sampling based cluster-based similarity partitioning algorithm (SSCSPA) [7], the stratified sampling based hypergraph partitioning algorithm (SSHGPA) [7], the stratified sampling based meta-clustering algorithm (SSMCLA) [7], k-means based consensus clustering (KCC) [6], probability trajectory accumulation (PTA) [8], probability trajectory based graph partitioning (PTGP) [8], locally weighted evidence accumulation (LWEA) [11], locally weighted graph partitioning (LWGP) [11], spectral ensemble clustering (SEC) [13], and entropy-based consensus clustering (ECC) [14]. To compare the performances of the different ensemble clustering approaches, we use the number of classes as the cluster number for each test approach, which is a commonly adopted experimental protocol in ensemble clustering [13, 14]. For each benchmark dataset, we run every test approach 100 times and report their average performances and standard deviations in Tables III and IV.
As shown in Table III, in terms of NMI, the proposed MDEC approach exhibits the best performance on eighteen out of the twenty datasets. Although the PTGP approach outperforms our approach on the GD11 and GD12 datasets, on all of the other eighteen datasets our approach shows significant advantages in consensus performance (w.r.t. NMI) over the baseline approaches. Similarly, as shown in Table IV, in terms of ARI, the proposed approach also achieves the best performance on eighteen out of the twenty benchmark datasets and shows a clear advantage over the baseline approaches.
To provide a summary view across the twenty benchmark datasets, we further report the average score and average rank of the different approaches in the last two rows of Tables III and IV, respectively. Note that the average score (across the twenty datasets) is computed by averaging the NMI (or ARI) scores, while the average rank is obtained by averaging the ranking positions of each approach across all the datasets.
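As an illustration of how such summary rows can be computed, the following sketch averages scores column-wise and derives per-dataset ranks (the score matrix below contains made-up placeholder values, not numbers from the tables; ties are broken arbitrarily here):

```python
import numpy as np

# Hypothetical scores: rows = datasets, columns = methods.
scores = np.array([
    [0.45, 0.52, 0.61],
    [0.30, 0.28, 0.40],
    [0.55, 0.50, 0.58],
])

# Average score: column-wise mean across datasets.
avg_score = scores.mean(axis=0)

# Average rank: rank the methods on each dataset (1 = best, i.e., highest
# score; ties broken arbitrarily by argsort), then average down each column.
ranks = (-scores).argsort(axis=1).argsort(axis=1) + 1
avg_rank = ranks.mean(axis=0)
```

A best-on-every-dataset method thus gets an average rank of exactly 1.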
As can be seen in Table III, the proposed approach achieves an average NMI score of 0.605 across the twenty datasets, which is significantly higher than that of the second best approach (i.e., ECC), whose average NMI score is 0.480. In terms of the ranking positions in Table III, the proposed approach obtains an average rank of 1.15, while the second best approach (i.e., PTGP) only obtains an average rank of 4.85. Similar advantages can also be seen in Table IV. The average ARI score and the average rank of the proposed approach are 0.547 and 1.25, respectively, which significantly outperform those of the ten baseline ensemble clustering approaches (see Table IV).
4.5 Robustness to Ensemble Sizes
In this section, we evaluate the performances of the different ensemble clustering approaches with varying ensemble sizes. Specifically, we run the proposed MDEC approach as well as the baseline approaches on the benchmark datasets with varying ensemble sizes, and report their average performances over 20 runs in Figures 3 and 4.
As can be seen in Fig. 3, in terms of NMI, the proposed approach yields stably high performance across the twenty benchmark datasets with varying ensemble sizes. Although the PTGP and PTA approaches outperform our approach on the GD11 dataset, on most of the other datasets our approach achieves the best or nearly the best performance compared to the baseline approaches. In particular, on the GD1, GD2, GD3, GD4, GD6, GD8, GD9, GD10, GD13, GD14, GD15, ID1, ID2, ID3, ID4, and ID5 datasets, our approach shows a significant advantage over the baseline approaches across the range of ensemble sizes. Similarly, in terms of ARI, our approach also exhibits the best or nearly the best performance on most of the benchmark datasets with varying ensemble sizes (as shown in Fig. 4).
TABLE V: Execution times (in seconds) of the different ensemble clustering approaches on the benchmark datasets
Dataset  SSCSPA  SSHGPA  SSMCLA  KCC  PTA  PTGP  LWEA  LWGP  SEC  ECC  MDEC 
GD1  1.11  1.80  1.57  0.86  0.86  1.32  1.18  1.63  0.75  1.01  1.02 
GD2  0.73  0.92  0.96  0.22  0.28  0.49  0.54  0.74  0.18  0.25  0.54 
GD3  0.76  0.97  0.99  0.21  0.25  0.34  0.48  0.57  0.19  0.25  0.46 
GD4  0.86  1.08  1.11  0.35  0.41  0.57  0.66  0.82  0.32  0.37  0.57 
GD5  0.85  1.19  1.11  0.35  0.38  0.54  0.62  0.76  0.31  0.40  0.58 
GD6  1.07  1.34  1.48  0.72  0.78  1.17  1.10  1.46  0.65  0.80  0.94 
GD7  0.79  1.05  1.04  0.25  0.28  0.41  0.51  0.63  0.21  0.29  0.48 
GD8  0.79  1.00  0.97  0.25  0.27  0.38  0.48  0.57  0.21  0.29  0.44 
GD9  1.08  1.81  1.47  0.70  0.64  0.98  0.93  1.25  0.55  0.88  0.87 
GD10  0.87  1.26  1.09  0.35  0.37  0.57  0.62  0.79  0.28  0.40  0.57 
GD11  0.86  1.20  1.07  0.29  0.32  0.46  0.55  0.66  0.26  0.33  0.51 
GD12  0.92  1.25  1.17  0.37  0.41  0.57  0.65  0.80  0.33  0.41  0.60 
GD13  0.87  1.18  1.08  0.29  0.32  0.47  0.54  0.68  0.24  0.32  0.53 
GD14  1.04  1.39  1.33  0.54  0.57  0.81  0.82  1.04  0.48  0.62  0.76 
GD15  1.02  1.68  1.30  0.85  0.75  2.05  1.21  2.50  0.65  1.12  0.86 
ID1  8.71  10.21  9.06  18.27  17.90  19.28  18.29  19.64  17.77  19.29  8.66 
ID2  8.24  8.31  6.53  9.73  9.66  17.01  8.49  15.68  6.97  13.06  32.70 
ID3  73.97  77.29  73.78  206.94  206.78  211.39  206.00  210.43  205.08  209.67  62.80 
ID4  5.92  6.82  5.20  7.50  6.10  10.32  6.76  11.40  5.70  9.66  17.92 
ID5  5.04  82.01  5.62  5.07  4.46  9.17  3.65  8.07  2.89  8.26  17.96 
To provide a summary view, Fig. 5 further illustrates the average NMI and ARI scores (across the twenty datasets) obtained by the different approaches with varying ensemble sizes. Specifically, Fig. 5(a) is obtained by averaging the twenty subfigures in Fig. 3 (Fig. 3(a) to Fig. 3(t)), while Fig. 5(b) is obtained by averaging the twenty subfigures in Fig. 4 (Fig. 4(a) to Fig. 4(t)). As can be seen in Fig. 5, the proposed MDEC approach achieves significantly better performance (w.r.t. both NMI and ARI) than the baseline ensemble clustering approaches across the twenty benchmark datasets. Even compared to the second and third best approaches (i.e., ECC and PTGP, respectively), a clear advantage of the proposed MDEC approach can still be observed (see Fig. 5).
4.6 Influence of Metrics, Subspaces, and Clusters
This paper proposes to jointly exploit large populations of diversified metrics, random subspaces, and weighted clusters in a unified ensemble clustering framework. In this section, we evaluate the influence of the three factors (i.e., diversified metrics, random subspaces, and weighted clusters) in our approach.
First, we compare the diversified metrics with several widely used similarity metrics, i.e., the cosine similarity, the correlation coefficient, and the Spearman correlation coefficient. Besides the proposed MDEC approach, we generate three sub-approaches by replacing the diversified metrics with one of the three conventional similarity metrics. As can be seen in Fig. 6, in terms of NMI, the proposed approach with diversified metrics obtains a substantially higher average score than each of the three sub-approaches (with the three conventional similarity metrics). In terms of ARI, the proposed approach with diversified metrics also significantly outperforms the three sub-approaches. As shown in Fig. 6, the use of diversified metrics in the proposed approach significantly improves the consensus clustering performance.
Second, we evaluate the performance of MDEC with different subspace sampling ratios (see Fig. 7). As illustrated in Fig. 7, moderate values of the sampling ratio generally lead to better consensus clustering performance. As the sampling ratio increases toward 1, the performance declines, which suggests that the use of random subspaces has a positive influence compared to using the full feature set (i.e., a sampling ratio of 1). At the other extreme, when the sampling ratio is set to very small values, e.g., in the range of [0.1, 0.3], the performance also declines, because the subspaces generated by a very small sampling ratio may not well represent the underlying distribution of the dataset. Empirically, it is suggested that the sampling ratio be set to a moderate value, which strikes a balance between diversity and quality.
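Concretely, subspace sampling at ratio r amounts to drawing ceil(r * d) of the d features uniformly without replacement; a minimal sketch (the function name is ours, not from the paper):

```python
import numpy as np

def sample_subspace(X, ratio, rng):
    """Draw a random feature subspace containing ceil(ratio * d) features."""
    d = X.shape[1]
    k = max(1, int(np.ceil(ratio * d)))
    feats = rng.choice(d, size=k, replace=False)   # features drawn w/o replacement
    return X[:, feats]

rng = np.random.default_rng(42)
X = rng.standard_normal((100, 2000))   # e.g., a gene-expression-like matrix
X_sub = sample_subspace(X, ratio=0.5, rng=rng)
print(X_sub.shape)                     # (100, 1000)
```

Repeating this draw produces the pool of random subspaces that are paired with the diversified metrics.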
Third, we evaluate the performance of our approach with and without the weighted clusters. Note that the performance of our approach without weighted clusters is obtained by setting all cluster weights to one. As shown in Fig. 8, in terms of both NMI and ARI, the proposed approach with weighted clusters exhibits consistently better average performance (across the twenty datasets) than the variant without weighted clusters.
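One plausible instantiation of such an entropy-based cluster weighting is sketched below, in the spirit of the locally weighted scheme of [11]: a cluster whose members consistently co-occur across the base clusterings has low entropy and receives a large weight. The exact weighting function and the parameter theta are our illustrative assumptions, not necessarily the paper's formulation:

```python
import numpy as np

def cluster_entropy(members, other_labels):
    """Entropy of one cluster w.r.t. another base clustering: how evenly
    the cluster's members scatter over that clustering's clusters."""
    _, counts = np.unique(other_labels[members], return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def cluster_weight(members, base_labels, theta=0.5):
    """Weight a cluster by its average entropy over the ensemble of base
    clusterings; lower entropy (stronger agreement) yields a larger weight."""
    H = np.mean([cluster_entropy(members, lab) for lab in base_labels])
    return float(np.exp(-H / theta))
```

Setting all weights to one, as in the unweighted variant above, corresponds to ignoring this entropy signal entirely.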
As shown in Figures 6 to 8, we have two main observations: 1) the performance of our approach benefits from the use of diversified metrics, random subspaces, and weighted clusters; and 2) among the three beneficial factors, the diversified metrics play the most important role in the consensus clustering performance, in view of the substantial improvement (w.r.t. both NMI and ARI) that they bring.
4.7 Execution Time
In this section, we evaluate the efficiency of the different ensemble clustering approaches and report their execution times on the benchmark datasets in Table V. In general, larger dimensions and larger sample sizes lead to greater computational costs for the ensemble clustering approaches. As can be seen in Table V, the proposed MDEC approach consumes less than one second on fourteen out of the fifteen cancer gene expression datasets. On the five image datasets, the time efficiency of the proposed MDEC approach is also comparable to that of the other ensemble clustering approaches.
In summary, as can be seen in Tables III to V and Figures 3 to 5, the proposed MDEC approach shows significant advantages in clustering accuracy while exhibiting competitive time efficiency compared against the state-of-the-art ensemble clustering approaches.
All of the experiments were conducted in MATLAB R2016a (64-bit) on a workstation (Windows 10 Enterprise 64-bit, 12 Intel 2.40 GHz processors, 128 GB of RAM).
5 Conclusion
In this paper, we have proposed a new ensemble clustering approach termed MDEC, which jointly exploits large populations of diversified metrics, random subspaces, and weighted clusters in a unified ensemble clustering framework. Specifically, a large number of diversified metrics are generated by randomizing a scaled exponential similarity kernel. The diversified metrics are then coupled with random subspaces to form a large set of metric-subspace pairs. Based on the similarity matrices derived from the metric-subspace pairs, the spectral clustering algorithm is performed to construct an ensemble of diversified base clusterings. With the base clusterings generated, an entropy-based cluster validity strategy is utilized to evaluate and weight the clusters, taking into account the distribution of the cluster labels in the entire ensemble. Based on the weighted clusters, the locally weighted co-association matrix is built and then partitioned to obtain the consensus clustering. We have conducted extensive experiments on 20 high-dimensional datasets (including 15 cancer gene expression datasets and 5 image datasets), which demonstrate the clear advantages of our approach over the state-of-the-art ensemble clustering approaches.
Acknowledgments
This project was supported by NSFC (61602189, 61502543 & 61573387), the National Key Research and Development Program of China (2016YFB1001003), the Guangdong Natural Science Funds for Distinguished Young Scholar (2016A030306014), and the Singapore Ministry of Education Tier 2 Grant (MOE2014-T2-2-023).
References
 [1] T. Li and C. Ding, “Weighted consensus clustering,” in Proc. of SIAM International Conference on Data Mining (SDM), 2008, pp. 798–809.
 [2] N. Iam-On, T. Boongoen, S. Garrett, and C. Price, “A link-based approach to the cluster ensemble problem,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 12, pp. 2396–2409, 2011.
 [3] T. Wang, “CA-Tree: A hierarchical structure for efficient and scalable co-association-based cluster ensembles,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 41, no. 3, pp. 686–698, 2011.
 [4] N. Li and L. J. Latecki, “Clustering aggregation as maximumweight independent set,” in Advances in Neural Information Processing Systems (NIPS), 2012, pp. 782–790.
 [5] L. Zheng, T. Li, and C. Ding, “A framework for hierarchical ensemble clustering,” ACM Transactions on Knowledge Discovery from Data, vol. 9, no. 2, pp. 9:1–9:23, 2014.
 [6] J. Wu, H. Liu, H. Xiong, J. Cao, and J. Chen, “K-means-based consensus clustering: A unified view,” IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 1, pp. 155–169, 2015.
 [7] L. Jing, K. Tian, and J. Z. Huang, “Stratified feature sampling method for ensemble clustering of high dimensional data,” Pattern Recognition, vol. 48, no. 11, pp. 3688–3702, 2015.
 [8] D. Huang, J.-H. Lai, and C.-D. Wang, “Robust ensemble clustering using probability trajectories,” IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 5, pp. 1312–1326, 2016.
 [9] Z. Yu, L. Li, J. Liu, J. Zhang, and G. Han, “Adaptive noise immune cluster ensemble using affinity propagation,” IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 12, pp. 3176–3189, 2015.
 [10] D. Huang, J.-H. Lai, and C.-D. Wang, “Ensemble clustering using factor graph,” Pattern Recognition, vol. 50, pp. 131–142, 2016.
 [11] D. Huang, C.-D. Wang, and J.-H. Lai, “Locally weighted ensemble clustering,” IEEE Transactions on Cybernetics, in press, 2017.
 [12] H. Liu, M. Shao, S. Li, and Y. Fu, “Infinite ensemble clustering,” Data Mining and Knowledge Discovery, in press, 2017.
 [13] H. Liu, J. Wu, T. Liu, D. Tao, and Y. Fu, “Spectral ensemble clustering via weighted k-means: Theoretical and practical evidence,” IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 5, pp. 1129–1143, 2017.
 [14] H. Liu, R. Zhao, H. Fang, F. Cheng, Y. Fu, and Y.-Y. Liu, “Entropy-based consensus clustering for patient stratification,” Bioinformatics, vol. 33, no. 17, pp. 2691–2698, 2017.
 [15] Z. Yu, H.-S. Wong, and H. Wang, “Graph-based consensus clustering for class discovery from gene expression data,” Bioinformatics, vol. 23, no. 21, pp. 2888–2896, 2007.
 [16] Z. Yu, P. Luo, J. You, H.-S. Wong, H. Leung, S. Wu, J. Zhang, and G. Han, “Incremental semi-supervised clustering ensemble for high dimensional data clustering,” IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 3, pp. 701–714, 2016.
 [17] Z. Yu, Z. Kuang, J. Liu, H. Chen, J. Zhang, J. You, H.-S. Wong, and G. Han, “Adaptive ensembling of semi-supervised clustering solutions,” IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 8, pp. 1577–1590, 2017.
 [18] X. Z. Fern and C. E. Brodley, “Random projection for high dimensional data clustering: A cluster ensemble approach,” in Proc. of International Conference on Machine Learning (ICML), 2003, pp. 186–193.
 [19] J. H. Lee, K. T. McDonnell, A. Zelenyuk, D. Imre, and K. Mueller, “A structure-based distance metric for high-dimensional space exploration with multidimensional scaling,” IEEE Transactions on Visualization and Computer Graphics, vol. 20, no. 3, pp. 351–364, 2014.
 [20] C. M. Hsu and M. S. Chen, “On the design and applicability of distance functions in high-dimensional data space,” IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 4, pp. 523–536, 2009.
 [21] A. L. N. Fred and A. K. Jain, “Combining multiple clusterings using evidence accumulation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 6, pp. 835–850, 2005.
 [22] J. Wu, H. Liu, H. Xiong, and J. Cao, “A theoretic framework of k-means-based consensus clustering,” in Proc. of International Joint Conference on Artificial Intelligence (IJCAI), 2013.
 [23] C. Zhong, X. Yue, Z. Zhang, and J. Lei, “A clustering ensemble: Two-level-refined co-association matrix with path-based transformation,” Pattern Recognition, vol. 48, no. 8, pp. 2699–2709, 2015.
 [24] D. Huang, J.-H. Lai, and C.-D. Wang, “Combining multiple clusterings via crowd agreement estimation and multi-granularity link analysis,” Neurocomputing, vol. 170, pp. 240–250, 2015.
 [25] Y. Fan, N. Li, C. Li, Z. Ma, L. J. Latecki, and K. Su, “Restart and random walk in local search for maximum vertex weight cliques with evaluations in clustering aggregation,” in Proc. of International Joint Conference on Artificial Intelligence (IJCAI), 2017, pp. 622–630.
 [26] M. Yousefnezhad, S. J. Huang, and D. Zhang, “WoCE: A framework for clustering ensemble by exploiting the wisdom of crowds theory,” IEEE Transactions on Cybernetics, in press, 2017.
 [27] D. Huang, J.-H. Lai, C.-D. Wang, and P. C. Yuen, “Ensembling over-segmentations: From weak evidence to strong segmentation,” Neurocomputing, vol. 207, pp. 416–427, 2016.
 [28] X. Wang, C. Yang, and J. Zhou, “Clustering aggregation by probability accumulation,” Pattern Recognition, vol. 42, no. 5, pp. 668–675, 2009.
 [29] J. Yi, T. Yang, R. Jin, and A. K. Jain, “Robust ensemble clustering by matrix completion,” in Proc. of IEEE International Conference on Data Mining (ICDM), 2012.
 [30] A. Strehl and J. Ghosh, “Cluster ensembles: A knowledge reuse framework for combining multiple partitions,” Journal of Machine Learning Research, vol. 3, pp. 583–617, 2003.
 [31] X. Z. Fern and C. E. Brodley, “Solving cluster ensemble problems by bipartite graph partitioning,” in Proc. of International Conference on Machine Learning (ICML), 2004.
 [32] A. Topchy, A. K. Jain, and W. Punch, “Clustering ensembles: models of consensus and weak partitions,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1866–1881, 2005.
 [33] L. Franek and X. Jiang, “Ensemble clustering by means of clustering embedding in vector spaces,” Pattern Recognition, vol. 47, no. 2, pp. 833–842, 2014.
 [34] A. K. Jain, “Data clustering: 50 years beyond k-means,” Pattern Recognition Letters, vol. 31, no. 8, pp. 651–666, 2010.
 [35] G. Karypis and V. Kumar, “A fast and high quality multilevel scheme for partitioning irregular graphs,” SIAM Journal on Scientific Computing, vol. 20, no. 1, pp. 359–392, 1998.
 [36] E. Weiszfeld and F. Plastria, “On the point for which the sum of the distances to n given points is minimum,” Annals of Operations Research, vol. 167, no. 1, pp. 7–41, 2009.
 [37] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, “Factor graphs and the sum-product algorithm,” IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 498–519, 2001.
 [38] X. Yin, S. Chen, E. Hu, and D. Zhang, “Semi-supervised clustering with metric learning: An adaptive kernel method,” Pattern Recognition, vol. 43, no. 4, pp. 1320–1333, 2010.
 [39] W. Zhang, Z. Lin, and X. Tang, “Learning semi-Riemannian metrics for semi-supervised feature extraction,” IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 4, pp. 600–611, 2011.
 [40] Q. Wang, P. C. Yuen, and G. Feng, “Semi-supervised metric learning via topology preserving multiple semi-supervised assumptions,” Pattern Recognition, vol. 46, no. 9, pp. 2576–2587, 2013.
 [41] H. Wang, F. Nie, and H. Huang, “Robust distance metric learning via simultaneous l1-norm minimization and maximization,” in Proc. of International Conference on Machine Learning (ICML), vol. 32, no. 2, 2014, pp. 1836–1844.
 [42] S. Anand, S. Mittal, O. Tuzel, and P. Meer, “Semi-supervised kernel mean shift clustering,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 6, pp. 1201–1215, 2014.
 [43] B. Wang, A. M. Mezlini, F. Demir, M. Fiume, Z. Tu, M. Brudno, B. Haibe-Kains, and A. Goldenberg, “Similarity network fusion for aggregating data types on a genomic scale,” Nature Methods, vol. 11, pp. 333–337, 2014.
 [44] U. von Luxburg, “A tutorial on spectral clustering,” Statistics and Computing, vol. 17, no. 4, pp. 395–416, 2007.
 [45] M. C. de Souto, I. G. Costa, D. S. de Araujo, T. B. Ludermir, and A. Schliep, “Clustering cancer gene expression data: A comparative study,” BMC Bioinformatics, vol. 9, no. 1, p. 497, 2008.
 [46] D. B. Graham and N. M. Allinson, “Characterising virtual eigensignatures for general purpose face recognition,” in Face Recognition. Springer, 1998, pp. 446–456.
 [47] K. Bache and M. Lichman, “UCI machine learning repository,” 2017. [Online]. Available: http://archive.ics.uci.edu/ml
 [48] M.-E. Nilsback and A. Zisserman, “A visual vocabulary for flower classification,” in Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, 2006, pp. 1447–1454.
 [49] S. A. Nene, S. K. Nayar, H. Murase et al., “Columbia object image library (COIL-20),” Technical Report CUCS-005-96, 1996.
 [50] S. Roweis, http://www.cs.nyu.edu/%7eroweis/data.html.
 [51] N. X. Vinh, J. Epps, and J. Bailey, “Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance,” Journal of Machine Learning Research, vol. 11, no. 11, pp. 2837–2854, 2010.
Dong Huang received the B.S. degree in computer science in 2009 from South China University of Technology, Guangzhou, China. He received the M.Sc. degree in computer science in 2011 and the Ph.D. degree in computer science in 2015, both from Sun Yat-sen University, Guangzhou, China. He joined South China Agricultural University in 2015, where he is currently an Associate Professor with the College of Mathematics and Informatics. Since July 2017, he has been a visiting fellow with the School of Computer Science and Engineering, Nanyang Technological University, Singapore. His research interests include data mining and pattern recognition. He is a member of the IEEE. 
Chang-Dong Wang received the B.S. degree in applied mathematics in 2008, the M.Sc. degree in computer science in 2010, and the Ph.D. degree in computer science in 2013, all from Sun Yat-sen University, Guangzhou, China. He was a visiting student at the University of Illinois at Chicago from January 2012 to November 2012. He is currently an Associate Professor with the School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China. His current research interests include machine learning and data mining. He has published more than 40 scientific papers in international journals and conferences such as IEEE TPAMI, IEEE TKDE, IEEE TSMC-C, Pattern Recognition, KAIS, Neurocomputing, ICDM and SDM. His ICDM 2010 paper won the Honorable Mention for Best Research Paper Awards. He was awarded the 2015 Chinese Association for Artificial Intelligence (CAAI) Outstanding Dissertation. He is a member of the IEEE. 
Jian-Huang Lai received the M.Sc. degree in applied mathematics in 1989 and the Ph.D. degree in mathematics in 1999 from Sun Yat-sen University, China. He joined Sun Yat-sen University in 1989 as an Assistant Professor, where he is currently a Professor with the School of Data and Computer Science. His current research interests include the areas of digital image processing, pattern recognition, multimedia communication, and wavelets and their applications. He has published more than 200 scientific papers in international journals and conferences on image processing and pattern recognition, such as IEEE TPAMI, IEEE TKDE, IEEE TNN, IEEE TIP, IEEE TSMC-B, Pattern Recognition, ICCV, CVPR, IJCAI, ICDM and SDM. Prof. Lai serves as a Standing Member of the Image and Graphics Association of China, and also serves as a Standing Director of the Image and Graphics Association of Guangdong. He is a senior member of the IEEE. 
Chee-Keong Kwoh received the bachelor's degree in electrical engineering (first class) and the master's degree in industrial system engineering from the National University of Singapore in 1987 and 1991, respectively. He received the PhD degree from the Imperial College of Science, Technology and Medicine, University of London, in 1995. He has been with the School of Computer Science and Engineering, Nanyang Technological University (NTU) since 1993. He is the programme director of the MSc in Bioinformatics programme at NTU. His research interests include data mining, soft computing and graph-based inference; application areas include bioinformatics and biomedical engineering. He has done significant research work in his research areas and has published many quality international conference and journal papers. He is an editorial board member of the International Journal of Data Mining and Bioinformatics; the Scientific World Journal; Network Modeling and Analysis in Health Informatics and Bioinformatics; Theoretical Biology Insights; and Bioinformation. He has been a guest editor for many journals such as the Journal of Mechanics in Medicine and Biology, the International Journal on Biomedical and Pharmaceutical Engineering and others. He has often been invited as an organizing member or referee and reviewer for a number of premier conferences and journals including GIW, IEEE BIBM, RECOMB, PRIB, BIBM, ICDM and iCBBE. He is a member of the Association for Medical and Bio-Informatics and the Imperial College Alumni Association of Singapore. He has provided many services to professional bodies in Singapore and was conferred the Public Service Medal by the President of Singapore in 2008. 