Multiple Partitions Aligned Clustering
Multi-view clustering is an important yet challenging task due to the difficulty of integrating the information from multiple representations. Most existing multi-view clustering methods explore the heterogeneous information in the space where the data points lie. Such a practice may cause significant information loss because of unavoidable noise or inconsistency among views. Since different views admit the same cluster structure, the natural space for fusion is the space of partitions. Orthogonal to existing techniques, in this paper we propose to leverage the multi-view information by fusing partitions. Specifically, we align each partition with a consensus cluster indicator matrix through a distinct rotation matrix. Moreover, a weight is assigned to each view to account for differences in clustering capacity across views. Finally, the basic partitions, weights, and consensus clustering are jointly learned in a unified framework. We demonstrate the effectiveness of our approach on several real datasets, where significant improvement is found over other state-of-the-art multi-view clustering methods.
As an important problem in machine learning and data mining, clustering has been extensively studied for many years. Technological advances have produced large volumes of data with multiple views. Multi-view features depict the same object from different perspectives, thereby providing complementary information. To leverage this information, multi-view clustering methods have drawn increasing interest in recent years. Due to its unsupervised nature, multi-view clustering remains a challenging task. The key question is how to reach a consensus clustering among all views.
In the clustering field, two dominant methods are k-means and spectral clustering. Numerous variants of them have been developed over the past decades [3, 16, 26, 9]. Among them, some can tackle multi-view data, e.g., multi-view kernel k-means (MKKM), robust multi-view kernel k-means (RMKKM), co-trained multi-view spectral clustering (Co-train), and co-regularized multi-view spectral clustering (Co-reg). Along with the development of the nonnegative matrix factorization (NMF) technique, multi-view NMF has also gained much attention. For example, a multi-manifold regularized NMF (MNMF) is designed to preserve the local geometrical structure of the manifolds for multi-view clustering.
Recently, subspace clustering methods have shown impressive performance. Subspace clustering first learns a graph that reveals the relationships between data points, then applies spectral clustering to obtain an embedding of the original data, and finally uses k-means to produce the clustering result [4, 10]. Inspired by this, subspace-clustering-based multi-view clustering methods [5, 28, 6] have become popular in recent years. For instance, Gao et al. proposed the multi-view subspace clustering (MVSC) method. In this approach, multiple graphs are constructed and forced to share the same cluster pattern; therefore, the final clustering is a negotiated result and might not be optimal. Another approach supposes that the graphs should be close to one another; after the graphs are obtained, their average is used to perform spectral clustering. This averaging strategy might be too simple to fully exploit the heterogeneous information. Furthermore, it is a two-stage algorithm, and the constructed graph might not be optimal for the subsequent clustering.
By contrast, another class of graph-based multi-view clustering methods learns a common graph based on the adaptive neighbors idea [18, 27]. Specifically, point $x_i$ is connected to point $x_j$ with probability $s_{ij}$; $s_{ij}$ should be large if the distance between $x_i$ and $x_j$ is small, and small otherwise. The learned $s_{ij}$ is therefore treated as the similarity between $x_i$ and $x_j$. In this framework, every view shares the same similarity graph. Moreover, a weight for each view is automatically assigned based on its loss value. Though this approach has shown its competitiveness, one shortcoming is that it fails to consider the flexible local manifold structures of different views.
Although proved effective in many cases, existing graph-based multi-view clustering methods are limited in several aspects. First, they integrate the multi-view information in the feature space via some simple strategies. Due to the generally unavoidable noise in the data representation, the graphs might be severely damaged and fail to represent the true similarities among data points. It would make more sense to reach the consensus clustering directly in the partition space, where a common cluster structure is shared by all views, even though the graphs might be quite different across views. Hence, partitions from various views might be less affected by noise and can more easily reach an agreement. Second, most existing algorithms follow a multi-stage strategy, which might degrade the final performance. For example, the learned graph might not be suitable for the subsequent clustering task. A joint learning method is desired for this kind of problem.
Regarding the problems mentioned above, we propose a novel multiple Partitions Aligned Clustering (mPAC) method. Fig. 1 shows the idea of our approach. mPAC performs graph construction, spectral embedding, and partition integration via joint learning. In particular, an iterative optimization strategy allows the consensus clustering to guide the graph construction, which in turn contributes to a new unified clustering. To sum up, our contributions are two-fold:
Orthogonal to existing multi-view clustering methods, we integrate multi-view information in partition space. This change in paradigm accompanies several benefits.
An end-to-end, single-stage model is developed that goes from graph construction all the way to the final clustering. In particular, we assume that the unified clustering is reachable from each view through a distinct transformation. Moreover, the output of our algorithm is the discrete cluster indicator matrix, so no subsequent discretization step is needed.
Notations. In this paper, matrices and vectors are represented by capital and lower-case letters, respectively. For a matrix $M$, $m_i$ and $m^j$ represent the $i$-th row and $j$-th column of $M$, respectively. The $\ell_2$-norm of a vector $z$ is defined as $\|z\|_2 = \sqrt{z^\top z}$, where $\top$ means transpose. $\mathrm{Tr}(M)$ denotes the trace of $M$. $\|M\|_F$ denotes the Frobenius norm of $M$. Vector $\mathbf{1}$ indicates that its elements are all ones. $I$ refers to the identity matrix with a proper size. $\mathrm{Ind}$ represents the set of cluster indicator matrices. We use the superscript or subscript $v$ to denote the $v$-th view interchangeably when convenient.
2 Subspace Clustering Revisited
In general, for data $X \in \mathbb{R}^{d \times n}$ with $d$ features and $n$ samples, the popular subspace clustering method can be formulated as:

$$\min_{Z} \|X - XZ\|_F^2 + \alpha\, reg(Z), \quad \text{s.t.}\ \mathrm{diag}(Z) = 0, \tag{1}$$

where $\alpha$ is a balance parameter and $reg(Z)$ is some regularization function, which varies across algorithms. For simplicity, we just apply the Frobenius norm in this paper. $\mathrm{diag}(Z)$ is the vector consisting of the diagonal elements of $Z$. $W = (|Z| + |Z^\top|)/2$ is treated as the affinity graph. Therefore, once $Z$ is obtained, we can apply a spectral clustering algorithm to obtain the clustering results, i.e.,

$$\min_{F} \mathrm{Tr}(F^\top L F), \quad \text{s.t.}\ F^\top F = I, \tag{2}$$

where $L$ is the Laplacian of graph $W$, $F \in \mathbb{R}^{n \times c}$ is the spectral embedding, and $c$ is the number of clusters. The graph Laplacian is defined by $L = D - W$, where $D$ is a diagonal matrix with $d_{ii} = \sum_j w_{ij}$. Since $F$ is not discrete, k-means is often used to recover the indicator matrix $Y$.
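The graph-to-embedding step described above is easy to sketch in a few lines of numpy; `spectral_embedding` is a hypothetical helper name for illustration, and the resulting $F$ would then be discretized with k-means:

```python
import numpy as np

def spectral_embedding(Z, c):
    """Build a symmetric affinity from the self-expression matrix Z, then
    take the c eigenvectors of the unnormalized Laplacian with the smallest
    eigenvalues as the spectral embedding F."""
    W = (np.abs(Z) + np.abs(Z.T)) / 2        # affinity graph W
    D = np.diag(W.sum(axis=1))               # degree matrix D
    L = D - W                                # graph Laplacian L = D - W
    _, vecs = np.linalg.eigh(L)              # eigenvectors, ascending eigenvalues
    return vecs[:, :c]                       # embedding F (n x c)
```

Because `eigh` returns orthonormal eigenvectors, the embedding automatically satisfies the constraint $F^\top F = I$.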
When data of multiple views are available, Eq. (1) can be extended to this scenario accordingly. $\{X^{(v)}\}_{v=1}^{V}$ denotes the data with $V$ views, where $X^{(v)} \in \mathbb{R}^{d_v \times n}$ represents the $v$-th view with $d_v$ features. Basically, most methods in the literature solve the following problem

$$\min_{\{Z^{(v)}\},\, S} \sum_{v=1}^{V} \|X^{(v)} - X^{(v)}Z^{(v)}\|_F^2 + \alpha\, reg(Z^{(v)}) + \Phi\big(S, \{Z^{(v)}\}\big), \tag{3}$$

where $\Phi$ represents some strategy to obtain a consensus graph $S$. For example, one approach enforces each graph to share the same spectral embedding; another penalizes the discrepancy between graphs and then uses their average as the input to spectral clustering.
We observe that there are several drawbacks shared by these approaches. First and foremost, they still lack an effective way to integrate multi-view knowledge while simultaneously considering the heterogeneity among views. Simply taking the average of graphs or assigning a unique spectral embedding is not enough to take full advantage of rich information. The graph representation itself might not be optimal to characterize the multi-view information. Secondly, they adopt a multi-stage approach. Since there is no mechanism to ensure the quality of learned graphs, this approach might lead to sub-optimal clustering results, which often occurs when noise exists. To address the above-mentioned challenging issues, we propose a multiple Partitions Aligned Clustering (mPAC) method.
3 Proposed Approach
Unlike Eq. (3), which learns a unique graph based on multiple graphs, we propose to learn a partition for each graph. Specifically, we adopt a joint learning strategy and formulate our objective function as

$$\min_{\{Z^{(v)},\, F^{(v)}\}} \sum_{v=1}^{V} \|X^{(v)} - X^{(v)}Z^{(v)}\|_F^2 + \alpha\|Z^{(v)}\|_F^2 + \beta\,\mathrm{Tr}\big(F^{(v)\top} L^{(v)} F^{(v)}\big), \quad \text{s.t.}\ F^{(v)\top} F^{(v)} = I. \tag{4}$$
Next, we propose a way to fuse the multi-view information in the partition space. For multi-view clustering, a shared cluster structure is assumed, so it is reasonable to assume one cluster indicator matrix $Y$ for all views. Unfortunately, the elements of $F^{(v)}$ are continuous, and discrepancies also exist among the $F^{(v)}$'s; thus it is challenging to integrate multiple $F^{(v)}$'s. To recover the underlying cluster indicator $Y$, we assume that each partition $F^{(v)}$ is a perturbation of $Y$ and can be aligned with $Y$ through a rotation [12, 19]. Mathematically, this can be formulated as

$$\min_{\{R^{(v)}\},\, Y} \sum_{v=1}^{V} \|F^{(v)} R^{(v)} - Y\|_F^2, \quad \text{s.t.}\ R^{(v)\top} R^{(v)} = I,\ Y \in \mathrm{Ind}. \tag{5}$$
where $R^{(v)} \in \mathbb{R}^{c \times c}$ represents an orthogonal rotation matrix. Eq. (5) treats each view equally. As shown by many researchers, it is necessary to distinguish their contributions. Therefore, we introduce a weight parameter $w_v$ for view $v$. Deploying a unified framework, we eventually reach our objective for mPAC as

$$\min_{\{Z^{(v)}, F^{(v)}, R^{(v)}\},\, Y,\, w} \sum_{v=1}^{V} \Big[ \|X^{(v)} - X^{(v)}Z^{(v)}\|_F^2 + \alpha\|Z^{(v)}\|_F^2 + \beta\,\mathrm{Tr}\big(F^{(v)\top} L^{(v)} F^{(v)}\big) + \frac{\lambda}{w_v}\,\|F^{(v)} R^{(v)} - Y\|_F^2 \Big],$$
$$\text{s.t.}\ F^{(v)\top} F^{(v)} = I,\ R^{(v)\top} R^{(v)} = I,\ Y \in \mathrm{Ind},\ \textstyle\sum_{v} w_v = 1,\ w_v \ge 0. \tag{6}$$
We can observe that the proposed approach is distinct from other methods in several aspects:
Orthogonal to existing multi-view clustering techniques, Eq. (6) integrates heterogeneous information in partition space. Considering that a common cluster structure is shared by all views, it would be natural to perform information fusion based on partitions.
Generally, learning with multi-stage strategy often leads to sub-optimal performance. We adopt a joint learning framework. The learning of similarity graphs, spectral embeddings, view weights, and unified cluster indicator matrix is seamlessly integrated together.
$Y$ is the final discrete cluster indicator matrix. Hence, a discretization procedure is no longer needed; this eliminates the k-means post-processing step, which is sensitive to initialization. With input $\{X^{(v)}\}_{v=1}^{V}$, Eq. (6) directly outputs the final discrete $Y$. Thus, it is an end-to-end, single-stage learning problem.
Multiple graphs are learned in our approach. Hence, the local manifold structures of each view are well taken care of.
As a matter of fact, Eq. (6) is not a simple unification of a pipeline of steps; it attempts to learn graphs with an optimal structure for clustering. According to spectral graph theory, the ideal graph is $c$-connected if there are $c$ clusters. In other words, the Laplacian matrix $L$ has $c$ zero eigenvalues. Approximately, we can minimize the sum of the $c$ smallest eigenvalues $\sum_{i=1}^{c} \sigma_i(L)$, which, by Ky Fan's theorem, is equivalent to $\min_{F^\top F = I} \mathrm{Tr}(F^\top L F)$. Hence, the third term in Eq. (6) encourages each graph to be optimal for clustering.
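The connection between $c$ connected components and $c$ zero Laplacian eigenvalues is easy to verify numerically; a minimal numpy check (the two-clique graph below is an illustrative toy, not data from the paper):

```python
import numpy as np

# Toy graph: two disjoint 3-cliques, i.e. c = 2 connected components.
W = np.zeros((6, 6))
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)

L = np.diag(W.sum(axis=1)) - W     # unnormalized graph Laplacian L = D - W
eigvals = np.linalg.eigvalsh(L)    # eigenvalues in ascending order
n_zero = int(np.sum(eigvals < 1e-10))
print(n_zero)                      # -> 2, one zero eigenvalue per component
```

The sum of the $c$ smallest eigenvalues is exactly zero here, which is the ideal value that the trace term in the objective pushes toward.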
4 Optimization Methods
To handle the objective function in Eq. (6), we apply an alternating minimization scheme to solve it.
4.1 Update $Z^{(v)}$ for Each View
By fixing the other variables, we solve $Z^{(v)}$ according to

$$\min_{Z^{(v)}} \|X^{(v)} - X^{(v)}Z^{(v)}\|_F^2 + \alpha\|Z^{(v)}\|_F^2 + \beta\,\mathrm{Tr}\big(F^{(v)\top} L^{(v)} F^{(v)}\big).$$

It can be seen that each $Z^{(v)}$ is independent of the other views, so we can solve each view separately. To simplify the notation, we omit the view index for now. Note that $\mathrm{Tr}(F^\top L F)$ is a function of both $Z$ and $F$. Equivalently, for each column $z_i$ of $Z$ we solve

$$\min_{z_i} \|x_i - Xz_i\|_2^2 + \alpha\|z_i\|_2^2 + \frac{\beta}{2}\, d^\top z_i,$$

where $d \in \mathbb{R}^n$ with the $j$-th component defined by $d_j = \|f_i - f_j\|_2^2$. By setting its first-order derivative to zero, we obtain

$$z_i = \big(X^\top X + \alpha I\big)^{-1}\Big(X^\top x_i - \frac{\beta}{4}\, d\Big).$$
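This closed-form update can be sketched as follows; the code assumes the prototypical per-column subproblem $\min_z \|x - Xz\|^2 + \alpha\|z\|^2 + (\beta/2)\,d^\top z$ (the exact constants of the full objective are not reproduced here), and `update_z_column` is a hypothetical helper name:

```python
import numpy as np

def update_z_column(X, x, d, alpha, beta):
    """Closed-form minimizer of the (assumed) per-column subproblem
        min_z ||x - X z||^2 + alpha ||z||^2 + (beta/2) d^T z,
    obtained by setting the gradient to zero:
        z = (X^T X + alpha I)^{-1} (X^T x - (beta/4) d)."""
    n = X.shape[1]
    A = X.T @ X + alpha * np.eye(n)      # positive definite for alpha > 0
    b = X.T @ x - 0.25 * beta * d
    return np.linalg.solve(A, b)
```

Since $X^\top X + \alpha I$ is shared by all columns, a practical implementation would factorize it once per view and reuse the factorization.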
4.2 Update $F^{(v)}$ for Each View
Similarly, we drop all terms unrelated to $F^{(v)}$ and again omit the view index. This yields

$$\min_{F^\top F = I} \beta\,\mathrm{Tr}(F^\top L F) + \frac{\lambda}{w}\,\|FR - Y\|_F^2.$$

This sub-problem can be efficiently solved using the feasible method for optimization with orthogonality constraints of Wen and Yin (2013).
4.3 Update $R^{(v)}$ for Each View
With respect to $R^{(v)}$, the objective function is additive over views, so we can solve for each $R^{(v)}$ individually. Specifically,

$$\min_{R^\top R = I} \|FR - Y\|_F^2,$$

whose closed-form solution is $R = UV^\top$, where $U$ and $V$ are the left and right unitary matrices of the SVD of $F^\top Y$, respectively (the orthogonal Procrustes solution).
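The rotation update is the classical orthogonal Procrustes problem, which is straightforward to sketch with numpy (`best_rotation` is a hypothetical name for illustration):

```python
import numpy as np

def best_rotation(F, Y):
    """Orthogonal Procrustes: argmin_{R : R^T R = I} ||F R - Y||_F^2.
    The closed form is R = U V^T, with U, V from the SVD of F^T Y."""
    U, _, Vt = np.linalg.svd(F.T @ Y)
    return U @ Vt
```

The derivation: expanding $\|FR - Y\|_F^2$ leaves $-2\,\mathrm{Tr}(R^\top F^\top Y)$ as the only $R$-dependent term, and the trace is maximized at $R = UV^\top$.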
4.4 Update $Y$
For $Y$, we get

$$\min_{Y \in \mathrm{Ind}} \sum_{v=1}^{V} \frac{\lambda}{w_v}\,\|F^{(v)} R^{(v)} - Y\|_F^2.$$

Unfolding the above objective function, and noting that $\|F^{(v)}R^{(v)}\|_F^2$ and $\|Y\|_F^2 = n$ are constants, we can equivalently solve

$$\max_{Y \in \mathrm{Ind}} \mathrm{Tr}\big(Y^\top B\big), \quad B = \sum_{v=1}^{V} \frac{\lambda}{w_v}\, F^{(v)} R^{(v)}.$$

It admits a closed-form solution, that is, $Y_{ij} = 1$ if $j = \arg\max_k B_{ik}$, and $Y_{ij} = 0$ otherwise.
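A sketch of this row-wise rule, assuming per-view coefficients `gamma_v` multiply the alignment terms (`update_indicator` is a hypothetical helper name):

```python
import numpy as np

def update_indicator(partitions, rotations, gammas):
    """Row-wise update of the consensus indicator Y: place each row's
    single 1 at the argmax of B = sum_v gamma_v * (F_v @ R_v)."""
    B = sum(g * (F @ R) for F, R, g in zip(partitions, rotations, gammas))
    n, c = B.shape
    Y = np.zeros((n, c))
    Y[np.arange(n), B.argmax(axis=1)] = 1.0
    return Y
```

Each row is decided independently, so the update costs only $O(nc)$ once $B$ is formed.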
4.5 Update $w_v$ for Each View
Let us denote $h_v = \|F^{(v)} R^{(v)} - Y\|_F$; then this subproblem can be expressed as

$$\min_{w} \sum_{v=1}^{V} \frac{h_v^2}{w_v}, \quad \text{s.t.}\ \sum_{v=1}^{V} w_v = 1,\ w_v \ge 0.$$

Based on the Cauchy-Schwarz inequality, we have

$$\Big(\sum_{v} h_v\Big)^2 = \Big(\sum_{v} \frac{h_v}{\sqrt{w_v}} \cdot \sqrt{w_v}\Big)^2 \le \Big(\sum_{v} \frac{h_v^2}{w_v}\Big)\Big(\sum_{v} w_v\Big) = \sum_{v} \frac{h_v^2}{w_v}.$$

The minimum, which is the constant $(\sum_v h_v)^2$, is achieved when $h_v/\sqrt{w_v} \propto \sqrt{w_v}$. Thus, the optimal $w_v$ is given by

$$w_v = \frac{h_v}{\sum_{u=1}^{V} h_u}.$$
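A minimal sketch of this weight rule, assuming the subproblem takes the form $\min_w \sum_v h_v^2/w_v$ subject to $\sum_v w_v = 1$ (`update_weights` is a hypothetical helper name):

```python
def update_weights(losses):
    """Optimal view weights for the (assumed) subproblem
        min_w sum_v h_v^2 / w_v   s.t.  sum_v w_v = 1, w_v >= 0.
    By Cauchy-Schwarz, (sum_v h_v)^2 <= (sum_v w_v) (sum_v h_v^2 / w_v),
    with equality iff w_v is proportional to h_v."""
    total = sum(losses)
    return [h / total for h in losses]
```

Under this scheme the effective coefficient $1/w_v$ in the objective is larger for views with smaller alignment residual $h_v$, so better-aligned views exert more influence on the consensus.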
5.1 Experimental Setup
We conduct experiments on four benchmark data sets: BBCSport, Caltech7, Caltech20, and Handwritten Numerals. Their statistics are summarized in Table 1. We compare the proposed mPAC with several state-of-the-art methods from different categories, including Co-train, Co-reg, MKKM, RMKKM, MVSC, MNMF, and parameter-free auto-weighted multiple graph learning (AMGL). Furthermore, the classical k-means (KM) method with concatenated features (i.e., all features, AllFea for short) is included as a baseline; that is, all views are treated as equally important. Following common practice, all feature values of each view are normalized. To achieve a comprehensive evaluation, we apply five widely-used metrics to examine the effectiveness of our method: F-score, Precision, Recall, Normalized Mutual Information (NMI), and Adjusted Rand Index (ARI). We initialize our algorithm with the output of an existing graph-based clustering method.
5.2 Experimental Results
We run each method 10 times and report the mean and standard deviation (std) values. Our proposed method only needs to be run once since no k-means is involved. The clustering performance on the four data sets is summarized in Tables 2-5, respectively. We can observe that our mPAC method achieves the best performance in most cases, which validates the effectiveness of our approach. In general, our method works better than the k-means and NMF based techniques, and the improvement is remarkable. With respect to graph-based clustering methods, our approach also demonstrates its superiority. In particular, both MVSC and AMGL assume that all graphs produce the same partition, while our method learns one partition for each view and finds the underlying clusters through the alignment mechanism.
To visualize the effect of partition alignment, we apply t-SNE to the clustering results on the Handwritten Numerals data. As shown in Fig. 2, some partitions have a good cluster structure, so it might be easy to find a good consensus $Y$. On the other hand, although the partition of view 5 is poor, we can still achieve a good solution. This indicates that our method can reliably obtain a good clustering since it operates in the partition space. By contrast, previous methods may not consistently provide a good solution.
5.3 Sensitivity Analysis
Taking Caltech7 as an example, we demonstrate the influence of the parameters on clustering performance. From Fig. 3, we can observe that performance is quite stable under a wide range of parameter settings. In particular, it becomes more robust to $\alpha$ and $\beta$ when $\lambda$ increases, which indicates the importance of partition alignment.
In this paper, a novel multi-view clustering method is developed. Different from existing approaches, it seeks to integrate multi-view information in partition space. We assume that each partition can be aligned to the consensus clustering through a rotation matrix. Furthermore, graph learning and clustering are performed in a unified framework, so that they can be jointly optimized. The proposed method is validated on four benchmark data sets.
This paper was in part supported by Grants from the Natural Science Foundation of China (Nos. 61806045 and 61572111), two Fundamental Research Funds for the Central Universities of China (Nos. ZYGX2017KYQD177 and A03017023701012), and a 985 Project of UESTC (No. A1098531023601041).
- (2013) Multi-view k-means clustering on big data. In IJCAI, pp. 2598-2604.
- (2017) A survey on multi-view clustering. arXiv preprint arXiv:1712.06246.
- (2013) TW-k-means: automated two-level variable weighting clustering algorithm for multi-view data. IEEE Transactions on Knowledge and Data Engineering 25 (4), pp. 932-944.
- (2013) Sparse subspace clustering: algorithm, theory, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (11), pp. 2765-2781.
- (2015) Multi-view subspace clustering. In ICCV, pp. 4238-4246.
- (2019) Auto-weighted multi-view clustering via kernelized graph learning. Pattern Recognition 88, pp. 174-184.
- (2018) Self-weighted multi-view clustering with soft capped norm. Knowledge-Based Systems 158, pp. 1-8.
- (2010) Data clustering: 50 years beyond k-means. Pattern Recognition Letters 31 (8), pp. 651-666.
- (2018) Self-weighted multiple kernel learning for graph-based clustering and semi-supervised classification. In IJCAI, pp. 2312-2318.
- (2019) Similarity learning via kernel preserving embedding. In AAAI.
- (2019) Robust graph learning from noisy data. IEEE Transactions on Cybernetics, pp. 1-11.
- (2018) Unified spectral clustering with optimal graph. In AAAI.
- (2017) Twin learning for similarity and clustering: a unified kernel approach. In AAAI.
- (2011) A co-training approach for multi-view spectral clustering. In ICML, pp. 393-400.
- (2011) Co-regularized multi-view spectral clustering. In NIPS, pp. 1413-1421.
- (2018) Partition level constrained clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (10), pp. 2469-2483.
- (2002) On spectral clustering: analysis and an algorithm. In NIPS, pp. 849-856.
- (2016) Parameter-free auto-weighted multiple graph learning: a framework for multiview clustering and semi-supervised classification. In IJCAI, pp. 1881-1887.
- (2018) Multiview clustering via adaptively weighted Procrustes. In SIGKDD, pp. 2022-2030.
- (2016) The constrained Laplacian rank algorithm for graph-based clustering. In AAAI.
- (2016) Connections between nuclear-norm and Frobenius-norm-based representations. IEEE Transactions on Neural Networks and Learning Systems 29 (1), pp. 218-224.
- (1966) A generalized solution of the orthogonal Procrustes problem. Psychometrika 31 (1), pp. 1-10.
- (2012) Kernel-based weighted multi-view clustering. In ICDM, pp. 675-684.
- (2016) Iterative views agreement: an iterative low-rank based structured optimization method to multi-view spectral clustering. In IJCAI, pp. 2153-2159.
- (2013) A feasible method for optimization with orthogonality constraints. Mathematical Programming 142 (1-2), pp. 397-434.
- (2018) Fast spectral clustering learning with hierarchical bipartite graph for large-scale data. Pattern Recognition Letters.
- (2017) Graph learning for multiview clustering. IEEE Transactions on Cybernetics (99), pp. 1-9.
- (2017) Latent multi-view subspace clustering. In CVPR, pp. 4279-4287.
- (2017) Multi-view clustering via multi-manifold regularized non-negative matrix factorization. Neural Networks 88, pp. 74-89.