Beyond Low-Rank Representations: Orthogonal Clustering Basis Reconstruction with Optimized Graph Structure for Multi-view Spectral Clustering
Abstract
Low-Rank Representation (LRR) is arguably one of the most powerful paradigms for multi-view spectral clustering. It elegantly encodes the multi-view local graph/manifold structures into an intrinsic low-rank self-expressive data similarity embedded in high-dimensional space, to yield a better graph partition than its single-view counterparts. In this paper we revisit it from a fundamentally different perspective by discovering that LRR is essentially a latent clustered orthogonal projection based representation, winged with an optimized local graph structure for spectral clustering; each column of the representation is a cluster basis, orthogonal to the others, that indicates its members, and it intuitively projects the view-specific feature representation onto the space spanned by all orthogonal bases to characterize the cluster structures. Upon this finding, we propose our technique with the following: (1) We decompose LRR into a latent clustered orthogonal representation via low-rank matrix factorization, to encode more flexible cluster structures than LRR over primal data objects; (2) We convert the problem of LRR into that of simultaneously learning the orthogonal clustered representation and the optimized local graph structure for each view; (3) The learned orthogonal clustered representations and local graph structures enjoy the same magnitude across views, so that the ideal multi-view consensus can be readily achieved. The experiments over multi-view datasets validate its superiority, especially over recent state-of-the-art LRR models.
Yang Wang, Lin Wu The University of New South Wales, Kensington, Sydney, Australia The University of Queensland, Australia wangy@cse.unsw.edu.au; lin.wu@uq.edu.au
1 Introduction
Spectral clustering [?], which partitions the data objects via their local graph/manifold structure relying on the Laplacian eigenvalue-eigenvector decomposition, is one fundamental clustering problem. Unlike K-Means clustering [?], the data objects within the same group exhibit not only large data similarity but also similar local graph/manifold structure. With the rapid development of information technology, data are increasingly available with multi-view feature representations (e.g., images can be described by a color histogram view or a texture view), which naturally paves the way to multi-view spectral clustering. As extensively claimed by multi-view research [?; ?; ?; ?; ?; ?; ?; ?], the information encoded by multi-view features describes different properties; thus leveraging the multi-view information can outperform the single-view counterparts. One critical issue for a successful multi-view incorporation, implied by the existing work [?; ?; ?; ?; ?], lies in how to achieve the multi-view consensus/agreement.
Following this principle, many multi-view clustering methods [?; ?] claim that similar data objects should be within the same group across all views. Based on that, the consensus multi-view local manifold structure has been further explored with great effort [?; ?; ?; ?] for multi-view spectral clustering. Among all these methods, Low-Rank Representation (LRR) [?] coupled with a sparse decomposition based model has emerged as a substantially elegant solution, due to its strength in exploring the intrinsic low-dimensional manifold structure encoded by the data correlations embedded in high-dimensional space, while exhibiting strong robustness to feature noise corruptions addressed by sparse noise modeling, hence attracting great attention.
1.1 Motivation: LRR Revisited for Multi-View Spectral Clustering
Specifically, the typical LRR model for multiview spectral clustering stems from the formulation below:
$$\min_{Z,\,E_i}\ \|Z\|_* + \lambda \sum_{i \in V} \|E_i\|_1 \quad \text{s.t.}\quad X_i = X_i Z + E_i,\ i \in V,\ Z \ge 0 \tag{1}$$
where $X_i \in \mathbb{R}^{d_i \times n}$ is the data representation for the $i$-th view with $d_i$ as its feature dimension, $n$ as the number of data objects identical for each view, $\lambda$ is the balance parameter, and $V$ is the view set. $Z \in \mathbb{R}^{n \times n}$ is the self-expressive low-rank similarity representation shared by all $|V|$ views, constrained by the nuclear norm $\|Z\|_*$ with $X_i$ itself serving as the dictionary, which can also be substituted by other specific dictionaries; $\|E_i\|_1$ is modeled to address the noise corruption of the view-specific feature representation. $Z \ge 0$ ensures the nonnegativity of all its entries. Based on such an optimized low-rank $Z$, the spectral clustering is finally conducted. One significant limitation of Eq.(1) pointed out by [?] is that only one common $Z$ is learned to preserve the flexible local manifold structures for all views, and hence it fails to achieve the ideal spectral clustering result.
To this end, various low-rank are learned to preserve the view-specific local manifold structures while minimizing their divergence via an iterative-views-agreement strategy for multi-view consensus, followed by a final spectral clustering stage.
Despite its encouraging performance, the following notable limitations of the LRR model have been overlooked: (1) The low-rank data similarity may not well encode the flexible latent cluster structures over the primal view-specific feature space, and the local graph constructed over such a representation for spectral clustering is even less ideal; (2) The low-rank data similarities coming from multiple views may not be within the same magnitude, so that the divergence minimization may not achieve the ideal multi-view clustering consensus.
Our new perspective. The above facts motivate us to revisit the lowrank representation to help reconstruct below for the view
(2) 
where denotes the set of with low-rankness, e.g., cluster number far less than . Instead of narrowly treating the low-rank as a self-expressive data similarity from the conventional viewpoint, it is essentially seen as a special case of a generalized low-rank projection, which maps the feature representation to a low-dimensional space to reconstruct with minimum error. As discussed, the self-expressive similarity projection equipped with LRR models still suffers from the aforementioned nontrivial limitations.
Here we ask a question: Is there a superior low-rank projection that minimizes Eq.(2) while addressing the limitations of the existing LRR models? Our answer to this question is positive. Specifically, we propose to consider as a latent clustered orthogonal projection, via , where

Clustered orthogonal projection: , where each column indicates one cluster to characterize its belonging data objects. Compared with LRR over original feature space, the latent factor can better preserve the flexible latent cluster structure.

Feature reconstruction with cluster basis: Instead of lowrank data similarity, essentially serves as a mapping to reconstruct the viewspecific features via the column of to encode the latent cluster structures.

Rethinking : We revisit the intuition of via through two stages; recall that where

is performed to obtain the new projection value for all features over orthogonal columns of ;

is subsequently the projected representation for all features spanned by clustered orthogonal column basis of .


Same magnitude for multiview consensus: All enjoy the same magnitude due to their orthonormal columns. Hence, the feasible divergence minimization will facilitate the multiview consensus.
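As a hedged illustration of the clustered orthogonal projection described above, the following minimal sketch builds a cluster-indicator matrix with orthonormal columns and applies the two-stage projection. The names `indicator_basis`, `X`, `U`, `labels`, the toy sizes, and the 1/sqrt(cluster size) scaling are assumptions made for illustration and are not taken from the paper's notation. Under these assumptions, projecting and then mapping back replaces every sample by the mean of its cluster.

```python
import numpy as np

def indicator_basis(labels, k):
    """Return an n x k matrix whose column c is 1/sqrt(n_c) on the members of cluster c."""
    n = labels.shape[0]
    U = np.zeros((n, k))
    U[np.arange(n), labels] = 1.0
    sizes = U.sum(axis=0)          # number of samples per cluster
    return U / np.sqrt(sizes)      # assumes no cluster is empty

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 6))              # toy view: 4 features, 6 samples stored as columns
labels = np.array([0, 0, 1, 1, 1, 2])    # hypothetical cluster assignment
U = indicator_basis(labels, 3)

coeff = X @ U          # stage 1: coefficients of all features over the cluster columns
X_hat = coeff @ U.T    # stage 2: representation spanned by the orthonormal cluster basis

print(np.allclose(U.T @ U, np.eye(3)))
```

Because every column of such a matrix has unit norm regardless of the view it comes from, matrices built this way for different views are directly comparable in scale, which is the property the last bullet relies on.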
Before shedding light on our technique, we review the typical related work for multi-view spectral clustering.
1.2 Prior Arts
The prior arts can be classified as per the strategy at which the multiview fusion takes place for spectral clustering.
The most straightforward method is early fusion [?], which concatenates the multi-view feature vectors with equal or varied weights into a unified one, followed by spectral clustering over such unified space. However, such a method ignores the statistical property belonging to an individual view. Late fusion [?] may address the limitation to some extent by aggregating the spectral clustering results from each individual view, which follows the assumption that all views are independent of each other. Such an assumption is not effective for multi-view spectral clustering, which instead assumes the views to be dependent so that the multi-view consensus information can be exploited for promising performance.
Canonical Correlation Analysis (CCA) is applied for multi-view spectral clustering [?] by learning a common low-dimensional representation for all views, upon which the spectral clustering is performed. One salient drawback lies in the failure to preserve the flexible local manifold structures for different views via such a common subspace. The co-training based model [?] learns the Laplacian eigenmap for each view over its projected data representation through the Laplacian eigenmaps from other views; this process is repeated until convergence, and the final similarities are then aggregated for spectral clustering. A similar method [?] is also proposed to coordinate the consensus of multi-view Laplacian eigenmaps for spectral clustering. Despite their effectiveness, they have to assume noise-free feature representations, which unfortunately cannot be met in practice. The Low-Rank Representation and sparse decomposition models [?; ?] tackle the problem well and exhibit robustness to feature noise corruptions. However, they still suffer from the aforementioned limitations. To this end, we make the following orthogonal contributions to the typical LRR model for multi-view spectral clustering.
1.3 Our Contributions

We revisit the classical Low-Rank Representation (LRR) for multi-view spectral clustering from a fundamentally novel viewpoint, finding it to be essentially a latent clustered orthogonal projection based representation with optimized graph structure, to better encode the flexible latent cluster structures than LRR over primal data objects.

We convert the problem of learning LRR into that of simultaneously learning the clustered orthogonal representation and its optimized local graph structure for each view, rather than directly relying on the local graph construction over original data objects.

The learned multi-view latent clustered representations and local graph structures enjoy the same magnitude, so as to facilitate a feasible divergence minimization to achieve superior multi-view consensus for spectral clustering.
Extensive experiments over multiview datasets validate the superiority of our method.
2 Learning Clustered Orthogonal projection with Optimized Graph Structure
In this section, we formally discuss our technique. Some notations that are used throughout the paper are shown below.
2.1 Notations
For a matrix , the trace of is denoted as ; (or for vector space) denotes the Frobenius norm; is the norm, denotes the transpose of , and is its nuclear norm (the sum of all singular values); and are the row and column of . means all entries of are nonnegative. is the identity matrix with adaptive size. 1 indicates the vector of adaptive length with all entries equal to 1. indicates the cardinality of the set.
2.2 Problem Formulation
As previously defined in section 1.1, is the data representation for the view. is lowrank data similarity representation for . The Eq.(2) is equivalent to computing such that , where has orthonormal columns with its column representing the relevance each data object belongs to the cluster, and indicates latent cluster number. We then arrive at the following
(3) 
As discussed in section 1.1, reveals the new projection representation for all features spanned by the orthogonal basis of to reconstruct .
Optimizing Eq.(3) w.r.t. is equivalent to computing an SVD of to constitute the orthogonal columns of using the principal eigenvectors. Inspired by this, we exploit the latent cluster structures of to form non-overlapping clusters, each characterized by one orthogonal column basis of . Thanks to [?] on low-rank matrix factorization, it yields the following
(4) 
where and are latent factors from . Based on that, we approximate via the clustered orthogonal projection factorization , and convert the problem of minimizing to that of learning clustered projection representation below
(5) 
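The low-rank matrix factorization step can be checked numerically. The sketch below assumes that Eq.(4) is the standard variational form of the nuclear norm from the nuclear-norm minimization literature, i.e. the nuclear norm of a matrix equals the minimum of half the sum of squared Frobenius norms over all of its two-factor factorizations; the equation body is not recoverable from the text here, so this reading is an assumption. The names `Z`, `P`, `Q`, and `G` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 3
Z = rng.normal(size=(8, k)) @ rng.normal(size=(k, 8))   # toy rank-3 matrix

A, s, Bt = np.linalg.svd(Z, full_matrices=False)
nuclear = s.sum()                                       # sum of singular values

# Balanced factors built from the SVD attain the minimum of the factorized form.
P = A[:, :k] * np.sqrt(s[:k])
Q = Bt[:k].T * np.sqrt(s[:k])
balanced = 0.5 * (np.linalg.norm(P) ** 2 + np.linalg.norm(Q) ** 2)

# Any other factorization of the same Z gives a value that is at least as large.
G = rng.normal(size=(k, k)) + 3.0 * np.eye(k)           # mixing matrix, invertible for this seed
P2, Q2 = P @ G, Q @ np.linalg.inv(G).T
other = 0.5 * (np.linalg.norm(P2) ** 2 + np.linalg.norm(Q2) ** 2)

print(nuclear, balanced, other)
```

This is why a nuclear-norm penalty can be traded for Frobenius-norm penalties on two thin factors, which is cheaper to optimize than a full singular value decomposition in every iteration.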
Remark 1
Unlike the data similarity over raw data objects, via the low-rank matrix factorization can achieve the flexible latent cluster structures. Another issue left to be addressed lies in the local manifold/graph structure modeling over , which is crucial for spectral clustering. One may directly refer to the local graph construction over . However, as previously stated, it cannot effectively encode the local graph structure over .
Towards this end, we propose to learn an optimized local graph structure over by solving the following
(6)  
where is Laplacian matrix, is the diagonal matrix with its diagonal entry equaled to the sum of the row of . The ideal reveals the probability of and data points within the same cluster according to cluster projection representation . We impose the constraint that , and to meet the probability nature of . Following [?], we will impose the regularization to avoid that only the nearest neighbor of each data point is assigned 1 with others 0.
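The role of the Laplacian in this kind of objective can be illustrated with a well-known identity: for a symmetric affinity matrix, the trace of the representation against the Laplacian equals half the affinity-weighted sum of squared distances between rows of the representation. The sketch below verifies it on random data; `U`, `W`, `D`, and `L` are illustrative names, and the assumption that Eq.(6) contains such a trace term is ours.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 6, 2
U = rng.normal(size=(n, k))        # toy representation, one row per data point

W = rng.random((n, n))
W = 0.5 * (W + W.T)                # symmetric nonnegative affinities
np.fill_diagonal(W, 0.0)

D = np.diag(W.sum(axis=1))         # degree matrix: row sums on the diagonal
L = D - W                          # graph Laplacian

smooth_trace = np.trace(U.T @ L @ U)

diff = U[:, None, :] - U[None, :, :]                   # all pairwise row differences
smooth_pairs = 0.5 * np.sum(W * np.sum(diff ** 2, axis=2))

print(smooth_trace, smooth_pairs)
```

Minimizing this quantity over the affinities therefore pushes weight towards pairs of points whose representations are close, which matches the probabilistic reading of the graph given above.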
With all the above collected, we finally formulate the problem below
(7)  
where is omitted due to constraint ; all the share the same cluster number for multiview clustering consensus. , and are nonnegative weights related to learning the clustered orthogonal representation, its local graph structure and multiview consensus modeling, and will be studied in Section 4. The constraint ensures the orthonormal columns of .
Remark 2
We introduce two auxiliary variables and . As will be shown later, the intuition of introducing lies in minimizing w.r.t. , where

it is similar as dictionary learning, while popping up as the corresponding sparse representation learning; moreover, it also enjoys the optimization of the isolated after merging the other into .
3 Optimization
Solving Eq.(7) amounts to a unified process of simultaneously learning and for the view. As will be shown later, learning either of them will promote the other. Eq.(7) is not jointly convex in , and ; we hence alternately optimize each of them with the others fixed. Following [?], we deploy the Augmented Lagrange Multiplier (ALM) method together with the Alternating Direction Minimization (ADM) strategy, which is widely known as an effective and efficient solver. As the optimization process for the above variables is similar within each view, we only present the optimization process for the view; the same process holds for the other views. The augmented Lagrangian function can be written below
(8)  
where , , and are Lagrange multipliers. indicates elementwise multiplication. is a penalty parameter.
Solving : We calculate the partial derivative of Eq.(8) w.r.t. , to be , while fixing others to be constant. After rearranging the terms, it has
(9) 
where
Efficient Row updating strategy of . As shown in Eq.(9), the bottleneck of updating lies in the high computational complexity of caused by the matrix inverse operation against the . To resolve it, we propose to update each row of . Without loss of generality, we set the derivative w.r.t. to be . It then yields the following
(10) 
where
(11) 
Orthonormalize : After obtaining the whole by updating all rows in each iteration, a clustering algorithm, e.g., fast k-means, is performed, which yields the cluster indicator for each data point (each row), leading to orthogonal columns. We then normalize each entry of via the rule: if is assigned to the cluster , and it is 0 otherwise. In this way we obtain the orthonormal columns of ().
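The orthonormalization step can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: a plain Lloyd-style k-means with a deterministic farthest-first start stands in for the "fast k-means" mentioned above, and the nonzero entries are scaled by 1/sqrt(cluster size), which is one natural choice that yields orthonormal columns. The function names and the toy data are hypothetical.

```python
import numpy as np

def kmeans_labels(rows, k, iters=20):
    """Plain Lloyd iterations with a deterministic farthest-first initialization."""
    idx = [0]
    for _ in range(k - 1):
        d = ((rows[:, None, :] - rows[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d.argmax()))            # next seed: point farthest from chosen seeds
    centers = rows[idx].copy()
    for _ in range(iters):
        dist = ((rows[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):            # keep the old center if a cluster empties
                centers[c] = rows[labels == c].mean(axis=0)
    return labels

def orthonormal_indicator(labels, k):
    """Indicator matrix scaled so that non-empty columns have unit norm."""
    U = np.zeros((labels.shape[0], k))
    U[np.arange(labels.shape[0]), labels] = 1.0
    sizes = U.sum(axis=0)
    return U / np.sqrt(np.maximum(sizes, 1.0))

rng = np.random.default_rng(3)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
U_raw = np.repeat(blob_centers, 4, axis=0) + 0.1 * rng.normal(size=(12, 2))  # updated rows

labels = kmeans_labels(U_raw, 3)
U = orthonormal_indicator(labels, 3)
print(np.round(U.T @ U, 6))
```

Since each row receives exactly one nonzero entry, distinct columns never overlap and are orthogonal by construction; the scaling only fixes their length.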
Remark 3
As per the row-update strategy for in Eq.(10), we remark the following:

We dramatically reduce the computational complexity from by Eq.(9) to , due to .

Another note concerns the multi-view consensus of via the row update. Specifically, during each iteration, the is updated under the influence of the other views, while serves as a constraint to guide the updating; throughout this process the divergence is decreased towards a consensus, which relies on the same magnitude among with orthonormal columns.
Solving : We get the partial derivative of Eq.(8) w.r.t. , then yields the following closed form:
(12) 
The major computational burden lies in , resulting in , which is identical to that for the row updating of , hence efficient.
Solving :
Optimizing Eq.(8) w.r.t. is equivalent to solving the following
(13) 
According to [?], the following closed form can be obtained
(14) 
where , if is positive, it is 0 otherwise.
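A closed form of this shape is typical of the elementwise shrinkage (soft-thresholding) operator, which solves an l1-regularized least-squares subproblem. The sketch below assumes that Eq.(14) is of this standard type; the threshold value and the names `soft_threshold`, `M`, and `tau` are illustrative and not taken from the paper.

```python
import numpy as np

def soft_threshold(M, tau):
    """Minimizer of tau * ||E||_1 + 0.5 * ||E - M||_F^2, applied entry by entry."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def objective(E, M, tau):
    return tau * np.abs(E).sum() + 0.5 * np.sum((E - M) ** 2)

M = np.array([[3.0, -0.2],
              [-1.5, 0.4]])
tau = 0.5
E = soft_threshold(M, tau)

# The closed form should not be beaten by small random perturbations.
rng = np.random.default_rng(4)
trials = [objective(E + 0.05 * rng.normal(size=E.shape), M, tau) for _ in range(100)]
print(E, objective(E, M, tau) <= min(trials))
```

Entries whose magnitude is below the threshold are set exactly to zero, which is what makes the recovered noise term sparse.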
Solving :
Optimizing Eq.(8) w.r.t. is equivalent to the following
(15) 
Based on that, we enjoy the following closed form
(16) 
Solving : The problem of optimizing can be converted to the following
(17)  
As the similarity vector for each sample is independent, we only study the sample.
(18)  
We convert Eq.(18) to the following
(19) 
where is a vector, with its entry , leading to the following closed form:
(20) 
where turns the negative entries in to 0 while keeping the positive entries unchanged. denotes the number of data points that have nonzero weight connected to the sample. We empirically set for all views. Once the is obtained, we may update it to be a balanced undirected graph as .
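For intuition, the per-sample graph update can be sketched with the closed form known from the adaptive-neighbors clustering literature, in which each sample keeps a fixed number of nearest neighbors and its weights sum to one. Whether Eq.(20) coincides exactly with this form is an assumption, as are the names `adaptive_neighbor_graph`, `U`, and `k`, and the uniform fallback used when neighbor distances tie.

```python
import numpy as np

def adaptive_neighbor_graph(U, k):
    """Row-stochastic graph: each row has at most k nonzero weights (requires k < n - 1)."""
    n = U.shape[0]
    sq = ((U[:, None, :] - U[None, :, :]) ** 2).sum(axis=2)   # squared distances between rows
    S = np.zeros((n, n))
    for i in range(n):
        d = sq[i].copy()
        d[i] = np.inf                        # exclude the sample itself
        order = np.argsort(d)
        nearest, d_k1 = order[:k], d[order[k]]
        denom = k * d_k1 - d[nearest].sum()
        if denom > 1e-12:
            S[i, nearest] = (d_k1 - d[nearest]) / denom
        else:
            S[i, nearest] = 1.0 / k          # tied distances: fall back to uniform weights
    return S

rng = np.random.default_rng(5)
U = rng.normal(size=(8, 2))                  # toy representation, one row per sample
S = adaptive_neighbor_graph(U, k=3)
S_sym = 0.5 * (S + S.T)                      # balanced undirected graph
print(S.sum(axis=1))
```

The weight of a neighbor decreases linearly with its distance and reaches zero at the first excluded neighbor, so closer samples receive a larger share of the unit mass.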
Consensus : As is solely determined by according to Eq.(20), the consensus on in Remark 3 naturally leads to the consensus over .
Multiplier updating: The lagrange multipliers , and are automatically updated as
(21)  
Besides, is tuned via the adaptive updating rule according to [?].
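The pattern of multiplier updates with a growing penalty can be seen on a deliberately tiny problem. The sketch below is a toy and not the update of Eq.(21): it minimizes half the squared distance to a fixed vector subject to an equality constraint with a nonnegative auxiliary variable, so the expected answer is simply the nonnegative part of that vector. All names and the constants `rho` and `mu_max` are assumptions made for illustration.

```python
import numpy as np

a = np.array([1.5, -2.0, 0.3, -0.1])   # toy data; the solution should be max(a, 0)
x = np.zeros_like(a)                   # primal variable
z = np.zeros_like(a)                   # auxiliary variable constrained to be nonnegative
y = np.zeros_like(a)                   # Lagrange multiplier for the constraint x = z
mu, rho, mu_max = 1.0, 1.05, 1e6       # penalty, growth factor, and cap

for _ in range(300):
    x = (a - y + mu * z) / (1.0 + mu)      # minimize the augmented Lagrangian over x
    z = np.maximum(x + y / mu, 0.0)        # minimize over z, then project onto z >= 0
    y = y + mu * (x - z)                   # multiplier ascent on the constraint residual
    mu = min(rho * mu, mu_max)             # adaptive penalty update

print(np.round(z, 6))
```

Each multiplier moves in proportion to how much its constraint is still violated, so the updates fade out as the alternating steps approach feasibility.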
Algorithm convergence: It is worth noting that the ADM strategy converges to a stationary point, yet it is not guaranteed to be the global optimum. Upon that, we declare convergence when with or when the maximum iteration number is reached, which is set to 25 for our method.
The optimization process is conducted over each variable alternately within each view; the entire process terminates once the convergence rule is met for all views.
Multi-view clustering output: After the above updating rule has converged, we obtain the final multi-view clustered representation and the multi-view optimized local graph structure . The normalized graph cut is then applied to generate the clusters as the multi-view spectral clustering output.
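The final partitioning step can be illustrated with the spectral relaxation of a two-way normalized cut. This is a simplified sketch: the paper's output is multi-way, which would typically use several eigenvectors followed by k-means, and the toy affinity matrix `W` and the function name are hypothetical.

```python
import numpy as np

def two_way_normalized_cut(W):
    """Split a connected graph by the sign of the second generalized eigenvector."""
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L_sym = np.eye(W.shape[0]) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L_sym)         # eigenvalues in ascending order
    fiedler = d_inv_sqrt * vecs[:, 1]          # map back from the normalized embedding
    return (fiedler > 0).astype(int)

# Toy affinity: two tight groups of three points joined by weak links.
W = np.full((6, 6), 0.01)
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)

labels = two_way_normalized_cut(W)
print(labels)
```

The weak cross-group links keep the graph connected, so the second eigenvector is well defined up to sign and separates the two groups.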
We summarize the whole updating process in Algorithm 1.
4 Experimental Validation
The following multiview data sets and their viewspecific features are selected according to [?; ?].

UCI handwritten digit set (http://archive.ics.uci.edu/ml/datasets/Multiple+Features): It consists of features of handwritten digits (0-9). The dataset is described by 6 features and contains 2000 samples with 200 in each category. Analogous to [?], we choose 76 Fourier coefficients (FC) of the character shapes and the 216 profile correlations (PC) as two views.
Table 1: Multi-view consensus ratio metric as per Eq.(?) between our method and LRRGL over three data sets. A smaller value means a more similar magnitude.
Method  UCI digits  AwA  NUS 
LRRGL  17.39  25.78  34.21 
Ours  1.15  1.18  1.21 
Animal with Attribute (AwA) (http://attributes.kyb.tuebingen.mpg.de): It consists of 50 kinds of animals described by 6 features (views): color histogram (CQ, 2688-dim), local self-similarity (LSS, 2000-dim), pyramid HOG (PHOG, 252-dim), SIFT (2000-dim), color SIFT (RGSIFT, 2000-dim), and SURF (2000-dim). We randomly sample 80 images for each category and get 4000 images in total.

NUS-WIDE-Object (NUS) [?]: The data set consists of 30000 images from 31 categories. We construct 5 views: 65-dimensional color histogram (CH), 226-dimensional color moments (CM), 145-dimensional color correlation (CORR), 74-dimensional edge estimation (EDH), and 129-dimensional wavelet texture (WT).
The following typical multiview baselines are compared for spectral clustering, covering Early fusion, Late fusion, CCA, Cotraining strategy and LRR models as reviewed in Section 1.2. All the parameters are tuned to their best performance.

MFMSC: Concatenating multiview features to perform spectral clustering.

Multiview affinity aggregation for multiview spectral clustering (MAASC) [?].

Canonical Correlation Analysis (CCA) based multiview spectral clustering (CCAMSC) [?] by learning a common subspace for multiview data, then perform spectral clustering.

Cotraining [?]: Learning multiview Laplacian eigenspace via a cotraining fashion over each individual one.

Robust LowRank Representation Method (RLRR) [?], as formulated in Eq.(1).

LowRank Representation with MultiGraph Learning (LRRGL) [?].
ACC (%)  UCI digits  AwA  NUS 

MFMSC  43.81  17.13  22.81 
MAASC  51.74  19.44  25.13 
CCAMSC  73.24  24.04  27.56 
Cotraining  79.22  29.06  34.25 
RLRR  83.67  31.49  35.27 
LRRGL  86.39  37.22  41.02 
Ours  92.22  44.55  45.78 
NMI (%)  UCI digits  AwA  NUS 

MFMSC  41.57  11.48  12.21 
MAASC  47.85  12.93  11.86 
CCAMSC  56.51  15.62  14.56 
Cotraining  62.07  18.05  18.10 
RLRR  81.20  25.57  18.29 
LRRGL  85.45  31.74  20.61 
Ours  89.61  36.67  26.42 
Clustering accuracy (ACC) and normalized mutual information (NMI) are used as evaluation metrics. Please refer to [?; ?] for their detailed descriptions. To demonstrate the robustness superiority over non-LRR methods, following [?], we corrupt the features of each view with sparse noise, where 20% of the entries are perturbed by uniform noise over [-5,5], for RLRR, LRRGL and our method, with in Eq.(7) for our method. All experiments are repeated 10 times, and the average clustering results are shown in Tables 2 and 3, where our method outperforms the others, especially RLRR and LRRGL, due to its strengths of

encoding more flexible latent cluster structures, along with the more ideal optimized local graph structure based on such latent clustered representation.

The superior multiview consensus in terms of both latent clustered representation and optimized local graph structure for all views.
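For readers unfamiliar with the ACC metric reported in the tables, the sketch below shows the usual definition: predicted cluster labels are matched to ground-truth classes by the best one-to-one mapping before counting correct assignments. It is a brute-force illustration suitable only for a handful of clusters; practical evaluations usually solve the matching with the Hungarian algorithm. The function name and toy labels are hypothetical.

```python
import numpy as np
from itertools import permutations

def clustering_accuracy(y_true, y_pred, k):
    """Fraction of samples correctly labeled under the best cluster-to-class mapping."""
    counts = np.zeros((k, k), dtype=int)
    for t, p in zip(y_true, y_pred):
        counts[p, t] += 1                      # rows: predicted cluster, columns: true class
    best = max(sum(counts[c, perm[c]] for c in range(k))
               for perm in permutations(range(k)))
    return best / len(y_true)

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([2, 2, 0, 0, 1, 0])          # same grouping up to renaming, one mistake
acc = clustering_accuracy(y_true, y_pred, 3)
print(acc)
```

The matching step matters because cluster indices produced by any clustering method are arbitrary, so a direct comparison of label values would be meaningless.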
To examine the first finding, we visualize the consensus multi-view affinity matrix over the NUS data set for our method and LRRGL in Fig. 1, which validates the advantages of our clustered orthogonal representation over the low-rank similarity yielded by LRRGL.
Parameter Study: We further study the parameters (clustered orthogonal representations and optimized local graph structure) and (multi-view consensus term) in Eq.(7), and against the clustering accuracy over the AwA and NUS data sets; we vary one parameter while fixing the others, and the results are illustrated in Fig. 2, where increasing either of them improves the clustering accuracy until the optimal pairwise values are met, followed by a slight performance decrease. To balance Figs. 2(a) and (b), we finalize and in Eq.(7).
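Figure 1: Visualized consensus multi-view affinity matrices over the NUS data set for our method and LRRGL; panels (a) and (b).
Figure 2: Clustering accuracy under varying parameters in Eq.(7) over the AwA and NUS data sets; panels (a) and (b).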
5 Conclusion
In this paper, we revisit the classical Low-Rank Representation (LRR) for multi-view spectral clustering, by viewing LRR as essentially a latent clustered orthogonal projection winged with its optimized local graph structure. Following this, we propose to simultaneously learn the clustered orthogonal projection and the optimized local graph structure for each view, both of which enjoy the same magnitude across all views, leading to a superior multi-view spectral clustering consensus. Extensive experiments validate its strength.
References
 [Cai and Chen, 2015] Deng Cai and Xinlei Chen. Large scale spectral clustering via landmark-based sparse representation. IEEE Trans. Cybernetics, 45(8):1669–1680, 2015.
 [Chaudhuri et al., 2009] K. Chaudhuri, S. Kakade, K. Livescu, and K. Sridharan. Multiview clustering via canonical correlation analysis. In ICML, 2009.
 [Chen et al., 2011] WenYen Chen, Yangqiu Song, Hongjie Bai, ChihJen Lin, and Edward Y. Chang. Parallel spectral clustering in distributed systems. IEEE Trans. Pattern Anal. Mach. Intell., 33(3):568–586, 2011.
 [Chua et al., 2009] TatSeng Chua, Jinhui Tang, Richang Hong, Haojie Li, Zhiping Luo, and YanTao Zheng. Nuswide: A realworld web image database from national university of singapore. In ACM CIVR, 2009.
 [Deng et al., 2015] Cheng Deng, Zongting Lv, Wei Liu, Junzhou Huang, Dacheng Tao, and Xinbo Gao. Multiview matrix decomposition:a new scheme for exploring discriminative information. In IJCAI, 2015.
 [Gao et al., 2013] Jing Gao, Jiawei Han, Jialu Liu, and Chi Wang. Multiview clustering via joint nonnegative matrix factorization. In SDM, pages 252–260, 2013.
 [Gao et al., 2015] Hongchang Gao, Feiping Nie, Xuelong Li, and Heng Huang. Multiview subspace clustering. In ICCV, pages 4238–4246, 2015.
 [Greene and Cunningham, 2009] D. Greene and P. Cunningham. A matrix factorization approach for integrating multiple data views. In ECMLPKDD, 2009.
 [Gui et al., 2014] Jie Gui, Dacheng Tao, Zhenan Sun, Yong Luo, Xinge You, and Yuan Yan Tang. Group sparse multiview patch alignment framework with view consistency for image classification. IEEE Transactions on Image Processing, 23(7):3126–3137, 2014.
 [Huang et al., 2010] Yuchi Huang, Qingshan Liu, Shaoting Zhang, and Dimitris N. Metaxas. Image retrieval via probabilistic hypergraph ranking. In CVPR, 2010.
 [Huang et al., 2012] HsinChien Huang, YungYu Chuang, and ChuSong Chen. Affinity aggregation for spectral clustering. In CVPR, 2012.
 [Kumar and Daume, 2011] Abhishek Kumar and Hal Daume. A cotraining approach for multiview spectral clustering. In ICML, 2011.
 [Kumar et al., 2011] Abhishek Kumar, Piyush Rai, and Hal Daume. Coregularized multiview spectral clustering. In NIPS, 2011.
 [Lin et al., 2011] Zhouchen Lin, Risheng Liu, and Zhixun Su. Linearized alternating direction method with adaptive penalty for lowrank representation. In NIPS, 2011.
 [Liu et al., 2010] Guangcan Liu, Zhouchen Lin, and Yong Yu. Robust subspace segmentation by lowrank representation. In ICML, 2010.
 [Ng et al., 2001] Andrew Y. Ng, Michael I. Jordan, and Yair Weiss. On spectral clustering: Analysis and an algorithm. In NIPS, 2001.
 [Nie et al., 2014] Feiping Nie, Xiaoqian Wang, and Heng Huang. Clustering and projected clustering with adaptive neighbors. In KDD, pages 977–986, 2014.
 [Recht et al., 2008] Benjamin Recht, Maryam Fazel, and Pablo A. Parrilo. Guaranteed minimumrank solutions of linear matrix equations via nuclear norm minimization. SIAM Journal on Optimization, 20(4):1956–1982, 2008.
 [Recht et al., 2010] Benjamin Recht, Maryam Fazel, and Pablo A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3):471–501, 2010.
 [Wang et al., 2013] Yang Wang, Xuemin Lin, and Qing Zhang. Towards metric fusion on multiview data: a crossview based graph random walk approach. In ACM CIKM, pages 805–810, 2013.
 [Wang et al., 2014] Yang Wang, Xuemin Lin, Qing Zhang, and Lin Wu. Shifting hypergraphs by probabilistic voting. In PAKDD, pages 234–246, 2014.
 [Wang et al., 2015a] Yang Wang, Xuemin Lin, Lin Wu, and Wenjie Zhang. Effective multiquery expansions: Robust landmark retrieval. In ACM Multimedia, pages 79–88, 2015.
 [Wang et al., 2015b] Yang Wang, Xuemin Lin, Lin Wu, Wenjie Zhang, and Qing Zhang. Lbmch: Learning bridging mapping for crossmodal hashing. In ACM SIGIR, 2015.
 [Wang et al., 2015c] Yang Wang, Xuemin Lin, Lin Wu, Wenjie Zhang, Qing Zhang, and Xiaodi Huang. Robust subspace clustering for multiview data by exploiting correlation consensus. IEEE Trans. Image Processing, 24(11):3939–3949, 2015.
 [Wang et al., 2016] Yang Wang, Wenjie Zhang, Lin Wu, Xuemin Lin, Meng Fang, and Shirui Pan. Iterative views agreement: An iterative lowrank based structured optimization method to multiview spectral clustering. In IJCAI, 2016.
 [Wang et al., 2017a] Yang Wang, Xuemin Lin, Lin Wu, and Wenjie Zhang. Effective multiquery expansions: Collaborative deep networks for robust landmark retrieval. IEEE Trans. Image Processing, 26(3):1393–1404, 2017.
 [Wang et al., 2017b] Yang Wang, Wenjie Zhang, Lin Wu, Xuemin Lin, and Xiang Zhao. Unsupervised metric fusion over multiview data by graph random walkbased crossview diffusion. IEEE Trans. Neural Networks and Learning Systems, 28(1):57–70, 2017.
 [Wu and Wang, 2017] Lin Wu and Yang Wang. Robust hashing for multiview data: Jointly learning lowrank kernelized similarity consensus and hash functions. Image Vision Comput, 57:58–66, 2017.
 [Wu et al., 2008] Xindong Wu, Vipin Kumar, J. Ross Quinlan, Joydeep Ghosh, Qiang Yang, Hiroshi Motoda, Geoffrey J. McLachlan, Bing Liu, Philip S. Yu, ZhiHua Zhou, Michael Steinbach, David J. Hand, and Dan Steinberg. Top 10 algorithms in data mining. Knowledge and Information Systems., 14:1–37, 2008.
 [Wu et al., 2013] Lin Wu, Yang Wang, and John Shepherd. Efficient image and tag coranking: a bregman divergence optimization method. In ACM Multimedia, 2013.
 [Xia et al., 2014] Rongkai Xia, Yan Pan, Lei Du, and Jian Yin. Robust multiview spectral clustering via lowrank and sparse decomposition. In AAAI, 2014.
 [Xu et al., 2015] Chang Xu, Dacheng Tao, and Chao Xu. Multiview intact space learning. IEEE Trans. Pattern Anal. Mach. Intell., 2015.