# Beyond Low-Rank Representations: Orthogonal Clustering Basis Reconstruction with Optimized Graph Structure for Multi-view Spectral Clustering

Yang Wang, Lin Wu
The University of New South Wales, Kensington, Sydney, Australia
The University of Queensland, Australia
wangy@cse.unsw.edu.au; lin.wu@uq.edu.au
###### Abstract

Low-Rank Representation (LRR) is arguably one of the most powerful paradigms for multi-view spectral clustering. It encodes the multi-view local graph/manifold structures into an intrinsic low-rank self-expressive data similarity embedded in high-dimensional space, yielding a better graph partition than its single-view counterparts. In this paper we revisit it from a fundamentally different perspective: LRR is essentially a latent clustered orthogonal projection based representation equipped with an optimized local graph structure for spectral clustering. Each column of the representation is a cluster basis, orthogonal to the others, that indicates its members; intuitively, it projects the view-specific feature representation onto the space spanned by all orthogonal bases to characterize the cluster structures. Upon this finding, we propose our technique as follows: (1) We decompose LRR into a latent clustered orthogonal representation via low-rank matrix factorization, to encode more flexible cluster structures than LRR over primal data objects; (2) We convert the problem of LRR into that of simultaneously learning an orthogonal clustered representation and an optimized local graph structure for each view; (3) The learned orthogonal clustered representations and local graph structures enjoy the same magnitude across views, so that the ideal multi-view consensus can be readily achieved. The experiments over multi-view datasets validate its superiority, especially over recent state-of-the-art LRR models.

## 1 Introduction

Spectral clustering [?], which partitions the data objects via their local graph/manifold structure relying on the Laplacian eigenvalue-eigenvector decomposition, is one fundamental clustering problem. Unlike K-Means clustering [?], the data objects within the same group share not only a large data similarity but also a similar local graph/manifold structure. With the rapid development of information technology, data are largely available with multi-view feature representations (e.g., images can be featured by a color histogram view or a texture view), which naturally paves the way to multi-view spectral clustering. As extensively claimed by the multi-view research [????????], the information encoded by multi-view features describes different properties; thus leveraging the multi-view information can outperform the single-view counterparts. One critical issue for a successful multi-view incorporation, as implied by the existing work [?????], lies in how to achieve the multi-view consensus/agreement.

Following this principle, many multi-view clustering methods [??] claim that similar data objects should be within the same group across all views. Based on that, the consensus multi-view local manifold structure has been further explored with great effort [????] for multi-view spectral clustering. Among all these methods, Low-Rank Representation (LRR) [?] coupled with a sparse decomposition based model has emerged as a substantially elegant solution. Its strength lies in exploring the intrinsic low-dimensional manifold structure encoded by the data correlations embedded in high-dimensional space, while exhibiting strong robustness to feature noise corruptions, addressed by sparse noise modeling; it has hence attracted great attention.

### 1.1 Motivation: LRR Revisited for Multi-View Spectral Clustering

Specifically, the typical LRR model for multi-view spectral clustering stems from the formulation below:

$$\min_{Z, E_i} \|Z\|_* + \lambda \sum_{i \in V} \|E_i\|_1 \quad \text{s.t.} \quad X_i = X_i Z + E_i,\ i \in V,\ Z \succeq 0, \tag{1}$$

where $X_i \in \mathbb{R}^{d_i \times n}$ is the data representation for the $i$-th view with $d_i$ as its feature dimension and $n$ as the number of data objects, identical for each view; $\lambda$ is the balance parameter, and $V$ is the view set. $Z \in \mathbb{R}^{n \times n}$ is the self-expressive low-rank similarity representation shared by all views, constrained with $X_i = X_i Z + E_i$ based on $X_i$, which can also be substituted by other specific dictionaries; $E_i$ is modeled to address the noise corruption of the view-specific feature representation. $Z \succeq 0$ ensures the nonnegativity of all its entries. Based on such optimized low-rank $Z$, the spectral clustering is finally conducted. One significant limitation of Eq.(1) pointed out by [?] is that only one common $Z$ is learned to preserve the flexible local manifold structures for all views, which hence fails to achieve the ideal spectral clustering result. To this end, various low-rank $Z_i$ are learned to preserve the view-specific local manifold structures, while minimizing their divergence via an iterative-views-agreement strategy for multi-view consensus, followed by a final spectral clustering stage. Despite its encouraging performance, the following limitations of the LRR model have been overlooked: (1) The low-rank data similarity may not well encode the flexible latent cluster structures over the primal view-specific feature space; the local graph constructed over such a representation for spectral clustering is then even less ideal; (2) The low-rank data similarities coming from multiple views may not be within the same magnitude, so that the divergence minimization may not achieve the ideal multi-view clustering consensus.

Our new perspective. The above facts motivate us to revisit the low-rank representation $Z_i$ that helps reconstruct $X_i$ below for the $i$-th view

$$\min_{Z_i \in S} \|X_i - X_i Z_i\|_F^2, \tag{2}$$

where $S$ denotes the set of $Z_i$ with low-rankness, e.g., a cluster number far less than $n$. Instead of narrowing low-rank $Z_i$ to a self-expressive data similarity as in the conventional viewpoint, we see it as a special case of a generalized low-rank projection that maps the feature representation $X_i$ to a low-dimensional space to reconstruct $X_i$ with minimum error. As discussed, the self-expressive similarity projection used by LRR models still suffers from the aforementioned non-trivial limitations.

Here we ask a question: is there a superior low-rank projection that minimizes Eq.(2) while addressing the limitations of the existing LRR models? Our answer to this question is positive. Specifically, we propose to consider $Z_i$ as a latent clustered orthogonal projection, via $Z_i = U_i U_i^T$, where

1. Clustered orthogonal projection: $U_i \in \mathbb{R}^{n \times c}$, where each column indicates one cluster to characterize its belonging data objects. Compared with LRR over the original feature space, the latent factor $U_i$ can better preserve the flexible latent cluster structure.

2. Feature reconstruction with cluster basis: Instead of a low-rank data similarity, $Z_i$ essentially serves as a mapping to reconstruct the view-specific features via the columns of $U_i$ to encode the latent cluster structures.

3. Rethinking $X_i Z_i$: We revisit the intuition of $X_i Z_i$ via $U_i$ throughout two stages; recall that $X_i Z_i = X_i U_i U_i^T$, where

• $X_i U_i$ is performed to obtain the new projection value for all features over the orthogonal columns of $U_i$;

• $(X_i U_i) U_i^T$ is subsequently the projected representation for all features spanned by the clustered orthogonal column basis of $U_i$.

4. Same magnitude for multi-view consensus: All $U_i$ enjoy the same magnitude due to their orthonormal columns. Hence, a feasible divergence minimization will facilitate the multi-view consensus.

Before shedding light on our technique, we review the typical related work for multi-view spectral clustering.

### 1.2 Prior Arts

The prior arts can be classified as per the strategy at which the multi-view fusion takes place for spectral clustering.

The most straightforward method is early fusion [?], which concatenates the multi-view feature vectors with equal or varied weights into a unified one, followed by spectral clustering over such a unified space. However, this method ignores the statistical property belonging to each individual view. Late fusion [?] may address the limitation to some extent by aggregating the spectral clustering results from each individual view, which follows the assumption that all views are independent of each other. Such an assumption is not effective for multi-view spectral clustering, where the views are assumed to be dependent so that the multi-view consensus information can be exploited for promising performance.

Canonical Correlation Analysis (CCA) is applied to multi-view spectral clustering [?] by learning a common low-dimensional representation for all views, upon which the spectral clustering is performed. One salient drawback lies in the failure of such a common subspace to preserve the flexible local manifold structures of different views. The co-training based model [?] learns the Laplacian eigenmap for each view over its data representation projected through the Laplacian eigenmaps from other views; this process is repeated until convergence, and the final similarities are then aggregated for spectral clustering. A similar method [?] is also proposed to coordinate the consensus of multi-view Laplacian eigenmaps for spectral clustering. Despite their effectiveness, they assume noise-free feature representations, which unfortunately cannot be met in practice. The Low-Rank Representation and sparse decomposition models [??] tackle this problem well, while exhibiting robustness to feature noise corruptions. However, they still suffer from the aforementioned limitations. To this end, we make the following orthogonal contributions to the typical LRR model for multi-view spectral clustering.

### 1.3 Our Contributions

• We revisit the classical Low-Rank Representation (LRR) for multi-view spectral clustering with a fundamentally novel viewpoint of finding it as essentially the latent clustered orthogonal projection based representation with optimized graph structure, to better encode the flexible latent cluster structures than LRR over primal data objects.

• We convert the problem of learning LRR into that of simultaneously learning the clustered orthogonal representation and its optimized local graph structure for each view, rather than directly relying on the local graph construction over original data objects.

• The learned multi-view latent clustered representations and local graph structures enjoy the same magnitude, so as to facilitate a feasible divergence minimization to achieve superior multi-view consensus for spectral clustering.

Extensive experiments over multi-view datasets validate the superiority of our method.

## 2 Learning Clustered Orthogonal Projection with Optimized Graph Structure

In this section, we formally discuss our technique. Some notations that are used throughout the paper are shown below.

### 2.1 Notations

For a matrix $M$, the trace of $M$ is denoted as $\mathrm{Tr}(M)$; $\|M\|_F$ (or $\|\cdot\|_2$ for vector space) denotes the Frobenius norm; $\|M\|_1$ is the $\ell_1$ norm, $M^T$ denotes the transpose of $M$, and $\|M\|_*$ its nuclear norm (sum of all singular values); $M(j,\cdot)$ and $M(\cdot,k)$ denote the $j$-th row and $k$-th column of $M$. $M \succeq 0$ means all entries of $M$ are nonnegative. $I$ is the identity matrix with adaptive size. $\mathbf{1}$ indicates the vector of adaptive length with all entries equal to 1. $|\cdot|$ indicates the cardinality of a set.

### 2.2 Problem Formulation

As previously defined in Section 1.1, $X_i \in \mathbb{R}^{d_i \times n}$ is the data representation for the $i$-th view, and $Z_i$ is the low-rank data similarity representation for $X_i$. Eq.(2) is equivalent to computing $U_i$ such that $Z_i = U_i U_i^T$, where $U_i \in \mathbb{R}^{n \times c}$ has orthonormal columns with its $k$-th column representing the relevance with which each data object belongs to the $k$-th cluster, and $c$ indicates the latent cluster number. We then arrive at the following

$$\min_{U_i} \|X_i - X_i U_i U_i^T\|_F^2 \tag{3}$$

As discussed in Section 1.1, $X_i U_i U_i^T$ reveals the new projection representation for all features spanned by the orthogonal basis of $U_i$ to reconstruct $X_i$.

Optimizing Eq.(3) w.r.t. $U_i$ is equivalent to computing an SVD of $X_i$ to constitute the orthogonal columns of $U_i$ using the principal right singular vectors (equivalently, the leading eigenvectors of $X_i^T X_i$). Inspired by this, we exploit the latent cluster structures of $X_i$ to form $c$ non-overlapping clusters, each characterized by one orthogonal column basis of $U_i$. Thanks to [?] on low-rank matrix factorization, it yields the following

$$\|Z_i\|_* = \min_{U_i, V_i,\ Z_i = U_i V_i^T} \frac{1}{2}\left(\|U_i\|_F^2 + \|V_i\|_F^2\right), \tag{4}$$

where $U_i$ and $V_i$ are latent factors from $Z_i$. Based on that, we approximate $Z_i$ via the clustered orthogonal projection factorization $Z_i = U_i U_i^T$, and convert the problem of minimizing $\|Z_i\|_*$ to that of learning the clustered projection representation $U_i$ below

$$\|Z_i\|_* = \min_{U_i,\ Z_i = U_i U_i^T} \|U_i\|_F^2 \tag{5}$$

###### Remark 1

Unlike the data similarity over raw data objects, $U_i$ obtained via the low-rank matrix factorization can capture the flexible latent cluster structures. Another issue left to be addressed lies in the local manifold/graph structure modeling over $U_i$, which is crucial for spectral clustering. One may directly refer to the local graph construction over $X_i$. However, as previously stated, it cannot effectively encode the local graph structure over $U_i$.

Towards this end, we propose to learn an optimized local graph structure $W_i$ over $U_i$ by solving the following

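$$\frac{1}{2} \min_{\forall j\ \sum_{k}^{n} W_i(j,k) = 1,\ W_i \succeq 0}\ \sum_{j,k}^{n} \|U_i(j,\cdot) - U_i(k,\cdot)\|_2^2\, W_i(j,k) = \mathrm{Tr}(U_i^T L_i U_i), \tag{6}$$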

where $L_i$ is the Laplacian matrix of $W_i$, i.e., the diagonal degree matrix, whose $j$-th diagonal entry equals the sum of the $j$-th row of $W_i$, minus $W_i$. The ideal $W_i(j,k)$ reveals the probability that the $j$-th and $k$-th data points fall within the same cluster according to the cluster projection representation $U_i$. We impose the constraints $\sum_{k} W_i(j,k) = 1$ for all $j$ and $W_i \succeq 0$ to meet the probability nature of $W_i$. Following [?], we impose the regularization $\|W_i\|_F^2$ to avoid the case where only the nearest neighbor of each data point is assigned 1 and all others 0.
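The identity behind Eq.(6) can be checked numerically. The following is a minimal sketch in plain Python lists, assuming a symmetric toy graph whose rows sum to 1; the function names and the toy values are illustrative and not taken from the paper.

```python
def graph_laplacian(W):
    """L = Deg - W, where Deg is the diagonal matrix of the row sums of W."""
    n = len(W)
    return [[(sum(W[j]) if j == k else 0.0) - W[j][k] for k in range(n)]
            for j in range(n)]

def smoothness_pairwise(U, W):
    """0.5 * sum_{j,k} ||U[j] - U[k]||^2 * W[j][k] (left-hand side of Eq. 6)."""
    total = 0.0
    for j in range(len(U)):
        for k in range(len(U)):
            dist2 = sum((a - b) ** 2 for a, b in zip(U[j], U[k]))
            total += dist2 * W[j][k]
    return 0.5 * total

def smoothness_trace(U, L):
    """Tr(U^T L U), computed as sum_{j,k} L[j][k] * <U[j], U[k]>."""
    n = len(U)
    return sum(L[j][k] * sum(a * b for a, b in zip(U[j], U[k]))
               for j in range(n) for k in range(n))

# Toy symmetric graph (assumed values); each row sums to 1.
W = [[0.0, 0.7, 0.3],
     [0.7, 0.0, 0.3],
     [0.3, 0.3, 0.4]]
# Toy representation with n = 3 rows and c = 2 columns (assumed values).
U = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]

lhs = smoothness_pairwise(U, W)
rhs = smoothness_trace(U, graph_laplacian(W))
```

The two quantities agree only when $W_i$ is symmetric, which is why a symmetrization step for the learned graph matters before the Laplacian is formed.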

With all the above collected, we finally formulate the problem below

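$$\begin{aligned}
\min_{U_i, E_i, W_i (i \in V)} \sum_{i \in V} \Big( & \underbrace{\|E_i\|_1}_{\text{sparse noise modeling}} + \underbrace{\lambda_1 \|W_i\|_F^2}_{\text{regularized graph structure}} + \underbrace{\frac{\lambda_2}{2} \mathrm{Tr}(U_i^T L_i U_i)}_{\text{structuring } U_i \text{ with optimized local manifold structure}} \\
& + \underbrace{\frac{\beta}{2} \sum_{j \in V, j \neq i} \|U_i - U_j\|_F^2}_{\text{modeling the multi-view consensus over the } U_i \text{ within the same magnitude}} \Big) \\
\text{s.t.} \quad & i = 1, \ldots, |V|, \quad X_i = D_i U_i^T + E_i, \quad D_i = X_i U_i, \\
& U_i^T U_i = I, \quad G_i = U_i, \quad G_i \geq 0, \quad \forall j\ \sum_{k=1}^{n} W_i(j,k) = 1;\ W_i \succeq 0,
\end{aligned} \tag{7}$$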

where $\|U_i\|_F^2$ is omitted due to the constraint $U_i^T U_i = I$; all the $U_i \in \mathbb{R}^{n \times c}$ share the same cluster number $c$ for multi-view clustering consensus. $\lambda_1$, $\lambda_2$ and $\beta$ are non-negative weights related to learning the clustered orthogonal representation, its local graph structure and multi-view consensus modeling, and will be studied in Section 4. The constraint $U_i^T U_i = I$ ensures the orthonormal columns of $U_i$.

###### Remark 2

We introduce two auxiliary variables $D_i$ and $G_i$. As will be shown later, the intuition of introducing $D_i$ lies in minimizing $\|X_i - D_i U_i^T - E_i\|_F^2$ w.r.t. $U_i$, where

• $D_i$ plays a role similar to the dictionary in dictionary learning, while $U_i^T$ acts as the corresponding sparse representation; moreover, merging the other $U_i$ into $D_i$ allows the isolated $U_i^T$ to be optimized.

## 3 Optimization

Solving Eq.(7) is a unified process of simultaneously learning $U_i$ and $W_i$ for the $i$-th view. As will be shown later, learning either of them promotes the other. Eq.(7) is not jointly convex in $U_i$, $E_i$ and $W_i$; we hence alternately optimize each of them with the others fixed. Following [?], we deploy the Augmented Lagrange Multiplier (ALM) method together with the Alternating Direction Minimization (ADM) strategy, which is widely known as an effective and efficient solver. As the optimization process for the above variables is similar within each view, we only present it for the $i$-th view; the same process holds for the other views. The augmented Lagrangian function can be written as below

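$$\begin{aligned}
& \min_{\forall j\ \sum_{k=1}^{n} W_i(j,k) = 1,\ 0 \leq W_i(j,k) \leq 1,\ U_i^T U_i = I} \mathcal{L}(U_i, E_i, D_i, G_i, W_i) \\
& = \|E_i\|_1 + \frac{\lambda_2}{2} \mathrm{Tr}(U_i^T L_i U_i) + \lambda_1 \|W_i\|_F^2 + \frac{\beta}{2} \sum_{j \in V, j \neq i} \|U_i - U_j\|_F^2 \\
& \quad + \Phi(K_1^i, X_i - D_i U_i^T - E_i) + \Phi(K_2^i, U_i - G_i) + \Phi(K_3^i, D_i - X_i U_i) \\
& \quad + \frac{\mu}{2} \left( \|X_i - D_i U_i^T - E_i\|_F^2 + \|U_i - G_i\|_F^2 + \|D_i - X_i U_i\|_F^2 \right),
\end{aligned} \tag{8}$$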

where $K_1^i$, $K_2^i$ and $K_3^i$ are Lagrange multipliers, $\Phi(\cdot,\cdot)$ indicates element-wise multiplication of its two arguments, summed over all entries, and $\mu$ is a penalty parameter.

Solving $U_i$: We set the partial derivative of Eq.(8) w.r.t. $U_i$ to $0$, while fixing the others as constants. After rearranging the terms, we have

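$$U_i = \underbrace{\left( \lambda_2 L_i + \big(\mu + \beta(|V| - 1)\big) I - \mu X_i^T X_i \right)^{-1}}_{\text{with } O(n^3) \text{ computational complexity}} S, \tag{9}$$

where

$$S = \beta \sum_{j \in V, j \neq i} U_j + \left( (K_1^i)^T - \mu U_i D_i^T - \mu E_i^T \right) D_i + X_i^T K_3^i + \mu X_i^T X_i U_i$$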

Efficient row updating strategy for $U_i$. As shown in Eq.(9), the bottleneck of updating $U_i$ lies in the high computational complexity of $O(n^3)$ caused by inverting an $n \times n$ matrix. To resolve it, we propose to update each row of $U_i$. Without loss of generality, we set the derivative w.r.t. $U_i(l,\cdot)$ to $0$. It then yields the following

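$$U_i(l,\cdot) = \Big( T_i^l + \underbrace{\beta \sum_{j \neq i, j \in V} U_j(l,\cdot)}_{\text{influences from other views}} \Big) \underbrace{\left( R_i + D_i^T D_i \right)^{-1}}_{\text{computational complexity } O(c^3)}, \tag{10}$$

where

$$\begin{aligned}
R_i &= \Big( 1 + \mu + \sum_{k=1}^{n} \big( \lambda_2 L_i(k,l) - \mu (X_i^T X_i)(k,l) \big) \Big) I \in \mathbb{R}^{c \times c}, \\
T_i^l &= X_i^T(l,\cdot) K_3^i + \mu \left( G_i(l,\cdot) - E_i^T(l,\cdot) D_i \right) - K_2^i(l,\cdot) - (K_1^i)^T(l,\cdot) D_i.
\end{aligned} \tag{11}$$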

Orthonormalize $U_i$: After obtaining the whole $U_i$ by updating all rows in each iteration, a clustering algorithm, e.g., fast k-means, is performed on its rows, which yields the cluster indicator for each data point/each row and hence orthogonal columns. We then normalize each entry of $U_i$ via the rule: $U_i(j,k) = 1/\sqrt{|C_k|}$ if the $j$-th data point is assigned to the $k$-th cluster $C_k$, and 0 otherwise. This processing achieves the orthonormal columns of $U_i$ ($U_i^T U_i = I$).
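The orthonormalization step can be illustrated with a short sketch. It assumes that cluster labels are already available (the k-means step itself is not shown), that every cluster is non-empty, and that each nonzero entry is scaled by one over the square root of the cluster size, which is the scaling that makes the indicator columns orthonormal. The function names are hypothetical.

```python
from math import sqrt

def orthonormal_indicator(labels, c):
    """Build U (n x c): U[j][k] = 1/sqrt(size of cluster k) if labels[j] == k, else 0.

    Assumes every cluster index in range(c) occurs at least once in labels;
    an empty cluster would leave a zero column and break orthonormality.
    """
    sizes = [0] * c
    for lab in labels:
        sizes[lab] += 1
    return [[1.0 / sqrt(sizes[k]) if lab == k else 0.0 for k in range(c)]
            for lab in labels]

def gram(U):
    """Return U^T U as a c x c list of lists."""
    cols = list(zip(*U))
    return [[sum(a * b for a, b in zip(p, q)) for q in cols] for p in cols]

# Hypothetical k-means output for n = 6 points and c = 3 clusters.
labels = [0, 1, 0, 2, 1, 0]
U = orthonormal_indicator(labels, 3)
G = gram(U)  # numerically the 3 x 3 identity
```

Because each row has a single nonzero entry, distinct columns never overlap, so orthogonality is automatic and only the column norms need scaling.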

###### Remark 3

As per the row-update strategy for $U_i$ in Eq.(10), we remark the following:

1. We dramatically reduce the computational complexity from $O(n^3)$ by Eq.(9) to $O(c^3)$, due to $c \ll n$.

2. Another note concerns the multi-view consensus of $U_i$ via the row update. Specifically, during each iteration, $U_i(l,\cdot)$ is updated via the influence from other views, while $T_i^l$ serves as a constraint to guide the updating; through these updates the divergence is decreased towards a consensus, which relies on the same magnitude among the $U_i$ with orthonormal columns.

Solving $D_i$: Setting the partial derivative of Eq.(8) w.r.t. $D_i$ to zero yields the following closed form:

$$D_i = \frac{1}{\mu} \left( K_1^i U_i - K_3^i + \mu (2 X_i - E_i) U_i \right) \left( I + U_i^T U_i \right)^{-1} \tag{12}$$

The major computational burden lies in $(I + U_i^T U_i)^{-1}$, resulting in $O(c^3)$, which is identical to that of the row updating of $U_i$, hence efficient.
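As a sanity check on Eq.(12), note that when $U_i$ has exactly orthonormal columns, $(I + U_i^T U_i)^{-1} = I/2$, so no explicit inverse is needed. The sketch below applies this simplification with plain Python lists; it is an illustration under that assumption rather than the authors' implementation, and `update_D` and the toy inputs are hypothetical.

```python
from math import sqrt

def matmul(A, B):
    """Plain-list matrix product A @ B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def update_D(X, E, U, K1, K3, mu):
    """Eq.(12) with (I + U^T U)^{-1} replaced by I/2; valid only if U^T U = I."""
    two_x_minus_e = [[2.0 * x - e for x, e in zip(xr, er)] for xr, er in zip(X, E)]
    K1U = matmul(K1, U)
    MU = matmul(two_x_minus_e, U)
    return [[(k1u - k3 + mu * m) / (2.0 * mu) for k1u, k3, m in zip(r1, r3, rm)]
            for r1, r3, rm in zip(K1U, K3, MU)]

# Toy view: d = 2 features, n = 4 objects, c = 2 clusters of two objects each.
r = 1.0 / sqrt(2.0)
X = [[1.0, 3.0, 10.0, 14.0], [0.0, 2.0, 5.0, 7.0]]
U = [[r, 0.0], [r, 0.0], [0.0, r], [0.0, r]]
zeros_dn = [[0.0] * 4 for _ in range(2)]  # stands in for E and K1 (d x n)
zeros_dc = [[0.0] * 2 for _ in range(2)]  # stands in for K3 (d x c)

D = update_D(X, zeros_dn, U, zeros_dn, zeros_dc, mu=1.5)
XU = matmul(X, U)
```

With zero noise and zero multipliers the update returns $X_i U_i$, which matches the constraint $D_i = X_i U_i$ in Eq.(7).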
Solving $E_i$: Optimizing Eq.(8) w.r.t. $E_i$ is equivalent to solving the following

$$\min_{E_i} \|E_i\|_1 + \frac{\mu}{2} \left\| E_i - \left( X_i - D_i U_i^T + \frac{1}{\mu} K_1^i \right) \right\|_F^2. \tag{13}$$

According to [?], the following closed form can be obtained

$$E_i = S_{\frac{1}{\mu}} \left( X_i - D_i U_i^T + \frac{1}{\mu} K_1^i \right), \tag{14}$$

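where $S_{\omega}(\sigma) = \mathrm{sign}(\sigma)(|\sigma| - \omega)$ if $|\sigma| - \omega$ is positive, and 0 otherwise, applied entry-wise.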
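The shrinkage operator used in Eq.(14) is the standard entry-wise soft-thresholding operator. A minimal sketch follows, with a hypothetical function name and toy input; in the $E_i$ update the threshold would be $1/\mu$ and the argument would be $X_i - D_i U_i^T + K_1^i/\mu$.

```python
def soft_threshold(A, tau):
    """Entry-wise soft-thresholding: shrink each entry of A toward zero by tau."""
    def shrink(a):
        if a > tau:
            return a - tau
        if a < -tau:
            return a + tau
        return 0.0
    return [[shrink(a) for a in row] for row in A]

# Toy residual matrix (assumed values) shrunk with threshold 0.5.
residual = [[3.0, -0.2, -2.0]]
E = soft_threshold(residual, 0.5)
```

Entries whose magnitude is below the threshold are set exactly to zero, which is what produces the sparse noise estimate.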
Solving $G_i$: Optimizing Eq.(8) w.r.t. $G_i$ is equivalent to the following

$$\min_{G_i} \Phi(K_2^i, U_i - G_i) + \frac{\mu}{2} \|G_i - U_i\|_F^2 \tag{15}$$

Based on that, we obtain the following closed form

$$G_i = U_i + \frac{K_2^i}{\mu} \tag{16}$$

Solving $W_i$: The problem of optimizing $W_i$ can be converted to the following

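$$\min_{W_i} \sum_{j,k} \left( \lambda_2 \|U_i(j,\cdot) - U_i(k,\cdot)\|_2^2\, W_i(j,k) + \lambda_1 W_i(j,k)^2 \right) \quad \text{s.t.} \quad \forall j\ \sum_{k=1}^{n} W_i(j,k) = 1,\ 0 \leq W_i(j,k) \leq 1 \tag{17}$$

As the similarity vector for each sample is independent, we only study the $j$-th sample.

$$\min \sum_{k} \left( \lambda_2 \|U_i(j,\cdot) - U_i(k,\cdot)\|_2^2\, W_i(j,k) + \lambda_1 W_i(j,k)^2 \right) \quad \text{s.t.} \quad \sum_{k=1}^{n} W_i(j,k) = 1,\ 0 \leq W_i(j,k) \leq 1 \tag{18}$$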

We convert Eq.(18) to the following

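$$\min_{\sum_{k=1}^{n} W_i(j,k) = 1,\ 0 \leq W_i(j,k) \leq 1} \left\| W_i(j,\cdot) + m_i^j \right\|_2^2, \tag{19}$$

where $m_i^j$ is a vector whose $k$-th entry is $m_i^j(k) = \frac{\lambda_2 \|U_i(j,\cdot) - U_i(k,\cdot)\|_2^2}{2 \lambda_1}$ (obtained by completing the square in Eq.(18)), leading to the following closed form:

$$W_i(j,\cdot) = \left( \frac{1 + \sum_{l=1}^{s} m_i^j(l)}{s} \mathbf{1} - m_i^j \right)_+, \tag{20}$$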

where $(\cdot)_+$ turns the negative entries to 0 while keeping the positive entries unchanged, and $s$ denotes the number of data points that have a nonzero weight connected to the $j$-th sample. We empirically use the same $s$ for all views. Once $W_i$ is obtained, we may update it to be a balanced undirected graph as $W_i = (W_i + W_i^T)/2$.
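Eq.(19) asks for the Euclidean projection of $-m_i^j$ onto the probability simplex. The sketch below uses the standard sort-based simplex projection, which determines the number of nonzero weights automatically; this is a swapped-in variant of Eq.(20), which instead fixes $s$ in advance, and the two coincide when the automatically selected support size equals $s$. The function name and the toy cost vector are hypothetical.

```python
def solve_graph_row(m):
    """Return w minimizing ||w + m||^2 subject to sum(w) = 1 and w >= 0."""
    v = [-x for x in m]
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for idx, val in enumerate(u, start=1):
        cumsum += val
        candidate = (cumsum - 1.0) / idx
        if val - candidate > 0:
            theta = candidate  # threshold from the largest valid support size
    return [max(x - theta, 0.0) for x in v]

# Toy cost vector for one sample (assumed values): small cost = close neighbor.
m = [0.1, 0.4, 2.0]
w = solve_graph_row(m)
```

For this toy vector two neighbors keep nonzero weight, and the result matches Eq.(20) with $s = 2$: $(1 + 0.1 + 0.4)/2 - m$, clipped at zero.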
Consensus on $W_i$: As $W_i$ is solely determined by $U_i$ according to Eq.(20), the consensus on $U_i$ in Remark 3 naturally leads to the consensus over $W_i$.

Multiplier updating: The Lagrange multipliers $K_1^i$, $K_2^i$ and $K_3^i$ are updated as

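$$\begin{aligned}
K_1^i &= K_1^i + \mu (X_i - D_i U_i^T - E_i) \\
K_2^i &= K_2^i + \mu (U_i - G_i) \\
K_3^i &= K_3^i + \mu (D_i - X_i U_i)
\end{aligned} \tag{21}$$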

Besides, $\mu$ is tuned via the adaptive updating rule according to [?].

Algorithm convergence: It is worth noting that the ADM strategy converges to a stationary point, which is not guaranteed to be a global optimum. Upon that, we declare convergence when the stopping criterion is met or the maximum iteration number, set to 25 for our method, is reached. The optimization process is conducted on each variable alternately within each view, and the entire process terminates once the convergence rule is met for all views.

Multi-view clustering output: After the above updating rule has converged, we obtain the final multi-view clustered representation and the multi-view optimized local graph structure. The normalized graph cut is applied to generate the clusters as the multi-view spectral clustering output.

We summarize the whole updating process in Algorithm 1.

## 4 Experimental Validation

The following multi-view data sets and their view-specific features are selected according to [??].

• UCI handwritten Digit set: It consists of features of hand-written digits (0-9). The dataset is described by 6 features and contains 2000 samples with 200 in each category. Analogous to [?], we choose 76 Fourier coefficients (FC) of the character shapes and the 216 profile correlations (PC) as two views.

• Animal with Attribute (AwA): It consists of 50 kinds of animals described by 6 features (views): color histogram (CQ, 2688-dim), local self-similarity (LSS, 2000-dim), pyramid HOG (PHOG, 252-dim), SIFT (2000-dim), color SIFT (RGSIFT, 2000-dim), and SURF (2000-dim). We randomly sample 80 images for each category and get 4000 images in total.

• NUS-WIDE-Object (NUS) [?]: The data set consists of 30000 images from 31 categories. We construct 5 views: 65-dimensional color histogram (CH), 226-dimensional color moments (CM), 145-dimensional color correlation (CORR), 74-dimensional edge estimation (EDH), and 129-dimensional wavelet texture (WT).

The following typical multi-view baselines are compared for spectral clustering, covering early fusion, late fusion, CCA, the co-training strategy and LRR models as reviewed in Section 1.2. All the parameters are tuned to their best performance.

• MFMSC: Concatenating multi-view features to perform spectral clustering.

• Multi-view affinity aggregation for multi-view spectral clustering (MAASC) [?].

• Canonical Correlation Analysis (CCA) based multi-view spectral clustering (CCAMSC) [?] by learning a common subspace for multi-view data, then perform spectral clustering.

• Co-training [?]: Learning multi-view Laplacian eigenspace via a co-training fashion over each individual one.

• Robust Low-Rank Representation Method (RLRR) [?], as formulated in Eq.(1).

• Low-Rank Representation with Multi-Graph Learning (LRRGL) [?].

We adopt clustering accuracy (ACC) and normalized mutual information (NMI) as the evaluation metrics; please refer to [??] for their detailed descriptions. To demonstrate the robustness superiority over non-LRR methods, following [?], we corrupt the features of each view with sparse noise, where 20% of the entries receive uniform noise over [-5,5], for RLRR, LRRGL and our method; this noise is modeled by $E_i$ in Eq.(7) for our method. All experiments are repeated 10 times, and the average clustering results are shown in Tables 2 and 3, where our method outperforms the others, and in particular RLRR and LRRGL, due to its strengths of

• encoding more flexible latent cluster structures, along with a more ideal optimized local graph structure based on such a latent clustered representation;

• achieving a superior multi-view consensus in terms of both the latent clustered representation and the optimized local graph structure for all views.

To examine the first finding, we illustrate the visualized consensus multi-view affinity matrices over the NUS data set for our method and LRRGL in Fig. 1, which validates the advantages of our clustered orthogonal representation over the low-rank similarity yielded by LRRGL.

Parameter Study: We further study the parameters $\lambda_2$ (clustered orthogonal representations and optimized local graph structure) and $\beta$ (multi-view consensus term) in Eq.(7) against the clustering accuracy over the AwA and NUS data sets. We vary one parameter while fixing the others, and the results are illustrated in Fig. 2: increasing either of them improves the clustering accuracy until the optimal pair-wise values are met, followed by a slight performance decrease. To balance Figs. 2(a) and (b), we fix $\lambda_2$ and $\beta$ in Eq.(7) accordingly.

## 5 Conclusion

In this paper, we revisit the classical Low-Rank Representation (LRR) for multi-view spectral clustering, by viewing LRR as essentially a latent clustered orthogonal projection equipped with its optimized local graph structure. Following this, we propose to simultaneously learn the clustered orthogonal projection and the optimized local graph structure for each view, both of which enjoy the same magnitude across all views, leading to a superior multi-view spectral clustering consensus. Extensive experiments validate its strength.

## References

• [Cai and Chen, 2015] Deng Cai and Xinlei Chen. Large scale spectral clustering via landmark-based sparse representation. IEEE Trans. Cybernetics, 45(8):1669–1680, 2015.
• [Chaudhuri et al., 2009] K. Chaudhuri, S. Kakade, K. Livescu, and K. Sridharan. Multi-view clustering via canonical correlation analysis. In ICML, 2009.
• [Chen et al., 2011] Wen-Yen Chen, Yangqiu Song, Hongjie Bai, Chih-Jen Lin, and Edward Y. Chang. Parallel spectral clustering in distributed systems. IEEE Trans. Pattern Anal. Mach. Intell., 33(3):568–586, 2011.
• [Chua et al., 2009] Tat-Seng Chua, Jinhui Tang, Richang Hong, Haojie Li, Zhiping Luo, and Yan-Tao Zheng. Nus-wide: A real-world web image database from national university of singapore. In ACM CIVR, 2009.
• [Deng et al., 2015] Cheng Deng, Zongting Lv, Wei Liu, Junzhou Huang, Dacheng Tao, and Xinbo Gao. Multi-view matrix decomposition:a new scheme for exploring discriminative information. In IJCAI, 2015.
• [Gao et al., 2013] Jing Gao, Jiawei Han, Jialu Liu, and Chi Wang. Multi-view clustering via joint nonnegative matrix factorization. In SDM, pages 252–260, 2013.
• [Gao et al., 2015] Hongchang Gao, Feiping Nie, Xuelong Li, and Heng Huang. Multi-view subspace clustering. In ICCV, pages 4238–4246, 2015.
• [Greene and Cunningham, 2009] D. Greene and P. Cunningham. A matrix factorization approach for integrating multiple data views. In ECMLPKDD, 2009.
• [Gui et al., 2014] Jie Gui, Dacheng Tao, Zhenan Sun, Yong Luo, Xinge You, and Yuan Yan Tang. Group sparse multiview patch alignment framework with view consistency for image classification. IEEE Transactions on Image Processing, 23(7):3126–3137, 2014.
• [Huang et al., 2010] Yuchi Huang, Qingshan Liu, Shaoting Zhang, and Dimitris N. Metaxas. Image retrieval via probabilistic hypergraph ranking. In CVPR, 2010.
• [Huang et al., 2012] Hsin-Chien Huang, Yung-Yu Chuang, and Chu-Song Chen. Affinity aggregation for spectral clustering. In CVPR, 2012.
• [Kumar and Daume, 2011] Abhishek Kumar and Hal Daume. A co-training approach for multi-view spectral clustering. In ICML, 2011.
• [Kumar et al., 2011] Abhishek Kumar, Piyush Rai, and Hal Daume. Co-regularized multi-view spectral clustering. In NIPS, 2011.
• [Lin et al., 2011] Zhouchen Lin, Risheng Liu, and Zhixun Su. Linearized alternating direction method with adaptive penalty for low-rank representation. In NIPS, 2011.
• [Liu et al., 2010] Guangcan Liu, Zhouchen Lin, and Yong Yu. Robust subspace segmentation by low-rank representation. In ICML, 2010.
• [Ng et al., 2001] Andrew Y. Ng, Michael I. Jordan, and Yair Weiss. On spectral clustering: Analysis and an algorithm. In NIPS, 2001.
• [Nie et al., 2014] Feiping Nie, Xiaoqian Wang, and Heng Huang. Clustering and projected clustering with adaptive neighbors. In KDD, pages 977–986, 2014.
• [Recht et al., 2008] Benjamin Recht, Maryam Fazel, and Pablo A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Journal on Optimization, 20(4):1956–1982, 2008.
• [Recht et al., 2010] Benjamin Recht, Maryam Fazel, and Pablo A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM review, 45(8):1669–1680, 2010.
• [Wang et al., 2013] Yang Wang, Xuemin Lin, and Qing Zhang. Towards metric fusion on multi-view data: a cross-view based graph random walk approach. In ACM CIKM, pages 805–810, 2013.
• [Wang et al., 2014] Yang Wang, Xuemin Lin, Qing Zhang, and Lin Wu. Shifting hypergraphs by probabilistic voting. In PAKDD, pages 234–246, 2014.
• [Wang et al., 2015a] Yang Wang, Xuemin Lin, Lin Wu, and Wenjie Zhang. Effective multi-query expansions: Robust landmark retrieval. In ACM Multimedia, pages 79–88, 2015.
• [Wang et al., 2015b] Yang Wang, Xuemin Lin, Lin Wu, Wenjie Zhang, and Qing Zhang. Lbmch: Learning bridging mapping for cross-modal hashing. In ACM SIGIR, 2015.
• [Wang et al., 2015c] Yang Wang, Xuemin Lin, Lin Wu, Wenjie Zhang, Qing Zhang, and Xiaodi Huang. Robust subspace clustering for multi-view data by exploiting correlation consensus. IEEE Trans. Image Processing, 24(11):3939–3949, 2015.
• [Wang et al., 2016] Yang Wang, Wenjie Zhang, Lin Wu, Xuemin Lin, Meng Fang, and Shirui Pan. Iterative views agreement: An iterative low-rank based structured optimization method to multi-view spectral clustering. In IJCAI, 2016.
• [Wang et al., 2017a] Yang Wang, Xuemin Lin, Lin Wu, and Wenjie Zhang. Effective multi-query expansions: Collaborative deep networks for robust landmark retrieval. IEEE Trans. Image Processing, 26(3):1393–1404, 2017.
• [Wang et al., 2017b] Yang Wang, Wenjie Zhang, Lin Wu, Xuemin Lin, and Xiang Zhao. Unsupervised metric fusion over multiview data by graph random walk-based cross-view diffusion. IEEE Trans. Neural Networks and Learning Systems, 28(1):57–70, 2017.
• [Wu and Wang, 2017] Lin Wu and Yang Wang. Robust hashing for multi-view data: Jointly learning low-rank kernelized similarity consensus and hash functions. Image Vision Comput, 57:58–66, 2017.
• [Wu et al., 2008] Xindong Wu, Vipin Kumar, J. Ross Quinlan, Joydeep Ghosh, Qiang Yang, Hiroshi Motoda, Geoffrey J. McLachlan, Bing Liu, Philip S. Yu, Zhi-Hua Zhou, Michael Steinbach, David J. Hand, and Dan Steinberg. Top 10 algorithms in data mining. Knowledge and Information Systems., 14:1–37, 2008.
• [Wu et al., 2013] Lin Wu, Yang Wang, and John Shepherd. Efficient image and tag co-ranking: a bregman divergence optimization method. In ACM Multimedia, 2013.
• [Xia et al., 2014] Rongkai Xia, Yan Pan, Lei Du, and Jian Yin. Robust multi-view spectral clustering via low-rank and sparse decomposition. In AAAI, 2014.
• [Xu et al., 2015] Chang Xu, Dacheng Tao, and Chao Xu. Multi-view intact space learning. IEEE Trans. Pattern Anal. Mach. Intell., 2015.