Generative Partial Multi-View Clustering

Nowadays, with the rapid development of data collection sources and feature extraction methods, multi-view data are becoming easier to obtain and have received increasing research attention, among which multi-view clustering (MVC) forms a mainstream research direction and is widely used in data analysis. However, existing MVC methods mainly assume that each sample appears in all views, without considering the incomplete-view case caused by data corruption, sensor failure, equipment malfunction, etc. In this study, we design a generative partial multi-view clustering model, named GP-MVC, to address the incomplete multi-view problem by explicitly generating the data of missing views. The main idea of GP-MVC is two-fold. First, multi-view encoder networks are trained to learn common low-dimensional representations, followed by a clustering layer that captures the consistent cluster structure across multiple views. Second, view-specific generative adversarial networks are developed to generate the missing data of one view conditioned on the shared representation given by other views. These two steps promote each other: learning common representations facilitates data imputation, and the generated data further strengthen the view consistency. Moreover, a weighted adaptive fusion scheme is implemented to exploit the complementary information among different views. Experimental results on four benchmark datasets show the effectiveness of the proposed GP-MVC over state-of-the-art methods.

Partial multi-view clustering, Auto-encoders, Generative adversarial networks

I Introduction

Multi-view data could be samples collected from multiple sources, modalities captured by various sensors, or features extracted with different methods. Owing to the advance of hardware technology, multi-view data are quite common in the real world [37]. For example, an image on a social network could be represented by either its visual cues or the users' comments on it. Compared with single-view data, multiple views usually boost model performance [9, 41, 5] by providing complementary information about the same data.

In recent years, increasing research efforts have been made in multi-view learning, where multi-view clustering (MVC) [35, 3, 25, 26, 40] forms a mainstream task that aims to explore the underlying data cluster structure shared by multiple views. MVC methods serve as an effective data analysis tool for unlabeled multi-view datasets and can significantly improve clustering performance by fusing information from different views. However, although traditional MVC methods have achieved promising progress, their effectiveness depends on the completeness assumption for all views of each instance. Hence, their performance may degrade when some views include missing data, which raises the challenging case of partial multi-view data [48] or incomplete-view data [24, 39].

In practice, incomplete-view data are quite ubiquitous due to environmental issues, obstacles, noise, and malfunctions of the collection/transmission/storage equipment [48]. For example, in medical data, some patients may not finish a complete examination because of time conflicts or other reasons; in social multimedia, some instances may lack the visual or audio view as a result of a sensor breakdown. Traditional MVC methods cannot handle such incomplete data directly because they aim to find a shared structure among all views and require every sample to be complete. In light of this, partial multi-view clustering (PMVC) algorithms have been developed [48, 24, 39, 28, 42, 16].

Fig. 1: The framework of our model. It consists of multi-view encoders E_1, ..., E_V, a weighted adaptive fusion layer, a deep embedding clustering layer, multi-view generators G_1, ..., G_V, and multi-view discriminators D_1, ..., D_V.

Pioneering PMVC works simply use zeros or mean values to fill up the incomplete views. However, such imputed data are quite different from the real ones, which prevents MVC from learning a consistent clustering structure and badly degrades the final clustering performance. Existing PMVC methods aim to establish a shared latent subspace with the complete views and then compensate the latent representations of the missing data; they fall into two main directions. The first kind is kernel-based methods [24, 16], whose main idea is to leverage the kernel matrices of the complete views to complete the kernel matrix of the incomplete view. This kind of method can only be applied in kernel-based multi-view clustering. The second kind is based on non-negative matrix factorization (NMF) [47, 22]. For samples missing in a certain view, these methods recover the factor matrix corresponding to that view from those obtained on samples for which the view is present. However, these two kinds of methods still have several limitations. (1) They need to process all the data together, which is inefficient for large-scale databases. (2) They require numerous inverse operations for matrix factorization, resulting in high time complexity. (3) They mainly exploit regularization and add constraints on the new representation, yet fail to explicitly compensate the missing data in each view.

Inspired by generative adversarial networks (GAN) [44, 8, 12], it is natural to synthesize the missing data when learning representations of partial views. The vanilla GAN [7] was proposed to generate desired data from random noise. Recently, some GAN models [49, 23] have been designed to learn the relationship between different views. Following this line, we consider leveraging a GAN model to compensate the missing data. Nonetheless, directly applying GAN to PMVC is not straightforward. First, it is challenging to generate the missing data based on partial, rather than complete, views. Second, existing methods rarely consider the clustering task explicitly while learning representations from multiple views.

In this paper, we develop a novel generative partial multi-view clustering model, termed GP-MVC, for the PMVC task. The proposed model is composed of four parts (see Fig. 1): multi-view encoder networks, a weighted adaptive fusion layer, a clustering layer, and view-specific generative adversarial networks. The model employs the multi-view encoder networks to encode a shared latent representation among multiple views. We then develop view-specific generative adversarial networks to predict the missing-view data conditioned on the latent representations from the other views. Specifically, we resort to adversarial training to explore the consistent information among all views: the generators of GP-MVC aim to complete the missing data, while the discriminators distinguish fake data from real ones for each view. A clustering layer is designed to sharpen the cluster structure of the common representation so that it provides explicit guidance for clustering-oriented representation learning. Moreover, we add a weighted adaptive fusion scheme that further exploits the complementary information among different views by introducing a group of learnable weights. By integrating clustering into the generating process, the proposed GP-MVC can adjust the generators to impute "ideal" missing data and thus improve clustering performance.

This paper is an extension of our previous work [32]. Compared with [32], several substantial differences have been made as follows: (1) We extend the architecture of GP-MVC from two views to multiple views so that our model generalizes well in real-world applications. (2) We develop a new adaptive fusion layer for integrating the complementary information from different views. (3) More theoretical analyses, model discussions, and experimental evaluations are provided. We highlight the contributions of this work as follows.

  • A novel GAN-based partial multi-view clustering method named GP-MVC is proposed to capture the shared clustering structure and to generate missing-view data. Specifically, GP-MVC learns a consistent subspace shared by multiple views to provide common latent representations for both the clustering and data generation tasks.

  • The proposed GP-MVC fully leverages the consistent information provided by multi-view data. In particular, GP-MVC obtains a latent representation from one view, with which it further generates the missing data of the other views. The generated missing-view data provide complementary information that helps improve clustering performance.

  • Extensive experiments on several multi-view datasets are conducted. Compared with several state-of-the-art methods, the experimental results prove the superiority of GP-MVC.

The remainder of this paper is organized as follows. In Section II, we conduct a brief review and analysis on related works. Then we introduce the proposed generative partial multi-view clustering in Section III. Experimental setting and evaluation results are reported in Section IV. Finally, we conclude our paper in Section V.

II Related Works

As incomplete multi-view data become increasingly common in real-world applications, incomplete multi-view clustering methods have been proposed for multi-view data clustering. In this section, we review multi-view clustering methods, partial multi-view clustering methods, and generative adversarial networks.

II-A Multi-View Clustering

Multi-view clustering methods can be divided into three categories. The first category is spectral-based methods [37, 14, 29, 10]. These methods usually learn a shared similarity matrix among different views and conduct spectral clustering to obtain the final partition. For example, Kumar et al. [14] designed a co-regularized multi-view spectral clustering method that performs clustering on different views simultaneously. Motivated by this work, Tsivtsivadze et al. [29] designed a neighborhood co-regularized multi-view spectral clustering method for microbiome data. The second category is subspace-based methods, which learn a shared coefficient matrix from all views [38, 33, 17]. Based on this idea, Yin et al. [38] proposed a pairwise sparse multi-view subspace clustering method that enforces the coefficient matrices from each pair of views to be as similar as possible. Different from the above approaches, the third category [46] mainly uses non-negative matrix factorization to learn a common indicator matrix from different views. Zhao et al. [46] adopted deep semi-nonnegative matrix factorization to perform multi-view clustering. As incomplete data become increasingly common in applications, researchers have further developed partial multi-view clustering.

II-B Partial Multi-View Clustering

Piyush et al. [28] designed the first partial multi-view clustering approach. They adopted one view's kernel representation as the similarity matrix and employed Laplacian regularization to complete the kernel matrices of the incomplete view. Nonetheless, this approach requires one complete view that contains all instances. To tackle this problem, incomplete multi-view clustering methods based on kernel canonical correlation analysis were developed [24, 16]. These methods optimize the alignment of the shared instances in the dataset and can thereby collectively complete the kernel matrices of the incomplete views. Despite their effectiveness, they apply only to kernel-based clustering. More recently, the Non-negative Matrix Factorization (NMF) based Partial View Clustering (PVC) algorithm was proposed in [48]. It establishes a latent subspace in which the different views' examples belonging to the same instance are close to each other, and it proves effective for partial multi-view data. Inspired by its promising performance, numerous NMF-based multi-view methods have been developed [47, 22]. For example, Rai et al. [22] improved it with graph-regularized NMF. Zhao et al. [47] proposed Incomplete Multi-Modal Visual Data Grouping (IMG), whose main idea is to learn a unified framework by integrating latent subspace generation and a compact global structure. However, all these methods learn the latent space for multi-view data with NMF, which restricts their application to non-negative feature data; moreover, NMF requires costly computation, so they cannot be used for large-scale data. On the other hand, many works have successfully used deep models to generate images.

II-C Generative Adversarial Networks

The generative adversarial network (GAN) was developed by Goodfellow et al. [27]. It quickly attracted a huge amount of interest because of its extraordinary performance and interesting theory. Recently, many variants of GAN have been put forward for various goals and applications. For example, Mao et al. proposed Least Squares GAN to alleviate the vanishing gradients problem when minimizing the objective function [18]; rather than the cross-entropy loss, it adopts a least-squares loss for the discriminator. Zhang et al. [43] applied GANs to photorealistic image generation conditioned on text descriptions and introduced the Stacked GAN (StackGAN), in which two GANs at different stages are adopted to generate high-resolution images. Odena et al. introduced additional structure and a specialized cost function to the GAN latent space for high-quality sample generation, proposing a conditional GAN with an auxiliary classifier (AC-GAN) [20]; in this way, the conditional information can be class labels or data from other modalities. Isola et al. [11] further explored conditional GANs on paired training data and developed the pix2pix GAN, which can transfer images from one distribution to another effectively. Zhu et al. proposed Cycle GAN [49], which trains on unpaired images with a cycle-consistent adversarial network by adding a cycle-consistency loss. It shows a more powerful ability than pix2pix in image translation from one domain to another and hence effectively addresses the shortage of paired samples. GANs have also gained wide application in multi-view data generation [23, 45].

III Generative Partial Multi-View Clustering

III-A Motivation

Existing works that can deal with partial multi-view data are mostly based on kernels or NMF, predicting the missing data and performing the clustering task at the same time. Despite the appealing performance achieved, they still have two limitations. First, NMF-based methods require extensive computation for matrix inverse operations; as a result, they cannot be applied to large-scale data. Second, these methods focus only on learning a shared latent space for clustering; they ignore learning a latent space that is both suitable for clustering and usable for generating the missing-view data simultaneously. To address these two challenges, we design a novel model called Generative Partial Multi-View Clustering (GP-MVC). We combine the generative capacity of GANs and the clustering capacity of deep embedding clustering in our model. Thus, it can generate the missing-view data and learn a better clustering structure for partial multi-view data at the same time.

Notations. We represent the data by the multi-view data matrices X = {X^(1), ..., X^(V)}, where X^(v) is the data matrix of the v-th view, V is the number of views, n is the number of samples, and d_v is the feature dimension of the v-th view. Since our setting is partial multi-view clustering, we divide the multi-view data into two parts: paired data, in which all views are complete, and unpaired data, in which some views are missing. We give an example of multi-view data in Fig. 2, where the barred and hatted matrices respectively denote the missing data and the generated data of each view.

Fig. 2: Illustration of partial multi-view data: the data in the solid box are complete paired data, while those in the red dashed box are partial data.

III-B Framework

Network Architecture. Fig. 1 illustrates the architecture of our model for partial multi-view data. It is composed of five kinds of sub-networks: encoder networks E_v, a weighted adaptive fusion layer, a deep embedding clustering layer, generator networks G_v, and discriminator networks D_v. For data with V views, our model has V encoders, one fusion layer, one clustering layer, V generators, and V discriminators, with one encoder, generator, and discriminator per view. We introduce the model in detail as follows.

Encoder network E_v. Each view has an encoder built from stacked fully-connected layers. It encodes the v-th original view X^(v) into a latent representation z^(v), where v = 1, ..., V. Denote the nonlinear function of the v-th encoder by E_v; it maps the d_v-dimensional original data to a k-dimensional latent representation, z^(v) = E_v(X^(v)), with part of the encoder parameters shared by all encoders. In order to capture the shared structure of multi-view data, we partially share the encoder parameters across all views.

Generator network G_v. In our model, the generator can also be seen as a decoder, since its structure is symmetric to the encoder's, i.e., stacked fully-connected layers. Each view has one exclusive generator that generates that view. For example, the v-th generator takes a latent representation as input and outputs the generated v-th view. That is to say, G_v outputs data of the v-th view no matter which view's latent representation it receives. Thus, we hope the latent representations of different views are similar to each other, ideally equal. However, strict equality is too restrictive, so we introduce a common representation z to relax this condition.

Discriminator network D_v. Similar to the generators, each view has one exclusive discriminator, composed of 3 stacked fully-connected layers. As shown in Fig. 1, each discriminator is connected to the corresponding generator. Take the v-th discriminator as an example: it takes the real samples X^(v) and the generated fake samples of the v-th view as input, and outputs a real/fake judgment on the authenticity of each generated sample. The discriminator's feedback is passed back to generator G_v, prompting it to produce more realistic samples. This process repeats until the generator produces samples so realistic that the discriminator cannot distinguish them from real ones.

Weighted adaptive fusion layer: After encoding, we obtain the latent representations z^(1), ..., z^(V). To fully explore the complementary information across multi-view data, we adaptively fuse the latent representations of the different views and learn a common representation z. Specifically, we design a learnable fusion layer that obtains z = F(z^(1), ..., z^(V); w), where F denotes the fusion function parameterized by the learnable weights w.

Deep embedded clustering layer: This layer improves the distribution of the common representation and produces the clustering result. First, we compute the soft assignment distribution Q of the common representation z; based on Q, we then compute a target distribution P that is more compact and suitable for clustering. According to the target distribution, we refine the encoder networks so that the learned representation approaches the target distribution.

III-C Objective Function

Our objective function includes four terms: the auto-encoder loss, the adversarial training loss, the weighted adaptive fusion loss, and the KL clustering loss.

Auto-Encoder Loss

The auto-encoder loss works on the encoder networks E_v and generator networks G_v. We hope the output of each generator is similar to the input of the corresponding encoder. Thus, we minimize the squared Frobenius norm of the reconstruction error between the generated sample and the input sample. The auto-encoder loss is

\[
\mathcal{L}_{ae} = \sum_{v=1}^{V} \left\| X^{(v)} - G_v\!\left(E_v\!\left(X^{(v)}\right)\right) \right\|_F^2 \tag{1}
\]

Take the v-th view as an example: when X^(v) passes through encoder E_v, we get the latent representation z^(v) = E_v(X^(v)). In the auto-encoder loss, generator G_v acts as the decoder corresponding to encoder E_v; its role is to reconstruct the input sample from the latent representation, and G_v(z^(v)) denotes its output. We minimize the reconstruction error to obtain encoder and generator networks whose outputs are similar to the input samples.

Generally, when all input data are paired, the auto-encoder loss alone is sufficient to train the encoder and generator networks. In partial multi-view learning, however, their performance degrades greatly because of unpaired data. To tackle this problem, apart from the auto-encoder loss, we further employ an adversarial training loss in the objective function to refine the networks.
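For intuition, the per-view reconstruction term can be sketched numerically. The linear encoder/decoder below is a hypothetical stand-in for the stacked fully-connected networks; all shapes and names are illustrative, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for one view's encoder E_v and generator/decoder G_v.
# GP-MVC uses stacked fully-connected networks; linear maps suffice here
# to illustrate the squared-Frobenius reconstruction loss.
d_v, k, n = 8, 3, 20                  # feature dim, latent dim, sample count
W_enc = rng.normal(size=(k, d_v))     # encoder weights (hypothetical)
W_dec = rng.normal(size=(d_v, k))     # decoder weights (hypothetical)

X = rng.normal(size=(d_v, n))         # one view's data matrix

def autoencoder_loss(X, W_enc, W_dec):
    """|| X - G(E(X)) ||_F^2 for a single view."""
    Z = W_enc @ X                      # latent representation
    X_hat = W_dec @ Z                  # reconstruction
    return np.sum((X - X_hat) ** 2)

loss = autoencoder_loss(X, W_enc, W_dec)
print(loss >= 0.0)
```

Minimizing this quantity over the encoder and decoder weights is exactly what Step 1 of the training procedure does for the paired data, one term per view.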

Fig. 3: The framework of cycle GAN. We take view u and view v as an example.

Adversarial Training Loss

Suppose x is a sample from the data distribution p_data(x), and z is a noise sample from the noise distribution p_z(z). A typical GAN is composed of two sub-networks, namely a generator G and a discriminator D. G generates a fake image from a vector of random noise, while D aims to distinguish the fake images generated by G from real ones; it returns a value ranging from 0 to 1, indicating the probability that the input image is real. GAN adopts the idea of game theory: G and D play a min-max game, where G tries to minimize, and D tries to maximize, the likelihood that the discriminator assigns fake images to the fake source. From this view, we can easily understand the loss function of GAN

\[
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right] \tag{2}
\]

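A toy numerical check of the two expectations in this minimax objective; the discriminator outputs below are made-up values, not from a trained model.

```python
import numpy as np

def gan_value(d_real, d_fake):
    """Monte-Carlo estimate of E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: discriminator outputs on real samples, each in (0, 1)
    d_fake: discriminator outputs on generated samples, each in (0, 1)
    """
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# A confident, correct discriminator gives a value near 0 (its maximum);
# a fooled discriminator drives the value down, which is the generator's goal.
confident = gan_value([0.99, 0.98], [0.01, 0.02])
fooled = gan_value([0.60, 0.55], [0.50, 0.45])
print(confident > fooled)
```

The discriminator ascends this value while the generator descends it, which is the min-max game described above.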
Considering that our setting contains a large amount of unpaired data and lacks paired data, we employ cycle GAN in our model to conduct adversarial learning, which can effectively handle the insufficiency of paired data. A cycle GAN model trains two GAN models and adds a GAN loss and a cycle-consistency loss based on Eq. (2) to tackle the unpaired-data situation. Fig. 3 shows the framework of a cycle GAN model; its main idea is that each data distribution can be generated from the other via these two GAN models. Following the theory of cycle GAN, we design a multi-view adversarial training loss for multi-view data in our model

\[
\mathcal{L}_{adv} = \sum_{u \neq v} \left( \mathcal{L}_{GAN}^{(u,v)} + \mathcal{L}_{cyc}^{(u,v)} \right) \tag{3}
\]

The adversarial training loss mainly works on the generator and discriminator networks. Next we define the GAN loss and the cycle-consistency loss. Suppose the data distribution of the v-th view is p_v, and let the composition G_v(E_u(·)) denote the mapping from the u-th view distribution p_u to the v-th view distribution p_v, i.e., using a u-th view sample x^(u) to generate a v-th view sample paired with x^(u). The same holds for G_u(E_v(·)), which transforms a v-th view sample into a u-th view sample. D_v takes the generated sample and the real sample x^(v) as input and outputs the discrimination result. Thus, for partial multi-view data, the loss of the v-th GAN network in our model is

\[
\mathcal{L}_{GAN}^{(u,v)} = \mathbb{E}_{x^{(v)} \sim p_v}\left[\log D_v\!\left(x^{(v)}\right)\right] + \mathbb{E}_{x^{(u)} \sim p_u}\left[\log\left(1 - D_v\!\left(G_v\!\left(E_u\!\left(x^{(u)}\right)\right)\right)\right)\right] \tag{4}
\]

The generator aims at generating fake samples similar to real ones, while the discriminator tries to distinguish the generated samples from real ones. In this way, generator and discriminator play an adversarial game until convergence, at which point the generator produces highly realistic samples. Nonetheless, a GAN may map the same input to any random permutation of samples in the target data distribution, so the model cannot obtain a desired output with the GAN loss alone. To tackle this problem, cycle GAN further employs a cycle-consistency loss to constrain the learned mapping and thereby reduce the space of possible mapping functions. Specifically, for each sample x^(u) of the u-th view, passing it through encoder E_u and generator G_v yields a generated v-th view sample; passing that sample through encoder E_v and generator G_u then yields the reconstruction obtained after the full translation cycle. Since the cycle-consistency loss assists the generators in mapping a given sample to the specific output paired with it, combining the GAN loss and the cycle-consistency loss effectively guarantees the desired output. The multi-view cycle-consistency loss of our model minimizes the l1-norm of the reconstruction error between the final generated sample and the input sample

\[
\mathcal{L}_{cyc}^{(u,v)} = \mathbb{E}_{x^{(u)} \sim p_u}\!\left[\left\| G_u\!\left(E_v\!\left(G_v\!\left(E_u(x^{(u)})\right)\right)\right) - x^{(u)} \right\|_1\right] + \mathbb{E}_{x^{(v)} \sim p_v}\!\left[\left\| G_v\!\left(E_u\!\left(G_u\!\left(E_v(x^{(v)})\right)\right)\right) - x^{(v)} \right\|_1\right] \tag{5}
\]

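The translation cycle can be illustrated with toy linear cross-view maps standing in for the encoder-generator compositions; when the two maps are exact inverses, the cycle loss vanishes. All matrices here are hypothetical.

```python
import numpy as np

# Toy cross-view mappings standing in for G_v(E_u(.)) and G_u(E_v(.)).
# In this toy setup view v is an exactly invertible linear transform of
# view u, so a perfect pair of mappings drives the cycle loss to zero.
A = np.array([[2.0, 0.0], [1.0, 1.0]])   # hypothetical u -> v map
A_inv = np.linalg.inv(A)                  # hypothetical v -> u map

def cycle_loss(X_u, fwd, bwd):
    """Mean l1 reconstruction error after the u -> v -> u translation cycle."""
    X_v_hat = fwd @ X_u          # generate view v from view u
    X_u_cyc = bwd @ X_v_hat      # translate back to view u
    return np.mean(np.abs(X_u_cyc - X_u))

X_u = np.array([[1.0, -2.0, 0.5], [3.0, 0.0, -1.0]])  # toy u-view samples
print(cycle_loss(X_u, A, A_inv))      # exact inverse pair: loss near zero
print(cycle_loss(X_u, A, np.eye(2)))  # mismatched backward map: loss > 0
```

In GP-MVC the forward and backward maps are the learned nonlinear compositions of encoders and generators, and the loss above is what pressures them toward mutually consistent translations.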
Weighted Adaptive Fusion Loss

Before clustering, we need to fuse all the latent representations; for this we use weighted adaptive fusion. We learn a latent representation z^(v) for each view X^(v), v = 1, ..., V. Then, by the following equation, we obtain a common representation z based on all the latent representations

\[
z = F\!\left(z^{(1)}, \dots, z^{(V)}\right) \tag{6}
\]

where F denotes a concatenation or summation function.

After getting the latent representations of the V views from the encoders, we adopt an adaptive fusion method to extract, from the multi-modal original spaces, the common discriminative information that is beneficial to clustering. In a manner similar to classification, this discriminative information is used to approximate the ideal data distribution. We define the adaptive fusion loss function as

\[
\mathcal{L}_{fus} = \sum_{v=1}^{V} w_v \left\| z - z^{(v)} \right\|_F^2 \tag{7}
\]

where z^(v) is the output of the v-th encoder and w = (w_1, ..., w_V) is a group of learnable fusion weights.
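One simple parameterization of such a fusion, sketched below with softmax-normalized learnable scores, is a weighted sum of the per-view latents. This is an illustrative assumption, not necessarily the exact form used in GP-MVC.

```python
import numpy as np

def fuse(latents, w_raw):
    """Weighted adaptive fusion: z = sum_v softmax(w)_v * z^(v).

    latents: list of (k, n) latent representations, one per view
    w_raw:   unnormalized learnable scores, one per view (hypothetical
             parameterization of the fusion weights)
    """
    w = np.exp(w_raw - np.max(w_raw))   # stable softmax
    w = w / w.sum()                     # weights sum to 1
    return sum(wv * zv for wv, zv in zip(w, latents)), w

rng = np.random.default_rng(1)
latents = [rng.normal(size=(3, 5)) for _ in range(3)]   # V = 3 views
z, w = fuse(latents, np.array([0.2, 1.5, -0.3]))
print(z.shape, w.sum())
```

With equal scores the fusion degenerates to a plain average of the views; training the scores lets more informative views dominate the common representation.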

KL Clustering Loss

The analysis above shows that the adversarial training loss improves the generator and discriminator networks; nevertheless, it makes little modification to the encoders. The encoders learn a common representation for the final clustering task from both paired and unpaired data, but the unpaired data will negatively affect the clustering performance. To optimize the encoders and obtain a better clustering structure, we add a clustering loss measured by the Kullback-Leibler (KL) divergence to our model. We denote the initial clustering centroids by mu_j, j = 1, ..., K. To measure the similarity between a common representation point z_i and a centroid mu_j, we follow the method in [36] and employ the Student's t-distribution as a kernel. We then calculate the probability q_ij that sample i is assigned to cluster j with the following formula

\[
q_{ij} = \frac{\left(1 + \left\| z_i - \mu_j \right\|^2 / \alpha\right)^{-\frac{\alpha+1}{2}}}{\sum_{j'} \left(1 + \left\| z_i - \mu_{j'} \right\|^2 / \alpha\right)^{-\frac{\alpha+1}{2}}} \tag{8}
\]

This is also called the soft assignment. Herein, we denote the degree of freedom of the Student's t-distribution by alpha. We then square q_ij and normalize it by the soft cluster frequency to obtain the target distribution P

\[
p_{ij} = \frac{q_{ij}^2 / f_j}{\sum_{j'} q_{ij'}^2 / f_{j'}}, \qquad f_j = \sum_i q_{ij} \tag{9}
\]

where f_j represents the soft cluster frequency. In this way, our method improves clustering performance and lays special stress on data points assigned with high confidence. Finally, we minimize the KL-divergence between the soft assignment distribution Q and the target distribution P as the clustering loss

\[
\mathcal{L}_{kl} = \mathrm{KL}(P \,\|\, Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}} \tag{10}
\]

We aim to match the soft assignment Q to the target distribution P. This helps sharpen the data distribution and concentrate data of the same class. As a result, we obtain a common representation that works more effectively for partial multi-view clustering.
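The soft assignment, target distribution, and KL clustering loss described above can be sketched as follows; the data are toy values and the Student's t degree of freedom is set to 1, as is common in deep embedded clustering.

```python
import numpy as np

def soft_assign(Z, mu, alpha=1.0):
    """Student's t soft assignment: similarity of each point to each centroid."""
    d2 = ((Z[:, None, :] - mu[None, :, :]) ** 2).sum(-1)   # squared distances
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)                # rows sum to 1

def target_dist(q):
    """Sharpened target distribution: square q, normalize by cluster frequency."""
    f = q.sum(axis=0)                     # soft cluster frequencies
    p = (q ** 2) / f
    return p / p.sum(axis=1, keepdims=True)

def kl_loss(p, q):
    """KL(P || Q), summed over samples and clusters."""
    return np.sum(p * np.log(p / q))

rng = np.random.default_rng(2)
Z = rng.normal(size=(10, 4))              # common representations z_i
mu = rng.normal(size=(3, 4))              # K = 3 cluster centroids
q = soft_assign(Z, mu)
p = target_dist(q)
print(kl_loss(p, q) >= 0.0)
```

Because P up-weights confident assignments, descending this loss through the encoders pulls points toward the centroids they already lean to, sharpening the cluster structure.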

Overall objective

By integrating auto-encoder loss, adversarial training loss, weighted adaptive fusion loss and KL clustering loss, we have the following objective function of GP-MVC

\[
\mathcal{L} = \mathcal{L}_{ae} + \lambda_1 \mathcal{L}_{adv} + \lambda_2 \mathcal{L}_{fus} + \lambda_3 \mathcal{L}_{kl} \tag{11}
\]

where λ1, λ2, and λ3 are trade-off parameters that adjust the impact of each term in the overall objective.

1:A multi-view data matrix X; parameters λ1, λ2, λ3;
2:The clustering result.
3:Initialize: the parameters of all networks: encoders E, clustering layer, generators G, and discriminators D.
4:Step 1: Train encoders E and generators G;
5:for a pre-specified number of iterations do
6:     Input paired data;
7:     Update E and G by Eq. (1);
8:     Compute the clustering centroids;
9:end for
10:Step 2: Train generators G and discriminators D;
11:for a pre-specified number of iterations do
12:     Input all data;
13:     Update G and D by Eq. (3);
14:     Generate the missing samples;
15:     Compute the common representation z;
16:end for
17:Step 3: Train the whole model;
18:for a pre-specified number of iterations do
19:     Input the clustering centroids, the common representation z, and the completed data;
20:     Update E, G, and D by Eq. (11);
21:     Compute the common representation z;
22:end for
23:Clustering on the common representation z;
Algorithm 1 Generative Partial Multi-view Clustering

III-D Implementation

Step 1: Training encoders E and generators G on paired data.

We first train the encoders and generators of our model using only paired data. Since together they can be seen as an auto-encoder network, we use only the auto-encoder loss to optimize them. Specifically, we take paired data as input to the encoders and obtain the latent representations, which then pass through the corresponding generators to produce the reconstructions. Finally, we compute the clustering centroids and use the auto-encoder loss to update the encoder and generator networks. Step 1 yields the clustering centroids used in Step 3; learned from paired data, these centroids help assign the samples generated for missing views to the right groups.

Step 2: Training generators G and discriminators D on all data.

We use all data to train the generator and discriminator networks, i.e., the multi-view cycle GAN, in this step. For paired data, we directly take them as input to the encoders. When encountering an unpaired sample of the v-th view, in each epoch we randomly choose one sample from the u-th view (u ≠ v) to serve as the corresponding encoder input. After Step 2, we save the outputs of the generators, i.e., the samples generated for the missing views. We then compute a new common representation on the completed database.
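For intuition, Step 2's view-to-view generation can be mimicked by a linear stand-in: fit a cross-view regressor on the paired portion only, then impute the missing view for unpaired samples. In GP-MVC the GAN generators play this role nonlinearly; all names and shapes below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

# Paired data: both views observed. In this toy setup view 2 is an exact
# linear function of view 1, so a least-squares "generator" recovers it.
n_pair, d1, d2 = 50, 6, 4
X1_pair = rng.normal(size=(n_pair, d1))
M = rng.normal(size=(d1, d2))             # hidden cross-view relation
X2_pair = X1_pair @ M

# Fit the view-1 -> view-2 mapping on the paired samples only.
W, *_ = np.linalg.lstsq(X1_pair, X2_pair, rcond=None)

# Unpaired data: view 2 missing; impute it from view 1.
X1_only = rng.normal(size=(10, d1))
X2_imputed = X1_only @ W
print(X2_imputed.shape)
```

The imputed rows play the role of the generator outputs saved after Step 2: they complete the database so that a common representation can be computed over all samples.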

Step 3: Training the whole model on the completed dataset.

We use the clustering centroids from Step 1 and the completed dataset from Step 2 as input to train the whole model again. In each epoch, we update the clustering centroids, the common representation, and the generated samples for the missing views.

Algorithm 1 illustrates the training procedure of our model.

Methods 0.1 0.3 0.5 0.7 0.9
best SC 0.4748±0.0131 0.5169±0.0174 0.5692±0.0159 0.6139±0.0121 0.6716±0.0136
AMGL [19] 0.2524±0.0349 0.2357±0.0180 0.2538±0.0155 0.2807±0.0125 0.2958±0.0195
RMSC [34] 0.3395±0.0050 0.3683±0.0051 0.3907±0.0045 0.4233±0.0048 0.4499±0.0022
ConSC [14] 0.2781±0.0411 0.2230±0.0148 0.2139±0.0078 0.2106±0.0058 0.2884±0.0896
PVC [22] 0.5015±0.0438 0.5424±0.0537 0.6277±0.0402 0.6833±0.0931 0.7546±0.1091
IMG [21] 0.4373±0.0100 0.4508±0.0254 0.4868±0.0147 0.5055±0.0131 0.5176±0.0415
PVC-GAN [32] 0.5210±0.0090 0.6711±0.0107 0.8631±0.0043 0.9154±0.0107 0.9498±0.0026
GP-MVC 0.5874±0.0249 0.7868±0.0234 0.8879±0.0128 0.9319±0.0082 0.9655±0.0088
TABLE I: The average Clustering Accuracy in terms of different partial ratios on the BDGP Database.
Methods 0.1 0.3 0.5 0.7 0.9
best SC 0.3483±0.0080 0.3956±0.0076 0.4429±0.0114 0.4774±0.0103 0.5277±0.0106
AMGL [19] 0.1558±0.0155 0.1412±0.0218 0.1524±0.0343 0.2415±0.0631 0.3346±0.0288
RMSC [34] 0.3492±0.0077 0.4150±0.0294 0.4575±0.0233 0.4960±0.0174 0.5144±0.0204
ConSC [14] 0.3704±0.0275 0.3581±0.0231 0.3674±0.0131 0.4137±0.0396 0.5088±0.0299
PVC [22] 0.3525±0.0238 0.3864±0.0104 0.4238±0.0446 0.4401±0.0150 0.4644±0.0423
IMG [21] 0.4655±0.0186 0.4640±0.0213 0.4613±0.0146 0.4592±0.0146 0.4622±0.0151
PVC-GAN [32] 0.4517±0.0086 0.4836±0.0071 0.5280±0.0078 0.5202±0.0070 0.5340±0.0073
GP-MVC 0.5646±0.0247 0.5542±0.0330 0.5776±0.0169 0.5963±0.0088 0.5955±0.0163
TABLE II: The average Clustering Accuracy in terms of different partial ratios on the sampled MNIST Database.
Methods 0.1 0.3 0.5 0.7 0.9
best SC 0.4863±0.0122 0.5188±0.0112 0.5664±0.0143 0.6114±0.0189 0.6613±0.0178
AMGL [19] 0.6056±0.0489 0.6828±0.0564 0.7370±0.0281 0.7506±0.0320 0.7594±0.0211
RMSC [34] 0.4642±0.0159 0.5293±0.0096 0.5925±0.0154 0.6507±0.0202 0.7154±0.0375
ConSC [14] 0.5063±0.0325 0.5438±0.0272 0.5982±0.0246 0.6982±0.0481 0.7916±0.0299
PVC [22] 0.3238±0.0087 0.3077±0.0078 0.3419±0.0148 0.4236±0.0168 0.5730±0.0261
IMG [21] 0.5350±0.0192 0.5455±0.0262 0.5457±0.0193 0.5529±0.0166 0.5633±0.0213
PVC-GAN [32] 0.6982±0.0104 0.8380±0.0144 0.8806±0.0081 0.9030±0.0074 0.9234±0.0055
GP-MVC 0.7629±0.0276 0.9141±0.0040 0.9372±0.0056 0.9454±0.0051 0.9508±0.0026
TABLE III: The average Clustering Accuracy in terms of different partial ratios on the HW Database.
Methods 0.1 0.3 0.5 0.7 0.9
best SC 0.1863±0.0051 0.1966±0.0070 0.2081±0.0033 0.2177±0.0052 0.2263±0.0029
AMGL [19] 0.1677±0.0083 0.1817±0.0083 0.1850±0.0123 0.1808±0.0076 0.1815±0.0064
RMSC [34] 0.1925±0.0051 0.2001±0.0084 0.2136±0.0097 0.2239±0.0050 0.2287±0.0035
ConSC [14] 0.1573±0.0050 0.1650±0.0048 0.1804±0.0066 0.1960±0.0058 0.2148±0.0039
PVC [22] 0.1118±0.0017 0.1202±0.0018 0.1290±0.0037 0.1482±0.0084 0.1714±0.0114
IMG [21] 0.1136±0.0019 0.1215±0.0036 0.1262±0.0028 0.1310±0.0020 0.1353±0.0017
PVC-GAN [32] 0.1711±0.0044 0.1988±0.0063 0.2191±0.0050 0.2216±0.0112 0.2313±0.0082
GP-MVC 0.1915±0.0068 0.2216±0.0092 0.2398±0.0063 0.2475±0.0046 0.2770±0.0089
TABLE IV: The average Clustering Accuracy in terms of different partial ratios on the NUS Database.
Fig. 4: The average clustering NMI and Purity of all the methods in terms of different impartial ratios on the four databases.

IV Experimental Analysis

To test the performance of our method, we conduct several experiments on four multi-view databases and compare our method with the state-of-the-art PMVC and MVC methods.

IV-A Experimental Setting


We evaluate our method on four different datasets. In the following part, we present a brief introduction to these four datasets.

BDGP [2]: a database that consists of both a visual view and a textual view. It contains images of drosophila embryos from categories, and each image is described by two vectors, i.e., a -D visual feature vector and a -D textual feature vector. In the experiment, all the data are used to evaluate the performance of the aforementioned methods on both features.

MNIST [15]: a handwritten-digit image database composed of training examples and testing examples, each of the size of pixels. We use the original black-and-white images of MNIST and their corresponding edge images for testing. Since it is difficult to conduct comparisons on a large-scale database, we construct a sampled MNIST database by randomly sampling images from the original database, and employ this sampled MNIST database in the experiments.

Handwritten numerals (HW) [30]: an image database containing images of digit classes from to . Each class contains samples with kinds of features: 76 Fourier coefficients of the two-dimensional shape descriptors (FOU), 216 profile correlations (FAC), 64 Karhunen-Loeve coefficients (KAR), 240 pixel features (PIX) obtained by dividing the 30×48-pixel image into 240 tiles of 2×3 pixels and counting the average number of object pixels in each tile, 47 rotation-invariant Zernike moments (ZER), and 6 morphological (MOR) features. In our experiment, we choose the first three views of the HW database, i.e., FOU, FAC, and KAR.

NUS-WIDE (NUS) [4]: a database consisting of 269,648 images of 81 concepts. In our experiments, we select 12 categories of animal concepts, including cat, cow, dog, elk, hawk, horse, lion, squirrel, tiger, whale, wolf, and zebra. We extract three kinds of low-level features from this database: 144-D color correlograms, 128-D wavelet textures, and 225-D block-wise color moments.

Baseline Methods and Evaluation Metrics

We implement several state-of-the-art partial multi-view clustering methods. Specifically, we compare the proposed GP-MVC with Incomplete Multi-Modal Visual Data Grouping (IMG) [21], Partial Multi-View Clustering using Graph Regularized NMF (PVC) [22, 48], and Partial Multi-View Clustering via Consistent GAN (PVC-GAN) [32]. We also compare GP-MVC with several multi-view clustering methods: Feature Concatenation Spectral Clustering (ConSC) [14], Robust Multi-view Spectral Clustering (RMSC) [34], Auto-weighted Multiple Graph Learning (AMGL) [19], as well as spectral clustering on the best single view (best SC).

For the partial-view setting, we test the aforementioned methods under different impartial ratios. The impartial ratio is defined as the proportion of complete samples among all the samples; it varies from 0.1 to 0.9 with an interval of 0.2. For the MVC methods, which cannot handle missing instances, we fill in the missing instances with the average feature vector. We adopt three standard clustering validation metrics, i.e., Accuracy (ACC) [1], Normalized Mutual Information (NMI) [6], and Purity [31], to evaluate the performance of each method.
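For reference, the ACC metric maps predicted cluster IDs to ground-truth labels via the best one-to-one assignment before counting correct samples. A minimal sketch of this standard computation (brute force over label permutations, which suffices for a handful of clusters; in practice the Hungarian algorithm is used for many clusters):

```python
from itertools import permutations

import numpy as np

def clustering_accuracy(y_true, y_pred):
    """Clustering ACC: fraction of samples correctly grouped under the
    best one-to-one mapping between cluster IDs and class labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    k = int(max(y_true.max(), y_pred.max())) + 1
    best = 0.0
    # Brute force over label permutations; fine for small k (use the
    # Hungarian algorithm, e.g. scipy.optimize.linear_sum_assignment,
    # for a large number of clusters).
    for perm in permutations(range(k)):
        mapped = np.array([perm[p] for p in y_pred])
        best = max(best, float((mapped == y_true).mean()))
    return best

# Cluster IDs that are a pure relabeling of the classes score 1.0.
acc = clustering_accuracy([0, 0, 1, 1, 2], [1, 1, 2, 2, 0])  # -> 1.0
```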

Specifically, in each database we randomly choose five groups of samples as missing data according to the five impartial ratios (0.1, 0.3, 0.5, 0.7, 0.9), and repeat this process 10 times. We report the average clustering accuracy and the corresponding standard deviation of all the methods on the four databases, with the average Accuracy results given in Tables I to IV, respectively. Fig. 4 shows the average clustering NMI and Purity in terms of different impartial ratios on the four databases.
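The construction of these missing-data groups can be sketched as follows. This is one plausible reading of the protocol (a fraction equal to the impartial ratio keeps all views and every remaining sample keeps exactly one randomly chosen view); the paper's exact sampling scheme may differ:

```python
import numpy as np

def make_partial_split(n_samples, impartial_ratio, n_views=2, seed=0):
    """Boolean availability mask of shape (n_samples, n_views):
    True where a sample's view is observed, False where it is missing."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((n_samples, n_views), dtype=bool)
    order = rng.permutation(n_samples)
    n_complete = int(round(impartial_ratio * n_samples))
    mask[order[:n_complete]] = True            # complete samples: all views
    for i in order[n_complete:]:               # partial samples: one view
        mask[i, rng.integers(n_views)] = True
    return mask

mask = make_partial_split(1000, impartial_ratio=0.5)
```

With an impartial ratio of 0.5, exactly half of the 1000 samples have both views observed, and every remaining sample is visible in exactly one view.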

Implementation Details

Our algorithm is implemented with the public PyTorch toolbox on an Ubuntu desktop with NVIDIA Titan V Graphics Processing Units (GPUs) as well as GB memory. We train our model using the Adam [13] optimizer with the default parameter setting and a fixed learning rate. For each training step, we conduct epochs and record the experimental results. Besides, we test all the other methods in Matlab in the same environment for comparison.

(a) 0.1
(b) 0.5
(c) 0.9
(d) 0.1
(e) 0.5
(f) 0.9
Fig. 5: The images in the first row are the real images from view 1 ((a), (b), (c)) or view 2 ((d), (e), (f)) of the MNIST database. The images in the second row (marked by red boxes) are the fake images generated from the latent representations of the images in the first row. The images in the third row are the real images of view 2 or view 1 corresponding to the first row. The images are generated under training data with different impartial ratios: (a), (d) 0.1; (b), (e) 0.5; (c), (f) 0.9.

IV-B Partial Multi-view Clustering Performance

The evaluation results are summarized in Tables I to IV and Fig. 4, which indicate that our method achieves better clustering performance than the others in all cases. We highlight some important observations below:

From Tables I to IV and Fig. 4, we can see that the PMVC methods are superior on most databases, especially when the partial ratio is large. This indicates that missing-view data have a negative influence on the effectiveness of MVC methods, and that PMVC methods are more effective when confronting the missing-data problem. From Table II, we can also see that when the views are incomplete, the multi-view clustering method AMGL performs worse than the single-view methods. This further illustrates that some multi-view methods are sensitive to missing data or noise.

The results in Tables I to IV also demonstrate that our method outperforms all the methods tested. This is probably because GP-MVC is able to learn a consistent clustering structure for each view, with which it effectively generates the missing data; it can thereby construct a more effective common subspace with these complementary generated data. Comparing the results of PVC-GAN and GP-MVC in Tables I and II, we observe that GP-MVC performs better than PVC-GAN. The only difference between the two lies in the fusion scheme: GP-MVC uses a weighted adaptive fusion to combine the multiple latent spaces. This demonstrates the effectiveness of the weighted adaptive fusion loss.
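The weighted adaptive fusion mentioned above combines per-view latent codes into one common representation with learned per-view weights. A minimal numerical sketch follows; the softmax parametrization of the weights is our assumption for illustration, whereas the paper learns the weights jointly with the networks:

```python
import numpy as np

def weighted_fusion(latents, logits):
    """Fuse per-view latent codes (each n x d) with softmax-normalized
    weights, so the view weights stay positive and sum to one."""
    logits = np.asarray(logits, dtype=float)
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return sum(w_v * z_v for w_v, z_v in zip(w, latents))

z1 = np.ones((4, 3))    # view-1 latent codes
z2 = np.zeros((4, 3))   # view-2 latent codes
z = weighted_fusion([z1, z2], logits=[0.0, 0.0])  # equal weights -> mean
```

With equal logits the fusion reduces to the plain average of the views; during training the logits would shift to favor the more reliable views.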

From Tables III and IV, we can see that the partial multi-view methods (PVC, IMG, and PVC-GAN) perform worse than the others. This is because these methods can only be applied to two-view databases, while HW and NUS have more than two views. Thus, for PVC, IMG, and PVC-GAN, we use the first two views of HW and NUS. For the MVC methods, we still use the average sample to fill up the missing samples. The experimental results illustrate that our model can be well applied to multi-view databases.

Loss 0.1 0.3 0.5 0.7 0.9
AE 0.6530 0.7910 0.8760 0.8835 0.8945
AE + AT 0.7165 0.8680 0.8840 0.8985 0.9000
ALL 0.7629 0.9141 0.9372 0.9454 0.9508
TABLE V: The ablation study of GP-MVC under different impartial ratios on the HW dataset. We show the clustering accuracy of our method with different loss functions.
Clustering metrics 0.1 0.3 0.5 0.7 0.9
Accuracy 0.5016±0.0074 0.5144±0.0070 0.5203±0.0097 0.5391±0.0104 0.5551±0.0120
NMI 0.4567±0.0018 0.4645±0.0047 0.4659±0.0045 0.5143±0.0029 0.4828±0.0167
Purity 0.5398±0.0033 0.5555±0.0043 0.5567±0.0073 0.5689±0.0049 0.5757±0.0049
TABLE VI: The clustering performance of our method on the whole MNIST Database.

IV-C Model Discussion

Ablation Study

To verify the effect of each term in the objective function of our model, we conduct ablation studies. We perform three experiments to isolate the effects of the losses , , and . In the first experiment, we only use the auto-encoder loss to train the encoder and generator networks. In the second experiment, we use the auto-encoder loss and the adversarial training loss to train the encoder, generator, and discriminator networks. In the third experiment, we use the full objective function to train our model, i.e., based on the second experiment, we add the adaptive fusion loss and the KL-clustering loss . In each experiment, we vary the partial ratio from 0.1 to 0.9 with an interval of 0.2, and perform the clustering task on the common representation learned by the encoder network. The clustering accuracy is shown in Table V. From Table V, we can see that the third experiment, which uses the full objective function, achieves the best performance, and the accuracy obtained with the auto-encoder loss plus the adversarial training loss is superior to that obtained with the auto-encoder loss alone. This illustrates that each term in our objective function contributes substantially to the final clustering performance of our model. Adding the adversarial training loss to the auto-encoder loss improves the clustering accuracy, probably because the samples generated for the missing views help learn a better clustering structure. The weighted adaptive fusion loss and the KL-clustering loss further boost the performance in the third experiment. This phenomenon illustrates that a good common representation in turn helps generate more realistic samples for the missing views and improves the clustering performance again.
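For reference, the KL-clustering term follows the self-training scheme of Xie et al. [36]: soft assignments q of embeddings to cluster centers are matched against a sharpened target distribution p. A small sketch of that computation (the Student's-t kernel and the target sharpening follow [36]; the integration with the multi-view encoders is omitted):

```python
import numpy as np

def soft_assign(z, centers, alpha=1.0):
    """Student's t similarity between embeddings and cluster centers."""
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened targets p emphasize high-confidence assignments."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_clustering_loss(q, p):
    """KL(p || q), averaged over samples."""
    return float((p * np.log(p / q)).sum(axis=1).mean())

z = np.array([[0.0, 0.0], [1.0, 1.0], [0.1, 0.0]])  # toy embeddings
centers = np.array([[0.0, 0.0], [1.0, 1.0]])        # two cluster centers
q = soft_assign(z, centers)
loss = kl_clustering_loss(q, target_distribution(q))
```

Minimizing this loss with respect to both the embeddings and the centers pulls samples toward the cluster they are already most confident about, which is what sharpens the cluster structure of the common representation.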

Missing Data Generation

To visually present the generation results of our method, we show the images generated by our method (marked by red boxes) on the sampled MNIST database under three different impartial ratios (0.1, 0.5, 0.9) in Fig. 5. We can see that our method generates the missing data well. As the number of paired samples increases, our method generates more realistic data.
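The generation pipeline behind these images encodes the observed view into the shared representation and decodes that representation with the other view's generator. A toy sketch of this data flow, with random linear maps standing in for the trained view-1 encoder E1 and view-2 generator G2 (all shapes are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear stand-ins for the trained view-1 encoder E1 and the
# view-2 generator G2; only the data flow matters here.
W_enc = rng.normal(size=(5, 3))   # view-1 features (5-D) -> shared code (3-D)
W_gen = rng.normal(size=(3, 7))   # shared code (3-D) -> view-2 features (7-D)

x1 = rng.normal(size=(10, 5))     # samples observed only in view 1
z = x1 @ W_enc                    # shared representation E1(x1)
x2_hat = z @ W_gen                # imputed view-2 data G2(E1(x1))
```

In the full model both maps are deep networks trained with the adversarial and reconstruction losses, so `x2_hat` would be the fake view-2 images shown in the red boxes of Fig. 5.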

Large-Scale Partial Multi-View Clustering

In addition, we run all the methods on the whole MNIST database under different impartial ratios, repeating each experiment 10 times. All the other methods fail with out-of-memory errors, as the available computer memory cannot satisfy their requirements, so we only report the clustering performance of our method in Table VI. This illustrates that our method can be applied to large-scale databases.

V Conclusions

In this paper, we propose a novel generative partial multi-view clustering approach. It is able to fill in the incomplete views based on the common subspace via a GAN model while simultaneously learning a good clustering structure. In addition, it further exploits the complementary information in the incomplete views to learn a consistent common structure, which greatly improves the clustering performance. We validate the clustering performance improvement of the proposed method via a series of comprehensive experiments, and the comparison with several existing methods demonstrates the superiority of GP-MVC.


  1. D. Cai, X. He and J. Han (2005) Document clustering using locality preserving indexing. IEEE TKDE 17 (12), pp. 1624–1637.
  2. X. Cai, H. Wang, H. Huang and C. Ding (2012) Joint stage recognition and anatomical annotation of drosophila gene expression patterns. Bioinformatics 28 (12), pp. i16–i24.
  3. G. Chao, S. Sun and J. Bi (2017) A survey on multi-view clustering. arXiv preprint arXiv:1712.06246.
  4. T. Chua, J. Tang, R. Hong, H. Li, Z. Luo and Y. Zheng (2009) NUS-WIDE: a real-world web image database from National University of Singapore. In ACM CIVR, pp. 48.
  5. Z. Ding and Y. Fu (2018) Robust multiview data analysis through collective low-rank subspace. IEEE TNNLS 29 (5), pp. 1986–1997.
  6. P. A. Estévez, M. Tesmer, C. A. Perez and J. M. Zurada (2009) Normalized mutual information feature selection. IEEE TNNLS 20 (2), pp. 189–201.
  7. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville and Y. Bengio (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2672–2680.
  8. J. Gui, Z. Sun, Y. Wen, D. Tao and J. Ye (2020) A review on generative adversarial networks: algorithms, theory, and applications. arXiv preprint arXiv:2001.06937.
  9. C. Hou, F. Nie, D. Yi and D. Tao (2017) Discriminative embedded clustering: a framework for grouping high-dimensional data. IEEE TNNLS 26 (6), pp. 1287–1299.
  10. Z. Huang, J. Zhou, X. Peng, C. Zhang, H. Zhu and J. Lv (2019) Multi-view spectral clustering network. In Proc. 28th Int. Joint Conf. Artif. Intell., pp. 2563–2569.
  11. P. Isola, J. Zhu, T. Zhou and A. A. Efros (2017) Image-to-image translation with conditional adversarial networks. arXiv preprint.
  12. T. Karras, S. Laine and T. Aila (2019) A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4401–4410.
  13. D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  14. A. Kumar, P. Rai and H. Daume (2011) Co-regularized multi-view spectral clustering. In Advances in Neural Information Processing Systems, pp. 1413–1421.
  15. Y. LeCun (1998) The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/.
  16. X. Liu, X. Zhu, M. Li, L. Wang, C. Tang, J. Yin, D. Shen, H. Wang and W. Gao (2018) Late fusion incomplete multi-view clustering. IEEE TPAMI 41 (10), pp. 2410–2423.
  17. S. Luo, C. Zhang, W. Zhang and X. Cao (2018) Consistent and specific multi-view subspace clustering. In Thirty-Second AAAI Conference on Artificial Intelligence.
  18. X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang and S. P. Smolley (2017) Least squares generative adversarial networks. In IEEE ICCV, pp. 2813–2821.
  19. F. Nie, J. Li and X. Li (2016) Parameter-free auto-weighted multiple graph learning: a framework for multiview clustering and semi-supervised classification. In IJCAI, pp. 1881–1887.
  20. A. Odena, C. Olah and J. Shlens (2016) Conditional image synthesis with auxiliary classifier GANs. arXiv preprint arXiv:1610.09585.
  21. B. Qian, X. Shen, Y. Gu, Z. Tang and Y. Ding (2016) Double constrained NMF for partial multi-view clustering. In DICTA, pp. 1–7.
  22. N. Rai, S. Negi, S. Chaudhury and O. Deshmukh (2016) Partial multi-view clustering using graph regularized NMF. In ICPR, pp. 2192–2197.
  23. C. Shang, A. Palmer, J. Sun, K. Chen, J. Lu and J. Bi (2017) VIGAN: missing view imputation with generative adversarial networks. arXiv preprint arXiv:1708.06724.
  24. W. Shao, X. Shi and S. Y. Philip (2013) Clustering on multiple incomplete datasets via collective kernel learning. In IEEE ICDM, pp. 1181–1186.
  25. C. Tang, X. Zhu, X. Liu, M. Li, P. Wang, C. Zhang and L. Wang (2018) Learning a joint affinity graph for multiview subspace clustering. IEEE TMM 21 (7), pp. 1724–1736.
  26. Z. Tao, H. Liu, S. Li, Z. Ding and Y. Fu (2019) Marginalized multiview ensemble clustering. IEEE TNNLS.
  27. L. Tran, X. Liu, J. Zhou and R. Jin (2017) Missing modalities imputation via cascaded residual autoencoder. In IEEE CVPR, pp. 1405–1414.
  28. A. Trivedi, P. Rai, H. Daumé III and S. L. DuVall (2010) Multiview clustering with incomplete views. In NIPS Workshop.
  29. E. Tsivtsivadze, H. Borgdorff, J. van de Wijgert, F. Schuren, R. Verhelst and T. Heskes (2013) Neighborhood co-regularized multi-view spectral clustering of microbiome data. In IAPR, pp. 80–90.
  30. M. Van Breukelen, R. P. Duin, D. M. Tax and J. Den Hartog (1998) Handwritten digit recognition by combined classifiers. Kybernetika 34 (4), pp. 381–386.
  31. R. Varshavsky, M. Linial and D. Horn (2005) COMPACT: a comparative package for clustering assessment. In ISPA, pp. 159–167.
  32. Q. Wang, Z. Ding, Z. Tao, Q. Gao and Y. Fu (2018) Partial multi-view clustering via consistent GAN. In ICDM, pp. 1–6.
  33. X. Wang, X. Guo, Z. Lei, C. Zhang and S. Z. Li (2017) Exclusivity-consistency regularized multi-view subspace clustering. In IEEE CVPR, pp. 923–931.
  34. R. Xia, Y. Pan, L. Du and J. Yin (2014) Robust multi-view spectral clustering via low-rank and sparse decomposition. In AAAI, pp. 2149–2155.
  35. D. Xie, Q. Gao, Q. Wang, X. Zhang and X. Gao (2020) Adaptive latent similarity learning for multi-view clustering. Neural Networks 121, pp. 409–418.
  36. J. Xie, R. Girshick and A. Farhadi (2016) Unsupervised deep embedding for clustering analysis. In ICML, pp. 478–487.
  37. C. Xu, D. Tao and C. Xu (2013) A survey on multi-view learning. arXiv preprint arXiv:1304.5634.
  38. Q. Yin, S. Wu, R. He and L. Wang (2015) Multi-view clustering via pairwise sparse subspace representation. Neurocomputing 156, pp. 12–21.
  39. Q. Yin, S. Wu and L. Wang (2015) Incomplete multi-view clustering via subspace learning. In ACM ICIKM, pp. 383–392.
  40. K. Zhan, C. Niu, C. Chen, F. Nie, C. Zhang and Y. Yang (2018) Graph structure fusion for multiview clustering. IEEE TKDE 31 (10), pp. 1984–1993.
  41. C. Zhang, H. Fu, Q. Hu, X. Cao, Y. Xie, D. Tao and D. Xu (2020) Generalized latent multi-view subspace clustering. IEEE TPAMI 42 (1), pp. 86–99.
  42. C. Zhang, Z. Han, H. Fu, J. T. Zhou and Q. Hu (2019) CPM-Nets: cross partial multi-view networks. In Advances in Neural Information Processing Systems, pp. 557–567.
  43. H. Zhang, T. Xu, H. Li, S. Zhang, X. Huang, X. Wang and D. Metaxas (2017) StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks. In ICCV, pp. 5907–5915.
  44. H. Zhang, V. Sindagi and V. M. Patel (2019) Image de-raining using a conditional generative adversarial network. IEEE Transactions on Circuits and Systems for Video Technology.
  45. B. Zhao, X. Wu, Z. Cheng, H. Liu, Z. Jie and J. Feng (2017) Multi-view image generation from a single-view. arXiv preprint arXiv:1704.04886.
  46. H. Zhao, Z. Ding and Y. Fu (2017) Multi-view clustering via deep matrix factorization. In AAAI, pp. 2921–2927.
  47. H. Zhao, H. Liu and Y. Fu (2016) Incomplete multi-modal visual data grouping. In IJCAI, pp. 2392–2398.
  48. S. Li, Y. Jiang and Z. Zhou (2014) Partial multi-view clustering. In AAAI.
  49. J. Zhu, T. Park, P. Isola and A. A. Efros (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv preprint arXiv:1703.10593.