Generative Partial Multi-View Clustering
Abstract
Nowadays, with the rapid development of data collection sources and feature extraction methods, multi-view data are easy to obtain and have received increasing research attention in recent years. Among the related topics, multi-view clustering (MVC) forms a mainstream research direction and is widely used in data analysis. However, existing MVC methods mainly assume that each sample appears in all views, without considering the incomplete-view case caused by data corruption, sensor failure, equipment malfunction, etc. In this study, we design a generative partial multi-view clustering model, named GPMVC, to address the incomplete multi-view problem by explicitly generating the data of missing views. The main idea of GPMVC is twofold. First, multi-view encoder networks are trained to learn common low-dimensional representations, followed by a clustering layer that captures the consistent cluster structure across views. Second, view-specific generative adversarial networks are developed to generate the missing data of one view conditioned on the shared representation given by the other views. These two steps promote each other, since learning common representations facilitates data imputation, while the generated data further exploit the view consistency. Moreover, a weighted adaptive fusion scheme is implemented to exploit the complementary information among different views. Experimental results on four benchmark datasets show the effectiveness of the proposed GPMVC over the state-of-the-art methods.
I Introduction
Multi-view data could be samples collected from multiple sources, modalities captured by various sensors, or features extracted with different methods. Owing to the advance of hardware technology, multi-view data are quite common in the real world [37]. For example, an image in a social network could be represented by either its visual cues or the users’ comments on it. Compared with single-view data, multiple views usually boost model performance [9, 41, 5] by providing complementary information to represent the same data.
In recent years, increasing research efforts have been devoted to multi-view learning, where multi-view clustering (MVC) [35, 3, 25, 26, 40] forms a mainstream task that aims to explore the underlying cluster structure shared by multiple views. MVC methods serve as an effective data analysis tool for unlabeled multi-view datasets and can significantly improve clustering performance by fusing information from different views. However, although traditional MVC methods have achieved promising progress, their effectiveness depends on the assumption that all views of each instance are complete. Hence, their performance may degrade when some views include missing data, which raises the challenging case of partial multi-view data [48] or incomplete-view data [24, 39].
In practice, incomplete-view data are quite ubiquitous due to environmental issues, obstacles, noise, and malfunction of the collection/transmission/storage equipment [48]. For example, in medical data, some patients may not finish a complete examination because of time conflicts or other reasons; in social multimedia, some instances may lack the visual or audio view as a result of sensor breakdown. However, traditional MVC methods cannot handle the incomplete case directly, because they aim to find a structure shared among all views and require every view of each sample to be complete. In light of this, partial multi-view clustering (PMVC) algorithms have been developed [48, 24, 39, 28, 42, 16].
In the pioneering works, PMVC methods simply used zero or mean values to fill up the incomplete views. However, such imputed data are quite different from the real ones, which prevents MVC from learning a consistent clustering structure and badly degrades the final clustering performance. Existing PMVC methods aim to establish a shared latent subspace from the complete views and then compensate the latent representations for the missing data; they fall into two main directions. The first is kernel-based methods [24, 16], whose main idea is to leverage the kernel matrices of the complete views to complete the kernel matrix of the incomplete view. Such methods can only be applied in kernel-based multi-view clustering. The second is based on non-negative matrix factorization (NMF) [47, 22]. For a sample missing in a certain view, these methods recover the corresponding non-negative matrix from the samples in which that view is observed. However, both kinds of methods still have several limitations. (1) They need to process all the data together, which is inefficient for large-scale databases. (2) They require numerous matrix inverse operations for factorization, resulting in high time complexity. (3) They mainly exploit regularization and add constraints on the new representation, yet fail to explicitly compensate the missing data in each view.
Inspired by generative adversarial networks (GAN) [44, 8, 12], it is natural to synthesize the missing data when learning representations of partial views. The vanilla GAN [7] was proposed to generate desired data from random noise. Recently, some GAN models [49, 23] have been designed to learn the relationship between different views. Following this line, we leverage GAN models to compensate for the missing data. Nonetheless, directly applying GAN to PMVC is not straightforward. First, it is challenging to generate the missing data based on partial views rather than complete views. Second, existing methods rarely consider the clustering task explicitly while learning representations from multiple views.
In this paper, we develop a novel generative partial multi-view clustering model, termed GPMVC, for the PMVC task. The proposed model is composed of four parts (see Fig. 1): multi-view encoder networks, a weighted adaptive fusion layer, a clustering layer, and view-specific generative adversarial networks. The model employs multi-view encoder networks to encode the shared latent representation among multiple views. We then develop view-specific generative adversarial networks to predict the missing-view data conditioned on the latent representations from the other views. Specifically, we resort to adversarial training to explore consistent information among all views: the generators of GPMVC aim to complete the missing data, while the discriminators distinguish fake data from real ones for each view. A clustering layer is designed to sharpen the clustering structure of the common representation, so that it provides explicit guidance for clustering-oriented representation learning. Moreover, we add a weighted adaptive fusion scheme to further exploit the complementary information among different views by introducing a group of learnable weights. By integrating clustering into the generating process, the proposed GPMVC can adjust the generators to compensate the “ideal” missing data and thus improve the clustering performance.
This paper is an extension of our previous work [32]. Compared with [32], several substantial differences have been made as follows: (1) We extend the architecture of GPMVC from two views to multiple views so that our model generalizes well in real-world applications. (2) We develop a new adaptive fusion layer for integrating the complementary information from different views. (3) More theoretical analyses, model discussions and experimental evaluations are provided. We highlight the contributions of this work as follows.
- A novel GAN-based partial multi-view clustering method named GPMVC is proposed to capture the shared clustering structure and to generate missing-view data. Specifically, the proposed GPMVC learns a consistent subspace shared by multiple views to provide common latent representations for both the clustering and data generation tasks.

- The proposed GPMVC fully leverages the consistent information provided by multi-view data. In particular, GPMVC obtains a latent representation from one view, with which it further generates the missing data of the corresponding views. The complementary missing-view data help improve clustering performance.

- Extensive experiments on several multi-view datasets are conducted. Compared with several state-of-the-art methods, the experimental results prove the superiority of GPMVC.
The remainder of this paper is organized as follows. In Section II, we briefly review related works. We then introduce the proposed generative partial multi-view clustering in Section III. Experimental settings and evaluation results are reported in Section IV. Finally, we conclude the paper in Section V.
II Related Works
As multi-view data with missing views become increasingly common in real-world applications, incomplete multi-view clustering methods have been proposed for multi-view data clustering. In this section, we introduce multi-view clustering methods, partial multi-view clustering methods, and generative adversarial networks.
II-A Multi-View Clustering
Multi-view clustering methods can be divided into three categories. The first category is spectral-based methods [37, 14, 29, 10]. These methods usually learn a shared similarity matrix among different views and conduct spectral clustering for the final partition. For example, Kumar et al. [14] designed a co-regularized multi-view spectral clustering method, which performs clustering on different views simultaneously. Motivated by this work, Tsivtsivadze et al. [29] designed neighborhood co-regularized multi-view spectral clustering for microbiome data clustering. The second category is subspace-based methods, which learn a shared coefficient matrix from each view [38, 33, 17]. Based on this idea, Yin et al. [38] proposed pairwise sparse multi-view subspace clustering, enforcing the coefficient matrices from each pair of views to be as similar as possible. Different from the above approaches, the third category [46] mainly uses non-negative matrix factorization to learn a common indicator matrix from different views. Zhao et al. [46] adopted deep semi-non-negative matrix factorization to perform multi-view clustering. To handle incomplete data, researchers have further proposed partial multi-view clustering.
II-B Partial Multi-View Clustering
Piyush et al. [28] designed the first partial multi-view clustering approach. They adopted one view’s kernel representation as the similarity matrix and employed Laplacian regularization to complete the kernel matrices of the incomplete views. Nonetheless, this approach requires one complete view that contains all instances. To tackle this problem, incomplete multi-view clustering methods were developed based on kernel canonical correlation analysis [24, 16]. These methods optimize the alignment of shared instances in the dataset and can thereby collectively complete the kernel matrices of the incomplete views. Despite their effectiveness, they can only be applied in kernel-based methods. Recently, a Non-negative Matrix Factorization (NMF) based Partial View Clustering (PVC) algorithm was proposed in [48]. It establishes a latent subspace in which the examples of different views belonging to the same instance are close to each other, and it proves effective for partial multi-view data. Inspired by its promising performance, numerous NMF-based multi-view methods have been developed [47, 22]. For example, Rai et al. [22] improved it with graph-regularized NMF. Zhao et al. [47] proposed Incomplete Multi-Modal Visual Data Grouping (IMG), whose main idea is to learn a unified framework by integrating latent subspace generation and a compact global structure. However, all these methods learn a latent space for multi-view data with NMF, which restricts their application to non-negative feature data; moreover, non-negative matrix factorization requires complicated computation, so they cannot be used for large-scale data. On the other hand, many works have successfully used deep models to generate images.
II-C Generative Adversarial Networks
The generative adversarial network (GAN) was developed by Goodfellow et al. [27]. It quickly attracted a huge amount of interest because of its extraordinary performance and interesting theory. Recently, many variations of GAN have been put forward for various goals and applications. For example, Mao et al. proposed Least Squares GAN to alleviate the vanishing gradient problem when minimizing the objective function [18]; rather than the cross-entropy loss, it adopts a least squares loss for the discriminator. Zhang et al. [43] applied GAN to photo-realistic image generation conditioned on text descriptions and introduced the Stacked GAN (StackGAN), in which two GANs at different stages are adopted to generate high-resolution images. Odena et al. introduced more structure and a specialized cost function to the GAN latent space for high-quality sample generation, proposing a conditional GAN associated with an auxiliary classifier (ACGAN) [20]; in this way, the conditional information can be class labels or data from other modalities. Isola et al. [11] further explored the application of conditional GANs to paired training data and developed the pix2pix GAN, which can transfer images from one distribution to another effectively. Zhu et al. proposed Cycle GAN [49], which trains on unpaired images with a cycle-consistent adversarial network by adding a cycle-consistency loss. It shows a more powerful ability than pix2pix GAN in translating images from one domain to another, and hence effectively alleviates the shortage of paired samples. GANs have also been widely applied to multi-view data generation [23, 45].
III Generative Partial Multi-View Clustering
III-A Motivation
Existing works that can deal with partial multi-view data are mostly based on kernels or NMF, predicting the missing data while performing the clustering task. Despite their appealing performance, they still have two limitations. First, non-negative matrix factorization based methods require expensive matrix inverse operations; as a result, they cannot be applied to large-scale data. Second, all these methods only focus on learning a shared latent space for clustering; they ignore learning a latent space that is suitable for clustering and can simultaneously be used to generate the missing-view data. To address these two challenges, we design a novel model called Generative Partial Multi-View Clustering (GPMVC). We combine the generative capacity of GAN and the clustering capacity of deep embedded clustering in our model, so that it can generate the missing-view data and learn a better clustering structure for partial multi-view data at the same time.
Notations. We represent the data with the multi-view data matrix $X = \{X^{(1)}, \dots, X^{(V)}\}$, where $X^{(v)} \in \mathbb{R}^{N \times d_v}$, $V$ is the number of views, $N$ is the number of samples, and $d_v$ is the feature dimension of the $v$-th view. Since the setting of our model is partial multi-view clustering, we divide the multi-view data into two parts: paired data, in which all views are complete, and unpaired data, in which some views are missing. We give an example of multi-view data in Fig. 2. We write $\bar{X}^{(v)}$ for the missing (or generated) data of the $v$-th view.
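To make the paired/unpaired split concrete, the following NumPy sketch simulates a partial multi-view dataset with a boolean observation mask. The function name `make_partial` and the mask convention are ours for illustration only, not part of the paper.

```python
import numpy as np

def make_partial(views, paired_ratio, rng):
    """Split samples into paired (all views observed) and unpaired
    (exactly one view observed). Returns a boolean mask M of shape
    (n, V) where M[i, v] is True iff view v of sample i is observed."""
    n = views[0].shape[0]
    V = len(views)
    mask = np.ones((n, V), dtype=bool)
    n_paired = int(round(paired_ratio * n))
    unpaired = rng.permutation(n)[n_paired:]
    for i in unpaired:
        keep = rng.integers(V)       # keep a single random view
        mask[i] = False
        mask[i, keep] = True
    return mask

rng = np.random.default_rng(0)
X = [rng.normal(size=(100, 20)), rng.normal(size=(100, 30))]
M = make_partial(X, paired_ratio=0.5, rng=rng)
```

Here `paired_ratio` plays the role of the impartial ratio used in the experiments: the fraction of samples with all views present.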
III-B Framework
Network Architecture. Fig. 1 illustrates the architecture of our model for partial multi-view data. It is composed of five subnetworks: the encoder network $E$, the weighted adaptive fusion layer, the deep embedded clustering layer, the generator network $G$, and the discriminator network $D$. For multi-view data with $V$ views, our model has $V$ encoders, one fusion layer, one clustering layer, $V$ generators, and $V$ discriminators. We introduce the model in detail as follows.
Encoder network $E$. Each view has an encoder $E_v$ built from stacked fully connected layers. It encodes the $v$-th original view $X^{(v)}$ into a latent representation $z^{(v)}$, where $v = 1, \dots, V$. Denote the nonlinear function of the $v$-th encoder by $f_v$; it maps the $d_v$-dimensional original data to a $k$-dimensional latent representation, $z^{(v)} = f_v(X^{(v)}; \theta_v, \theta_s)$, where $\theta_s$ denotes the parameters shared by all encoders. In order to capture the shared structure of multi-view data, we partially share the parameters of the encoders for all views.
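The partial parameter sharing can be sketched as follows: each view gets its own first layer, while the final projection matrix is one shared object, so all encoders map into the same latent space. This toy NumPy encoder is an illustrative assumption, not the paper's exact architecture.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class ViewEncoder:
    """Two-layer MLP encoder: a view-specific first layer followed by a
    second layer whose weight matrix is shared across all views."""
    def __init__(self, d_in, d_hid, W_shared, rng):
        self.W1 = rng.normal(scale=0.1, size=(d_in, d_hid))  # view-specific
        self.W_shared = W_shared                              # tied across views

    def __call__(self, x):
        return relu(x @ self.W1) @ self.W_shared

rng = np.random.default_rng(0)
d_hid, d_lat = 64, 10
W_shared = rng.normal(scale=0.1, size=(d_hid, d_lat))
dims = (20, 30)                  # two views with different dimensions
encoders = [ViewEncoder(d, d_hid, W_shared, rng) for d in dims]
Z = [enc(rng.normal(size=(8, d))) for enc, d in zip(encoders, dims)]
```

Both outputs land in the same 10-dimensional latent space even though the input dimensions differ, which is what makes the later fusion step possible.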
Generator network $G$. In our model, each generator can also be seen as a decoder, since it has a structure symmetric to the encoder, i.e., stacked fully connected layers. Each view has one exclusive generator $G_v$ that generates the corresponding view: the $v$-th generator takes a latent representation as input and outputs the generated $v$-th view $\bar{X}^{(v)}$. That is, $G_v$ outputs data of the $v$-th view no matter which view's latent representation it receives. Thus, we hope the latent representations $z^{(1)}, \dots, z^{(V)}$ are similar to each other; ideally, they would be equal. However, the equality condition is too strict, so we introduce a common representation $z$ to relax it.
Discriminator network $D$. Similar to the generators, each view has one exclusive discriminator $D_v$, composed of three stacked fully connected layers. As shown in Fig. 1, all discriminators are connected with the generators. Take the $v$-th discriminator as an example: it receives the real sample $X^{(v)}$ and the generated fake sample $\bar{X}^{(v)}$ of the $v$-th view, and outputs a judgment of the authenticity of the generated sample, i.e., real or fake. This result is fed back to the generator $G_v$ and prompts it to produce more realistic samples. The process is repeated until the generator produces samples so realistic that the discriminator cannot tell which one is generated.
Weighted adaptive fusion layer. After encoding, we obtain the latent representations $z^{(1)}, \dots, z^{(V)}$. To fully explore the complementary information across multi-view data, we adaptively fuse the latent representations of different views and learn a common representation $z$. Specifically, we design a learnable fusion layer to obtain $z = F(z^{(1)}, \dots, z^{(V)}; w)$, where $F$ denotes the fusion function parameterized by the learnable weights $w$.
Deep embedded clustering layer. This layer improves the distribution of the common representation and produces the clustering result. First, we compute the original distribution of the common representation $z$, denoted $Q$; then, based on $Q$, we compute a target distribution $P$ that is more compact and suitable for clustering. According to the target distribution, we refine the encoder networks so that the learned representation approaches the target distribution.
III-C Objective Function
Our objective function includes four terms: the auto-encoder loss, the adversarial training loss, the weighted adaptive fusion loss, and the KL clustering loss.
Auto-Encoder Loss
The auto-encoder loss works on the encoder network $E$ and the generator network $G$. We hope the output of the generator is similar to the input of the encoder; thus, we minimize the squared Frobenius norm of the reconstruction error between the generated sample and the input sample. The auto-encoder loss is
$\mathcal{L}_{AE} = \sum_{v=1}^{V} \left\| X^{(v)} - G_v(E_v(X^{(v)})) \right\|_F^2$   (1)
Take the $v$-th view as an example: when it passes through the encoder $E_v$, we obtain the latent representation $z^{(v)} = E_v(X^{(v)})$. In the auto-encoder loss, we take the generator $G_v$ as the decoder corresponding to the encoder $E_v$; the generator thus reconstructs the input sample from the latent representation $z^{(v)}$, and $G_v(z^{(v)})$ denotes its output. We minimize the reconstruction error to obtain encoder and generator networks whose outputs are similar to the input samples.
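The reconstruction term can be written in a few lines of NumPy; encoders and generators are passed as plain callables here purely for illustration:

```python
import numpy as np

def ae_loss(views, encoders, generators):
    """Sum over views of the squared Frobenius norm between the input
    X^(v) and its reconstruction G_v(E_v(X^(v)))."""
    return sum(float(np.sum((x - g(e(x))) ** 2))
               for x, e, g in zip(views, encoders, generators))

rng = np.random.default_rng(0)
X = [rng.normal(size=(5, 4))]
ident = lambda x: x                  # toy encoder/decoder pair
zero = lambda x: np.zeros_like(x)    # decoder that reconstructs nothing
```

An identity encoder/decoder pair gives zero loss, while a decoder that outputs zeros gives back the total energy of the input, which sanity-checks the formula.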
Generally, when all input data are paired, the auto-encoder loss alone is sufficient to train the encoder and generator networks. Nonetheless, in partial multi-view learning, unpaired data greatly degrade the performance of the encoder and generator networks. To tackle this problem, apart from the auto-encoder loss, we further employ an adversarial training loss in the objective function.
Adversarial Training Loss
Suppose $x$ is a sample from the data distribution $p_{data}$, and $z$ is a noise sample from the noise distribution $p_z$. A typical GAN is composed of two subnetworks, namely a generator $G$ and a discriminator $D$. The generator $G$ produces a fake image from a vector of random noise, while $D$ aims to distinguish the fake image generated by $G$ from the real image; it returns a value ranging from 0 to 1, indicating the probability that the input image is real. GAN adopts the idea of game theory: $G$ and $D$ participate in a min-max game, where the loss functions of $G$ and $D$ respectively try to minimize and maximize the likelihood that the fake image is assigned to the fake source. From this view, the loss function of GAN is
$\min_G \max_D \; \mathbb{E}_{x \sim p_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$   (2)
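In code, the two sides of Eq. (2) reduce to log-likelihood terms on the discriminator's outputs. The sketch below uses the common non-saturating form of the generator loss; the helper name `gan_losses` is ours:

```python
import numpy as np

def gan_losses(d_real, d_fake, eps=1e-12):
    """Discriminator maximises log D(x) + log(1 - D(G(z))); the
    (non-saturating) generator maximises log D(G(z)). d_real and d_fake
    are discriminator outputs in (0, 1); eps guards the log."""
    d_loss = -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))
    g_loss = -np.mean(np.log(d_fake + eps))
    return d_loss, g_loss

# A perfectly fooled discriminator outputs 0.5 everywhere.
d_loss, g_loss = gan_losses(np.full(8, 0.5), np.full(8, 0.5))
```

At the equilibrium where the discriminator outputs 0.5 for everything, the discriminator loss is 2 log 2 and the generator loss is log 2, matching the classic GAN analysis.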
Considering that our setting contains a large amount of unpaired data and lacks paired data, we employ a cycle GAN in our model to conduct adversarial learning, which can effectively handle the insufficiency of paired data. A cycle GAN model trains two GAN models and adds a GAN loss and a cycle-consistency loss on top of Eq. (2) to tackle the unpaired-data situation. Fig. 3 shows the framework of a cycle GAN model; its main idea is that each data distribution can generate the other via the two GAN models. Following the theory of cycle GAN, we design a multi-view adversarial training loss for multi-view data:
$\mathcal{L}_{adv} = \sum_{j=1}^{V} \mathcal{L}_{GAN}^{(j)} + \mathcal{L}_{cyc}$   (3)
The adversarial training loss mainly works on the generator and discriminator networks. Next, we give the definitions of the GAN loss $\mathcal{L}_{GAN}$ and the cycle-consistency loss $\mathcal{L}_{cyc}$. Suppose that the data distribution of the $i$-th view is $p_i$, and let $G_{i \to j}$ denote the mapping from the $i$-th view data distribution $p_i$ to the $j$-th view data distribution $p_j$, i.e., using an $i$-th view sample $x^{(i)}$ to generate the $j$-th view sample $\bar{x}^{(j)}$ paired with $x^{(i)}$. The same holds for $G_{j \to i}$, which transforms $j$-th view samples into $i$-th view samples. The discriminator $D_j$ receives the generated sample $\bar{x}^{(j)}$ and the real sample $x^{(j)}$, and outputs the discriminant result. Thus, for partial multi-view data, the loss of the $j$-th GAN network in our model is
$\mathcal{L}_{GAN}^{(j)} = \mathbb{E}_{x^{(j)} \sim p_j}[\log D_j(x^{(j)})] + \mathbb{E}_{x^{(i)} \sim p_i}[\log(1 - D_j(G_{i \to j}(x^{(i)})))]$   (4)
The generator aims to produce fake samples similar to the real ones, while the discriminator tries to distinguish generated samples from real ones. In this way, the generator and discriminator play an adversarial game until convergence, when the generator can produce highly realistic samples. Nonetheless, a GAN may map the same input to any of many samples in the target data distribution; therefore, the model cannot obtain a desired output with the GAN loss alone. To tackle this problem, cycle GAN further employs a cycle-consistency loss to constrain the learned mapping and thereby reduces the space of possible mapping functions. Specifically, for each sample $x^{(i)}$ of the $i$-th view, after passing through the encoder and generator, we obtain the generated sample $\bar{x}^{(j)}$. Then, letting the generated sample pass through the encoder and generator of the other direction, we obtain the sample $\bar{x}^{(i)}$ after the full translation cycle. Since the cycle-consistency loss assists the generator in mapping a given sample to a specific output paired with the input, combining the GAN loss and the cycle-consistency loss effectively guarantees a desired output. The multi-view cycle-consistency loss of our model minimizes the $\ell_1$ norm of the reconstruction error between the finally generated sample and the input sample:
$\mathcal{L}_{cyc} = \sum_{i \neq j} \mathbb{E}_{x^{(i)} \sim p_i}\left[ \left\| G_{j \to i}(G_{i \to j}(x^{(i)})) - x^{(i)} \right\|_1 \right]$   (5)
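A toy NumPy version of the cycle-consistency idea, with `maps[i][j]` standing in for the view-$i$-to-view-$j$ translation (hypothetical helper names, not the paper's code):

```python
import numpy as np

def cycle_loss(views, maps):
    """L1 cycle-consistency: a sample should survive the round trip
    x -> G_{i->j}(x) -> G_{j->i}(G_{i->j}(x)) unchanged."""
    V = len(views)
    loss = 0.0
    for i in range(V):
        for j in range(V):
            if i != j:
                loss += float(np.abs(maps[j][i](maps[i][j](views[i])) - views[i]).sum())
    return loss

rng = np.random.default_rng(0)
X = [rng.normal(size=(4, 3)), rng.normal(size=(4, 3))]
ident = lambda x: x        # perfect inverse pair: zero cycle loss
shift = lambda x: x + 1.0  # non-inverse pair: round trip drifts by +2
```

Identity maps close the cycle exactly and give zero loss; a pair of `+1` shifts drifts each element by 2 on the round trip, so the loss becomes positive.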
Weighted Adaptive Fusion Loss
Before clustering, we need to fuse all the latent representations; here we use weighted adaptive fusion. We learn a latent subspace $z^{(v)}$ for each view $X^{(v)}$, where $v = 1, \dots, V$. Then, by the following equation, we obtain a common representation $z$ based on all latent subspaces:
$z = H(z^{(1)}, \dots, z^{(V)})$   (6)
where $H$ denotes a concatenation or summation function.
After obtaining the latent representations of the $V$ views from the encoders, we adopt an adaptive fusion method to extract, from the original multi-modal space, the common discriminative information that benefits clustering. In a manner similar to classification, this discriminative information is used to approximate the ideal data distribution. We define the adaptive fusion loss function as
$\mathcal{L}_{fus} = \left\| z - F(z^{(1)}, \dots, z^{(V)}; w) \right\|_F^2$   (7)
where $z^{(v)}$ is the output of the $v$-th encoder, $w$ is a group of learnable parameters, and $F$ denotes the fusion function.
KL Clustering Loss
The above analysis demonstrates that the adversarial training loss can improve the generator and discriminator networks; nevertheless, it makes little modification to the encoders. The encoders learn a common representation for the final clustering task from both paired and unpaired data, but unpaired data negatively affect the clustering performance. To optimize the encoders and obtain a better clustering structure, we add a clustering loss measured by the Kullback-Leibler divergence (KL-divergence) to our model. We denote the initial clustering centroids by $\{\mu_j\}_{j=1}^{k}$. To measure the similarity between a common representation point $z_i$ and a centroid $\mu_j$, we follow [36] and employ the Student's t-distribution as a kernel. Then we calculate the probability that sample $i$ is assigned to cluster $j$ with the following formula:
$q_{ij} = \dfrac{(1 + \| z_i - \mu_j \|^2 / \alpha)^{-\frac{\alpha+1}{2}}}{\sum_{j'} (1 + \| z_i - \mu_{j'} \|^2 / \alpha)^{-\frac{\alpha+1}{2}}}$   (8)
This is also called the soft assignment. Herein, $\alpha$ denotes the degree of freedom of the Student's t-distribution. We then square $q_{ij}$ and normalize it by the frequency per cluster to obtain the target distribution $P$:
$p_{ij} = \dfrac{q_{ij}^2 / f_j}{\sum_{j'} q_{ij'}^2 / f_{j'}}$   (9)
where $f_j = \sum_i q_{ij}$ represents the soft cluster frequency. In this way, our method improves clustering performance by laying special stress on data points assigned with high confidence. Finally, we minimize the KL-divergence between the original distribution $Q$ and the target distribution $P$ as the clustering loss:
$\mathcal{L}_{KL} = \mathrm{KL}(P \,\|\, Q) = \sum_i \sum_j p_{ij} \log \dfrac{p_{ij}}{q_{ij}}$   (10)
We aim to match the soft assignment $Q$ to the target distribution $P$. This sharpens the data distribution and concentrates samples of the same class, yielding a common representation that works more effectively for partial multi-view clustering.
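Equations (8), (9), and (10) can be checked numerically. The sketch below implements the soft assignment, the sharpened target distribution, and the KL loss in NumPy (function names are ours):

```python
import numpy as np

def soft_assign(z, mu, alpha=1.0):
    """Student's t similarity between points z and centroids mu (Eq. 8)."""
    d2 = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_dist(q):
    """Sharpened target P (Eq. 9): square q, normalise by soft cluster
    frequency f_j = sum_i q_ij, then renormalise per sample."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_loss(p, q):
    """KL(P || Q) of Eq. (10)."""
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(0)
z = rng.normal(size=(20, 5))    # 20 common-representation points
mu = rng.normal(size=(3, 5))    # 3 centroids
q = soft_assign(z, mu)
p = target_dist(q)
```

Both `q` and `p` are row-stochastic, the KL loss is non-negative, and it vanishes exactly when the soft assignment already matches the target.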
Overall Objective
By integrating the auto-encoder loss, the adversarial training loss, the weighted adaptive fusion loss, and the KL clustering loss, we obtain the overall objective function of GPMVC:
$\mathcal{L} = \mathcal{L}_{AE} + \lambda_1 \mathcal{L}_{adv} + \lambda_2 \mathcal{L}_{fus} + \lambda_3 \mathcal{L}_{KL}$   (11)
where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are trade-off parameters that adjust the impact of each term in the overall objective, with the individual terms defined above.
III-D Implementation
Step 1: Training encoder and generator on paired data.
We first use only paired data to train the encoders and generators of our model. Since they can be seen as auto-encoder networks, we simply use the auto-encoder loss to optimize them. Specifically, we take the paired data $X^{(v)}$ as input to the encoders and obtain the latent subspaces $z^{(v)}$; these then pass through the generators and yield $\bar{X}^{(v)}$. Finally, we compute the clustering centroids $\{\mu_j\}$ and use the auto-encoder loss to update the encoder and generator networks. Step 1 provides the clustering centroids used in step 3; learned from paired data, these centroids help the samples generated for missing views be assigned to the right groups.
Step 2: Training generator and discriminator on all data.
In this step, we use all data to train the generator and discriminator networks, i.e., the multi-view cycle GAN. For paired data, we directly take them as input of the encoders. When encountering an unpaired sample of the $i$-th view, we randomly choose one sample from the $j$-th view ($j \neq i$) as the input of the corresponding encoder in each epoch. After step 2, we save the outputs of the generators, i.e., the generated samples for the missing views, and then compute a new common representation $z$ on the completed database.
Step 3: Training the whole model on the completed dataset.
We use the clustering centroids from step 1 and the completed dataset from step 2 as input to train the whole model again. In each epoch, we update the clustering centroids, the common representation, and the generated samples for the missing views.
Algorithm 1 illustrates the training procedure of our model.
TABLE I: Average clustering accuracy (ACC) under different impartial ratios.

Methods | 0.1 | 0.3 | 0.5 | 0.7 | 0.9
best SC | 0.4748±0.0131 | 0.5169±0.0174 | 0.5692±0.0159 | 0.6139±0.0121 | 0.6716±0.0136
AMGL [19] | 0.2524±0.0349 | 0.2357±0.0180 | 0.2538±0.0155 | 0.2807±0.0125 | 0.2958±0.0195
RMSC [34] | 0.3395±0.0050 | 0.3683±0.0051 | 0.3907±0.0045 | 0.4233±0.0048 | 0.4499±0.0022
ConSC [14] | 0.2781±0.0411 | 0.2230±0.0148 | 0.2139±0.0078 | 0.2106±0.0058 | 0.2884±0.0896
PVC [22] | 0.5015±0.0438 | 0.5424±0.0537 | 0.6277±0.0402 | 0.6833±0.0931 | 0.7546±0.1091
IMG [21] | 0.4373±0.0100 | 0.4508±0.0254 | 0.4868±0.0147 | 0.5055±0.0131 | 0.5176±0.0415
PVC-GAN [32] | 0.5210±0.0090 | 0.6711±0.0107 | 0.8631±0.0043 | 0.9154±0.0107 | 0.9498±0.0026
GPMVC | 0.5874±0.0249 | 0.7868±0.0234 | 0.8879±0.0128 | 0.9319±0.0082 | 0.9655±0.0088
TABLE II: Average clustering accuracy (ACC) under different impartial ratios.

Methods | 0.1 | 0.3 | 0.5 | 0.7 | 0.9
best SC | 0.3483±0.0080 | 0.3956±0.0076 | 0.4429±0.0114 | 0.4774±0.0103 | 0.5277±0.0106
AMGL [19] | 0.1558±0.0155 | 0.1412±0.0218 | 0.1524±0.0343 | 0.2415±0.0631 | 0.3346±0.0288
RMSC [34] | 0.3492±0.0077 | 0.4150±0.0294 | 0.4575±0.0233 | 0.4960±0.0174 | 0.5144±0.0204
ConSC [14] | 0.3704±0.0275 | 0.3581±0.0231 | 0.3674±0.0131 | 0.4137±0.0396 | 0.5088±0.0299
PVC [22] | 0.3525±0.0238 | 0.3864±0.0104 | 0.4238±0.0446 | 0.4401±0.0150 | 0.4644±0.0423
IMG [21] | 0.4655±0.0186 | 0.4640±0.0213 | 0.4613±0.0146 | 0.4592±0.0146 | 0.4622±0.0151
PVC-GAN [32] | 0.4517±0.0086 | 0.4836±0.0071 | 0.5280±0.0078 | 0.5202±0.0070 | 0.5340±0.0073
GPMVC | 0.5646±0.0247 | 0.5542±0.0330 | 0.5776±0.0169 | 0.5963±0.0088 | 0.5955±0.0163
TABLE III: Average clustering accuracy (ACC) under different impartial ratios.

Methods | 0.1 | 0.3 | 0.5 | 0.7 | 0.9
best SC | 0.4863±0.0122 | 0.5188±0.0112 | 0.5664±0.0143 | 0.6114±0.0189 | 0.6613±0.0178
AMGL [19] | 0.6056±0.0489 | 0.6828±0.0564 | 0.7370±0.0281 | 0.7506±0.0320 | 0.7594±0.0211
RMSC [34] | 0.4642±0.0159 | 0.5293±0.0096 | 0.5925±0.0154 | 0.6507±0.0202 | 0.7154±0.0375
ConSC [14] | 0.5063±0.0325 | 0.5438±0.0272 | 0.5982±0.0246 | 0.6982±0.0481 | 0.7916±0.0299
PVC [22] | 0.3238±0.0087 | 0.3077±0.0078 | 0.3419±0.0148 | 0.4236±0.0168 | 0.5730±0.0261
IMG [21] | 0.5350±0.0192 | 0.5455±0.0262 | 0.5457±0.0193 | 0.5529±0.0166 | 0.5633±0.0213
PVC-GAN [32] | 0.6982±0.0104 | 0.8380±0.0144 | 0.8806±0.0081 | 0.9030±0.0074 | 0.9234±0.0055
GPMVC | 0.7629±0.0276 | 0.9141±0.0040 | 0.9372±0.0056 | 0.9454±0.0051 | 0.9508±0.0026
TABLE IV: Average clustering accuracy (ACC) under different impartial ratios.

Methods | 0.1 | 0.3 | 0.5 | 0.7 | 0.9
best SC | 0.1863±0.0051 | 0.1966±0.0070 | 0.2081±0.0033 | 0.2177±0.0052 | 0.2263±0.0029
AMGL [19] | 0.1677±0.0083 | 0.1817±0.0083 | 0.1850±0.0123 | 0.1808±0.0076 | 0.1815±0.0064
RMSC [34] | 0.1925±0.0051 | 0.2001±0.0084 | 0.2136±0.0097 | 0.2239±0.0050 | 0.2287±0.0035
ConSC [14] | 0.1573±0.0050 | 0.1650±0.0048 | 0.1804±0.0066 | 0.1960±0.0058 | 0.2148±0.0039
PVC [22] | 0.1118±0.0017 | 0.1202±0.0018 | 0.1290±0.0037 | 0.1482±0.0084 | 0.1714±0.0114
IMG [21] | 0.1136±0.0019 | 0.1215±0.0036 | 0.1262±0.0028 | 0.1310±0.0020 | 0.1353±0.0017
PVC-GAN [32] | 0.1711±0.0044 | 0.1988±0.0063 | 0.2191±0.0050 | 0.2216±0.0112 | 0.2313±0.0082
GPMVC | 0.1915±0.0068 | 0.2216±0.0092 | 0.2398±0.0063 | 0.2475±0.0046 | 0.2770±0.0089
IV Experimental Analysis
To test the performance of our method, we conduct several experiments on four multi-view databases and compare our method with state-of-the-art PMVC and MVC methods.
IV-A Experimental Setting
Dataset
We evaluate our method on four different datasets, briefly introduced below.
BDGP [2]: a database that consists of both a visual view and a textual view. It contains images of drosophila embryos from several categories, and each image is described by two vectors, i.e., a visual feature vector and a textual feature vector. In the experiment, all the data are used to evaluate the performance of the aforementioned methods on both features.
MNIST [15]: a handwritten digit image database composed of 60,000 training examples and 10,000 testing examples, each of size 28×28 pixels. We use the original black-and-white images of MNIST and their corresponding edge images for testing. Since it is difficult to conduct comparisons on a large-scale database, we construct a sampled MNIST database by randomly sampling images from the original database, and then employ the sampled database in the experiments.
Handwritten numerals (HW) [30]: an image database with 2,000 images of 10 classes, from digit 0 to 9. Each class contains 200 samples with 6 kinds of features: 76 Fourier coefficients of two-dimensional shape descriptors (FOU), 216 profile correlations (FAC), 64 Karhunen-Loeve coefficients (KAR), 240 pixel features (PIX) obtained by dividing the 30×48-pixel image into 240 tiles of 2×3 pixels and counting the average number of object pixels in each tile, 47 rotational-invariant Zernike moments (ZER), and 6 morphological (MOR) features. In our experiment, we choose the first three views of the HW database: the 76 Fourier coefficients (FOU), the 216 profile correlations (FAC), and the 64 Karhunen-Loeve coefficients (KAR).
NUS-WIDE (NUS) [4]: a database consisting of 269,648 images of 81 concepts. In our experiments, we select 12 categories of animal concepts, including cat, cow, dog, elk, hawk, horse, lion, squirrel, tiger, whale, wolf, and zebra. We extract three kinds of low-level features from this database: 144-D color correlograms, 128-D wavelet textures, and 225-D block-wise color moments.
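As a concrete illustration of how a two-view database such as the sampled MNIST (original image plus edge image) can be assembled, the following is a minimal sketch. The paper does not specify its edge detector or the number of sampled images, so a Sobel gradient magnitude and an `n_samples` parameter are used here as assumptions:

```python
import numpy as np

def sobel_edges(img):
    """Edge view via Sobel gradient magnitude (a stand-in: the paper
    does not name the edge detector it uses)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    padded = np.pad(img.astype(float), 1)
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + 3, j:j + 3]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    return np.hypot(gx, gy)

def make_two_view_mnist(images, n_samples, seed=0):
    """Randomly sub-sample the database and pair each image (view 1)
    with its edge image (view 2)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(images), size=n_samples, replace=False)
    view1 = images[idx]
    view2 = np.stack([sobel_edges(im) for im in view1])
    return view1, view2

# toy check on random 28x28 "images"
imgs = np.random.default_rng(1).random((50, 28, 28))
v1, v2 = make_two_view_mnist(imgs, 10)
print(v1.shape, v2.shape)  # (10, 28, 28) (10, 28, 28)
```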
Baseline Methods and Evaluation Metrics
We implement several state-of-the-art partial multiview clustering methods. In detail, we compare the proposed GPMVC with Incomplete Multi-Modal Visual Data Grouping (IMG) [21], Partial MultiView Clustering using Graph Regularized NMF (PVC) [22, 48], and Partial MultiView Clustering via Consistent GAN (PVCGAN) [32]. We also compare GPMVC with several multiview clustering methods: Feature Concatenation Spectral Clustering (ConSC) [14], Robust Multiview Spectral Clustering (RMSC) [34], Auto-weighted Multiple Graph Learning (AMGL) [19], and spectral clustering on the best single view (best SC).
For the partial-view setting, we test the aforementioned methods under different impartial ratios. The impartial ratio is defined as the proportion of complete samples among all the samples; it varies from 0.1 to 0.9 with an interval of 0.2. For MVC methods, which cannot handle missing instances, we fill in each missing instance with the average feature vector. We adopt three standard clustering validation metrics, i.e., Accuracy (ACC) [1], Normalized Mutual Information (NMI) [6], and Purity [31], to evaluate the performance of each method.
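The average-vector filling used for the MVC baselines can be sketched as follows (a minimal NumPy version; the boolean mask convention is an assumption for illustration):

```python
import numpy as np

def mean_fill(X, missing_mask):
    """Fill missing instances of one view with the average feature
    vector of the observed instances (the filling applied to the
    MVC baselines, which cannot handle missing data)."""
    X = X.copy()
    X[missing_mask] = X[~missing_mask].mean(axis=0)
    return X

# toy view: the third instance is missing
X = np.array([[1., 2.], [3., 4.], [0., 0.], [5., 6.]])
mask = np.array([False, False, True, False])
print(mean_fill(X, mask)[2])  # [3. 4.], the mean of the observed rows
```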
Specifically, for each database we randomly choose five groups of samples as missing data according to the five impartial ratios (0.1, 0.3, 0.5, 0.7, 0.9). We repeat this process 10 times and report the average clustering accuracy with the corresponding standard deviation of all the methods on the four databases; the average Accuracy results are given in Tables I to IV, respectively. Fig. 4 shows the average clustering NMI and Purity under different impartial ratios on the four databases.
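The three validation metrics can be computed as in the following sketch: ACC uses the standard Hungarian matching between cluster labels and class labels, Purity lets each cluster vote for its majority class, and NMI is taken from scikit-learn.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """ACC: best one-to-one mapping of cluster ids to class ids,
    found by Hungarian matching on the contingency counts."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    k = max(y_true.max(), y_pred.max()) + 1
    cost = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cost[p, t] += 1
    row, col = linear_sum_assignment(cost, maximize=True)
    return cost[row, col].sum() / y_true.size

def purity(y_true, y_pred):
    """Purity: fraction of samples assigned to the majority class
    of their cluster."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    total = 0
    for c in np.unique(y_pred):
        _, counts = np.unique(y_true[y_pred == c], return_counts=True)
        total += counts.max()
    return total / y_true.size

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([1, 1, 0, 0, 2, 2])  # clusters permuted w.r.t. classes
print(clustering_accuracy(y_true, y_pred))            # 1.0
print(purity(y_true, y_pred))                         # 1.0
print(normalized_mutual_info_score(y_true, y_pred))   # 1.0
```

Note that all three metrics are invariant to the labeling permutation, which is why the permuted toy prediction above still scores perfectly.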
Implementation Details
Our algorithm is implemented with the public PyTorch toolbox on an Ubuntu desktop with NVIDIA Titan V Graphics Processing Units (GPUs). We train our model using the Adam [13] optimizer with the default parameter setting and a fixed learning rate. For each training step, we train for a fixed number of epochs and record the experimental results. Besides, we test all the other methods in Matlab in the same environment for comparison.
IV-B Partial Multiview Clustering Performance
The evaluation results are summarized in Tables I to IV and Fig. 4, which indicate that our method achieves better clustering performance than the others in all cases. We highlight some important observations below:
From Tables I to IV and Fig. 4, we can see that PMVC methods are superior on most databases, especially when the partial ratio is large. This indicates that missing-view data negatively affect the effectiveness of MVC methods, so PMVC methods are more effective when confronting the missing-data problem. From Table II, we can also see that when the views are incomplete, the multiview clustering method AMGL performs worse than single-view methods, which further illustrates that some multiview methods are sensitive to missing data or noise.
The results in Tables I to IV also demonstrate that our method outperforms all the methods tested. This is probably because GPMVC is able to learn a consistent clustering structure for each view, with which it effectively generates the missing data, and can thereby construct a more effective common subspace from the generated complementary data. Comparing the results of PVCGAN and GPMVC in Tables I and II, we observe that GPMVC performs better than PVCGAN. The only difference between the two is the fusion scheme: GPMVC fuses the multiple latent spaces with the weighted adaptive fusion scheme, which confirms the effectiveness of the weighted adaptive fusion loss.
From Tables III and IV, we can see that the partial multiview methods (PVC, IMG, and PVCGAN) perform worse than the others. This is because these methods can only operate on two-view databases, while HW and NUS have more than two views; thus, for PVC, IMG, and PVCGAN, we use only the first two views of HW and NUS. For the MVC methods, we still fill up each missing sample with the average sample. The experimental results illustrate that our model can be readily applied to databases with more than two views.
TABLE V: Clustering accuracy of the ablation study under different impartial ratios.
Loss  0.1  0.3  0.5  0.7  0.9
AE  0.6530  0.7910  0.8760  0.8835  0.8945
AE + AT  0.7165  0.8680  0.8840  0.8985  0.9000
ALL  0.7629  0.9141  0.9372  0.9454  0.9508
TABLE VI: Clustering performance of GPMVC on the full MNIST database under different impartial ratios.
Clustering metrics  0.1  0.3  0.5  0.7  0.9
Accuracy  0.5016±0.0074  0.5144±0.0070  0.5203±0.0097  0.5391±0.0104  0.5551±0.0120
NMI  0.4567±0.0018  0.4645±0.0047  0.4659±0.0045  0.5143±0.0029  0.4828±0.0167
Purity  0.5398±0.0033  0.5555±0.0043  0.5567±0.0073  0.5689±0.0049  0.5757±0.0049
IV-C Model Discussion
Ablation Study
To verify the effect of each term in the objective function of our model, we conduct ablation studies. We perform three experiments to isolate the effect of each loss term. In the first experiment, we use only the autoencoder loss to train the encoder and generator networks. In the second experiment, we use the autoencoder loss and the adversarial training loss to train the encoder, generator, and discriminator networks. In the third experiment, we use the full objective function to train our model, i.e., based on the second experiment, we add the weighted adaptive fusion loss and the KL-clustering loss. In each experiment, we vary the partial ratio from 0.1 to 0.9 with an interval of 0.2 and perform clustering on the common representation learned by the encoder network. The clustering accuracy is reported in Table V. From Table V, we can see that the third experiment, which uses the full objective function, achieves the best performance, and that the accuracy with both the autoencoder and adversarial training losses is superior to that with the autoencoder loss alone. This illustrates that each term in the objective function contributes substantially to the final clustering performance. Adding the adversarial training loss to the autoencoder loss improves the clustering accuracy, probably because the samples generated for the missing views help learn a better clustering structure. The weighted adaptive fusion loss and the KL-clustering loss further boost the performance in the third experiment; this phenomenon illustrates that a good common representation in turn helps generate more realistic samples for the missing views and improves the clustering performance again.
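The KL-clustering loss follows the deep embedded clustering formulation of [36]: a Student's t soft assignment of each embedding to the cluster centres, a sharpened target distribution, and a KL divergence between the two. A minimal NumPy sketch (dimensions and cluster count are illustrative):

```python
import numpy as np

def soft_assignment(z, mu, alpha=1.0):
    """Student's t soft assignment q_ij of embedding z_i to centre mu_j,
    as in deep embedded clustering [36]."""
    d2 = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1) / 2)
    return q / q.sum(1, keepdims=True)

def target_distribution(q):
    """Sharpened target p_ij = (q_ij^2 / f_j) normalized per row,
    with cluster frequency f_j = sum_i q_ij."""
    w = q ** 2 / q.sum(0)
    return w / w.sum(1, keepdims=True)

def kl_clustering_loss(q, p):
    """Mean KL(P || Q) over samples."""
    return (p * np.log(p / q)).sum() / q.shape[0]

rng = np.random.default_rng(0)
z = rng.normal(size=(100, 10))   # common representations
mu = rng.normal(size=(3, 10))    # 3 cluster centres
q = soft_assignment(z, mu)
p = target_distribution(q)
loss = kl_clustering_loss(q, p)
print(loss >= 0)  # True: KL divergence is non-negative
```

In training, the gradient of this loss with respect to both the embeddings and the centres pulls each point toward its high-confidence cluster, which is what ties the common representation to a consistent cluster structure.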
Missing Data Generation
To visually present the generated results of our method, Fig. 5 shows the images generated by our method (marked by red boxes) on the sampled MNIST database under three different impartial ratios (0.1, 0.5, 0.9). We can see that our method generates the missing data well, and as the number of paired samples increases, it generates more realistic data.
Large-Scale Partial Multiview Clustering
In addition, we run all the methods on the full MNIST database under different impartial ratios, repeating each run 10 times. Since all the other methods fail with out-of-memory errors on a database of this size, we only report the clustering performance of our method in Table VI. This illustrates that our method can be applied to large-scale databases.
V Conclusions
In this paper, we propose a novel generative partial multiview clustering approach. It is able to complete the missing views based on the common subspace via a GAN model while simultaneously learning a good clustering structure. In addition, it further exploits the complementary information of the incomplete views to learn a consistent common structure, which greatly improves its clustering performance. We validate the clustering performance improvement of the proposed method via a series of comprehensive experiments, and comparisons with several existing methods demonstrate the superiority of GPMVC.
References
 (2005) Document clustering using locality preserving indexing. IEEE TKDE 17 (12), pp. 1624–1637. Cited by: §IVA2.
 (2012) Joint stage recognition and anatomical annotation of drosophila gene expression patterns. Bioinformatics 28 (12), pp. i16–i24. Cited by: §IVA1.
 (2017) A survey on multiview clustering. arXiv preprint arXiv:1712.06246. Cited by: §I.
 (2009) NUS-WIDE: a real-world web image database from National University of Singapore. In ACM CIVR, pp. 48. Cited by: §IVA1.
 (2018) Robust multiview data analysis through collective low-rank subspace. IEEE TNNLS 29 (5), pp. 1986–1997. Cited by: §I.
 (2009) Normalized mutual information feature selection. IEEE TNNLS 20 (2), pp. 189–201. Cited by: §IVA2.
 (2014) Generative adversarial nets. In Advances in neural information processing systems, pp. 2672–2680. Cited by: §I.
 (2020) A review on generative adversarial networks: algorithms, theory, and applications. arXiv preprint arXiv:2001.06937. Cited by: §I.
 (2017) Discriminative embedded clustering: a framework for grouping high-dimensional data. IEEE TNNLS 26 (6), pp. 1287–1299. Cited by: §I.
 (2019) Multiview spectral clustering network. In Proc. 28th Int. Joint Conf. Artif. Intell., pp. 2563–2569. Cited by: §IIA.
 (2017) Image-to-image translation with conditional adversarial networks. arXiv preprint. Cited by: §IIC.
 (2019) A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4401–4410. Cited by: §I.
 (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §IVA3.
 (2011) Co-regularized multiview spectral clustering. In Advances in neural information processing systems, pp. 1413–1421. Cited by: §IIA, TABLE I, TABLE II, TABLE III, TABLE IV, §IVA2.
 (1998) The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/. Cited by: §IVA1.
 (2018) Late fusion incomplete multiview clustering. IEEE TPAMI 41 (10), pp. 2410–2423. Cited by: §I, §I, §IIB.
 (2018) Consistent and specific multiview subspace clustering. In Thirty-Second AAAI Conference on Artificial Intelligence. Cited by: §IIA.
 (2017) Least squares generative adversarial networks. In IEEE ICCV, pp. 2813–2821. Cited by: §IIC.
 (2016) Parameter-free auto-weighted multiple graph learning: a framework for multiview clustering and semi-supervised classification. In IJCAI, pp. 1881–1887. Cited by: TABLE I, TABLE II, TABLE III, TABLE IV, §IVA2.
 (2016) Conditional image synthesis with auxiliary classifier GANs. arXiv preprint arXiv:1610.09585. Cited by: §IIC.
 (2016) Double constrained NMF for partial multiview clustering. In DICTA, pp. 1–7. Cited by: TABLE I, TABLE II, TABLE III, TABLE IV, §IVA2.
 (2016) Partial multiview clustering using graph regularized NMF. In ICPR, pp. 2192–2197. Cited by: §I, §IIB, TABLE I, TABLE II, TABLE III, TABLE IV, §IVA2.
 (2017) VIGAN: missing view imputation with generative adversarial networks. arXiv preprint arXiv:1708.06724. Cited by: §I, §IIC.
 (2013) Clustering on multiple incomplete datasets via collective kernel learning. In IEEE ICDM, pp. 1181–1186. Cited by: §I, §I, §I, §IIB.
 (2018) Learning a joint affinity graph for multiview subspace clustering. IEEE TMM 21 (7), pp. 1724–1736. Cited by: §I.
 (2019) Marginalized multiview ensemble clustering. IEEE TNNLS. Cited by: §I.
 (2017) Missing modalities imputation via cascaded residual autoencoder. In IEEE CVPR, pp. 1405–1414. Cited by: §IIC.
 (2010) Multiview clustering with incomplete views. In NIPS Workshop, Cited by: §I, §IIB.
 (2013) Neighborhood coregularized multiview spectral clustering of microbiome data. In IAPR, pp. 80–90. Cited by: §IIA.
 (1998) Handwritten digit recognition by combined classifiers. Kybernetika 34 (4), pp. 381–386. Cited by: §IVA1.
 (2005) Compact: a comparative package for clustering assessment. In ISPA, pp. 159–167. Cited by: §IVA2.
 (2018) Partial multiview clustering via consistent GAN. In ICDM, pp. 1–6. Cited by: §I, TABLE I, TABLE II, TABLE III, TABLE IV, §IVA2.
 (2017) Exclusivity-consistency regularized multiview subspace clustering. In IEEE CVPR, pp. 923–931. Cited by: §IIA.
 (2014) Robust multiview spectral clustering via low-rank and sparse decomposition. In AAAI, pp. 2149–2155. Cited by: TABLE I, TABLE II, TABLE III, TABLE IV, §IVA2.
 (2020) Adaptive latent similarity learning for multiview clustering. Neural Networks 121, pp. 409–418. Cited by: §I.
 (2016) Unsupervised deep embedding for clustering analysis. In ICML, pp. 478–487. Cited by: §IIIC4.
 (2013) A survey on multiview learning. arXiv preprint arXiv:1304.5634. Cited by: §I, §IIA.
 (2015) Multiview clustering via pairwise sparse subspace representation. Neurocomputing 156, pp. 12–21. Cited by: §IIA.
 (2015) Incomplete multiview clustering via subspace learning. In ACM ICIKM, pp. 383–392. Cited by: §I, §I.
 (2018) Graph structure fusion for multiview clustering. IEEE TKDE 31 (10), pp. 1984–1993. Cited by: §I.
 (2020) Generalized latent multiview subspace clustering. IEEE TPAMI 42 (1), pp. 86–99. Cited by: §I.
 (2019) CPMnets: cross partial multiview networks. In Advances in Neural Information Processing Systems, pp. 557–567. Cited by: §I.
 (2017) StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks. In ICCV, pp. 5907–5915. Cited by: §IIC.
 (2019) Image deraining using a conditional generative adversarial network. IEEE transactions on circuits and systems for video technology. Cited by: §I.
 (2017) Multiview image generation from a single view. arXiv preprint arXiv:1704.04886. Cited by: §IIC.
 (2017) Multiview clustering via deep matrix factorization. In AAAI, pp. 2921–2927. Cited by: §IIA.
 (2016) Incomplete multimodal visual data grouping. In IJCAI, pp. 2392–2398. Cited by: §I, §IIB.
 (2014) Partial multiview clustering. In AAAI, Cited by: §I, §I, §IIB, §IVA2.
 (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv preprint arXiv:1703.10593. Cited by: §I, §IIC.