Multi-View Multi-Instance Multi-Label Learning based on Collaborative Matrix Factorization†
† Corresponding author, gxyu@swu.edu.cn (Guoxian Yu).
Abstract
Multi-view Multi-instance Multi-label Learning (M3L) deals with complex objects that encompass diverse instances, are represented with different feature views, and are annotated with multiple labels. Existing M3L solutions only partially explore the inter- or intra-relations between objects (or bags), instances, and labels, which can convey important contextual information for M3L. As such, they may have compromised performance.
In this paper, we propose a collaborative matrix factorization based solution called M3Lcmf. M3Lcmf first uses a heterogeneous network composed of nodes of bags, instances, and labels to encode different types of relations via multiple relational data matrices. To preserve the intrinsic structure of the data matrices, M3Lcmf collaboratively factorizes them into low-rank matrices, explores the latent relationships between bags, instances, and labels, and selectively merges the data matrices. An aggregation scheme is further introduced to aggregate the instance-level labels to the bag level and to guide the factorization. An empirical study on benchmark datasets shows that M3Lcmf outperforms other related competitive solutions in both instance-level and bag-level prediction.
Yuying Xing, Guoxian Yu, Carlotta Domeniconi, Jun Wang, Zili Zhang and Maozu Guo College of Computer and Information Science, Southwest University, Chongqing, China Hubei Key Laboratory of Intelligent GeoInformation Processing, China University of Geosciences, Wuhan, China Department of Computer Science, George Mason University, Fairfax, USA School of Information Technology, Deakin University, Geelong, Australia School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China {yyxing4148, gxyu, kingjun, zhangzl}@swu.edu.cn, carlotta@cs.gmu.edu, guomaozu@bucea.edu.cn
Copyright © 2019, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Introduction
Multi-Instance Multi-Label learning (MIML) is a framework for modeling complex objects, in which each object (or bag) contains one or more instances and is annotated with several semantic labels (?). Consider $n$ bags $X_i$ ($i = 1, \ldots, n$), where each bag $X_i$ encompasses $n_i$ instances and $\mathbf{x}_{ij} \in \mathbb{R}^d$ is the feature vector of the $j$-th instance of the $i$-th bag. The bags and the instances are annotated with $q$ distinct labels, and $\mathbf{y}_i \in \{0,1\}^q$ is the $q$-dimensional label vector of the $i$-th bag. Given a training dataset $\{(X_i, \mathbf{y}_i)\}_{i=1}^{n}$, MIML aims at learning an instance-level (or bag-level) predictor that maps the input features of instances (or bags) onto the label space.
Most MIML algorithms focus on single-view data, where the instances of bags are represented by one set of features. However, in real-world applications, a multi-instance multi-label object can often be represented via different views (?; ?). For example, as shown in Figure 1, three exemplar bags encompassing diverse instances are represented with heterogeneous feature views. Since there are multiple types of relations between bags and between instances, learning from multi-view bags is more challenging than the extensively studied MIML task (?; ?).
Several Multi-view Multi-instance Multi-label Learning (M3L) approaches have been proposed to tackle this challenge (?; ?; ?; ?).
Relations  
bag-bag  instance-instance  label-label  bag-instance  bag-label  instance-label  
M3LDA(?)  ✓  ✓  ✓  ✓  
MIMLmix(?)  ✓  ✓  ✓  ✓  
M3DN(?)  ✓  ✓  ✓  ✓  
MIL(?)  ✓  ✓  ✓  
MIMLSVM(?)  ✓  ✓  ✓  
MIMLfast(?)  ✓  ✓  ✓  
MIMLRBF(?)  ✓  ✓  ✓  
Proposed M3Lcmf  ✓  ✓  ✓  ✓  ✓  ✓ 
Nguyen, Zhan, and Zhou (2013) pioneered an approach called M3LDA, which employs Latent Dirichlet Allocation (Blei, Ng, and Jordan 2003) to explore the visual-label topics from the visual view and the text-label topics from the text view, and then enforces the labels predicted from the two views to be consistent. Nguyen et al. (2014) introduced another M3L approach, called MIMLmix, which leverages multiple views using a hierarchical Bayesian network and variational inference; MIMLmix can handle samples that are absent in some views. Li et al. (2017) developed a multi-view multi-instance learning algorithm (MIL), which generates different graphs with different parameters to represent various contextual relations between the instances of a bag. It then integrates these graphs into a unified framework for bag classification based on sparse representation and multi-view dictionary learning. Yang et al. (2018) introduced a deep neural network based approach called M3DN. M3DN applies a separate deep network to each view and requires the bag-based predictions from different views to be consistent within the same bag. In addition, M3DN adopts Optimal Transport theory (Villani 2008) to capture the geometric information of the underlying label space and to quantify the quality of predictions.
However, like MIML solutions, these M3L approaches consider only limited types of relations among bags, instances, and labels, as summarized in Table 1. M3L approaches generally capture the relations between bags and instances and the associations between bags and labels. Some approaches additionally exploit the relations between bags (?), between instances (?), or the correlations between labels (?; ?). Other approaches use the associations between instances and labels (?; ?) to learn the labels of bags at the instance level. All these types of relations coexist in M3L; however, none of the existing solutions explicitly accounts for all of them.
To take advantage of multiple feature views of instances (or bags), an intuitive solution is to concatenate the features from different views into a long vector and then apply MIML algorithms to the concatenated vector. However, this concatenation is prone to overfitting when only a small number of training samples is available, and it ignores the specific statistical property of each view (?). Ensemble learning can also work on multi-view data, since MIML classifiers are readily available for each view. However, the base classifiers are trained separately on individual views; as such, they may perform poorly, given the insufficient information in each view and the neglect of complementary information across views. Subspace learning-based approaches (?; ?) aim to obtain a latent subspace shared by multiple views, under the assumption that the input views are generated from that subspace. Latent subspace-based solutions may alleviate the “curse of dimensionality”, but may neglect the intrinsic structure of individual views. For multi-view data, the intrinsic structures of bags and instances may differ across views. Therefore, a competent M3L approach should account for both the multiple types of relations between bags, instances, and labels and the intrinsic structures of different feature views.
In this paper, we introduce an approach called M3Lcmf. M3Lcmf first constructs a heterogeneous network composed of nodes of bags, instances, and labels to capture the intra-relations between nodes of the same type, as well as the inter-relations between bags and instances, between bags and labels, and between instances and labels. To respect and exploit the intrinsic structure of the sub-networks of intra- and inter-relations, it collaboratively factorizes the association matrices of these sub-networks into low-rank matrices, which yields low-rank representations of the nodes and the latent relationships among them, and it selectively integrates multiple feature views of bags and instances. M3Lcmf additionally introduces an aggregation term into the factorization objective, which not only aggregates the instance-label associations to the bag level, but also reversely guides the prediction of these associations. The main contributions of this work are summarized as follows:

Unlike existing solutions, which account for only a few types of relations between bags and instances, M3Lcmf simultaneously takes into account multiple types of relations between bags, instances, and labels.

M3Lcmf selectively combines multiple feature views of bags and instances and preserves multiple intrinsic intra- and inter-relations, without mapping the inter-relations into a homologous network of bags or instances. It makes predictions at the instance level and automatically aggregates them to the bag level.

Experimental results on benchmark datasets show that M3Lcmf performs favorably against the recently proposed M3L approaches MIMLmix (Nguyen et al. 2014) and MIL (Li et al. 2017), and against other representative MIML methods, including MIMLSVM (?), MIMLNN (?), MIMLRBF (Zhang and Wang 2009), and MIMLfast (Huang, Gao, and Zhou 2018). M3Lcmf is also robust to a wide range of input parameter values.
The Proposed Method
Problem Formulation
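Without loss of generality, we assume that instances (or bags) have $V$ feature views, $\mathcal{X} = \{\mathcal{X}^v\}_{v=1}^{V}$, where $\mathcal{X}^v$ is the feature space of instances in the $v$-th view, and $\mathbf{y}_i \in \{0,1\}^q$ is the $q$-dimensional label vector of the $i$-th bag, shared across all the views. The task of M3L is to learn a predictive function $f: \mathcal{X}^1 \times \cdots \times \mathcal{X}^V \rightarrow \{0,1\}^q$, which maps multiple input feature views onto the label space.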
To address this task, we first construct a heterogeneous network that encodes multiple types of relations between bags, instances, and labels. Next, we collaboratively factorize the relational data matrices of the heterogeneous network into low-rank matrices and predict the instance-label associations from the respective low-rank matrices; we then aggregate the instance-level predictions to the bag level. The following two subsections elaborate on the network construction and the collaborative matrix factorization.
Heterogeneous Network Construction
As shown in Figure 1, there are three types of nodes in the heterogeneous network: bags, instances, and labels. Each type of node has a different intrinsic structure. Bags and instances can have multiple heterogeneous feature views, which often provide complementary information. We therefore construct a heterogeneous network that represents the intrinsic structures among nodes drawn from these multiple information sources.
It is recognized that relations among the instances of a bag convey important contextual information in multi-instance learning and influence the overall performance (?). To explore the intrinsic structure of instances, we construct a sub-network of instances for each feature view, encoded by the instance intra-relational data matrix $\mathbf{R}_{22}^{(v)}$. For simplicity, we measure the relation between instances $\mathbf{x}_i$ and $\mathbf{x}_j$ in the $v$-th view using a Gaussian heat kernel, whose bandwidth $\sigma_v$ is set to the average Euclidean distance between all the instances of the $v$-th view.
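The instance sub-network of one view can be sketched in plain Python as below. This is an illustrative sketch, not the authors' implementation: the kernel form $\exp(-d^2/(2\sigma_v^2))$ and the function names are our assumptions; only the choice of $\sigma_v$ as the average pairwise Euclidean distance within the view follows the text.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def instance_similarity(instances):
    """Instance-instance intra-relational matrix for one feature view.

    Assumption: heat kernel exp(-d^2 / (2 * sigma^2)), where sigma is the
    average pairwise Euclidean distance between the instances of this view.
    """
    n = len(instances)
    dist = [[euclidean(instances[i], instances[j]) for j in range(n)]
            for i in range(n)]
    pairs = [dist[i][j] for i in range(n) for j in range(i + 1, n)]
    sigma = sum(pairs) / len(pairs) if pairs else 0.0
    if sigma == 0.0:
        sigma = 1.0  # degenerate view (single or identical instances)
    return [[math.exp(-dist[i][j] ** 2 / (2.0 * sigma ** 2)) for j in range(n)]
            for i in range(n)]


# One view with three 2-D instances: closer instances get larger weights.
R22 = instance_similarity([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
```

The resulting matrix is symmetric with ones on its diagonal; one such matrix would be built per feature view.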
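In M3L, a bag contains one or more instances and has its own characteristics, which differ from those of its instances. Here, we construct a bag sub-network $\mathbf{R}_{11}^{(v)}$ for each view to capture the contextual information of bags, based on a composite Hausdorff distance that combines the following three Hausdorff distances between bags $X_i$ and $X_j$:
$$d_{\min}(X_i, X_j) = \min_{\mathbf{a} \in X_i,\, \mathbf{b} \in X_j} d(\mathbf{a}, \mathbf{b})$$
$$d_{\max}(X_i, X_j) = \max\Big\{ \max_{\mathbf{a} \in X_i} \min_{\mathbf{b} \in X_j} d(\mathbf{a}, \mathbf{b}),\ \max_{\mathbf{b} \in X_j} \min_{\mathbf{a} \in X_i} d(\mathbf{a}, \mathbf{b}) \Big\}$$
$$d_{\mathrm{avg}}(X_i, X_j) = \frac{\sum_{\mathbf{a} \in X_i} \min_{\mathbf{b} \in X_j} d(\mathbf{a}, \mathbf{b}) + \sum_{\mathbf{b} \in X_j} \min_{\mathbf{a} \in X_i} d(\mathbf{a}, \mathbf{b})}{|X_i| + |X_j|}$$
where $d(\mathbf{a}, \mathbf{b})$ is the Euclidean distance between two instances $\mathbf{a}$ and $\mathbf{b}$. We then define $\mathbf{R}_{11}^{(v)}(i,j)$ as the similarity between the $i$-th and $j$-th bags in the $v$-th view, computed from their composite Hausdorff distance with the scale parameter set to the average composite Hausdorff distance between all the bags of this view. These three types of Hausdorff distances are widely used in MIML (?), and they have different focuses. The minimal Hausdorff distance is the minimal distance between the instances of one bag and those of another bag; the maximal Hausdorff distance is the maximum distance between the instances of a bag and their nearest instances in the other bag; and the average Hausdorff distance takes into account more geometric relations between the instances of two bags (?). The composite similarity integrates the merits of these Hausdorff distance metrics.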
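A minimal sketch of the bag sub-network for one view follows. The minimal, maximal, and average Hausdorff distances follow their standard definitions; how they are merged into the composite distance is not reproduced here, so the unweighted mean in the hypothetical `composite_hausdorff` and the Gaussian form of `bag_similarity` are assumptions, with the scale set to the average composite distance between bags as described above.

```python
import math


def _euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hausdorff_distances(A, B):
    """(minimal, maximal, average) Hausdorff distances between bags A and B."""
    D = [[_euclidean(a, b) for b in B] for a in A]
    a_to_b = [min(row) for row in D]        # each instance of A to its nearest in B
    b_to_a = [min(col) for col in zip(*D)]  # each instance of B to its nearest in A
    d_min = min(a_to_b)
    d_max = max(max(a_to_b), max(b_to_a))
    d_avg = (sum(a_to_b) + sum(b_to_a)) / (len(A) + len(B))
    return d_min, d_max, d_avg


def composite_hausdorff(A, B):
    """Hypothetical composite distance: unweighted mean of the three distances."""
    return sum(hausdorff_distances(A, B)) / 3.0


def bag_similarity(bags):
    """Bag-bag intra-relational matrix for one view (Gaussian form assumed)."""
    n = len(bags)
    dist = [[composite_hausdorff(bags[i], bags[j]) for j in range(n)]
            for i in range(n)]
    pairs = [dist[i][j] for i in range(n) for j in range(i + 1, n)]
    sigma = (sum(pairs) / len(pairs) if pairs else 0.0) or 1.0
    return [[math.exp(-dist[i][j] ** 2 / (2.0 * sigma ** 2)) for j in range(n)]
            for i in range(n)]


A = [[0.0], [2.0]]
B = [[1.0], [5.0]]
R11 = bag_similarity([A, B, [[9.0]]])
```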
In M3L, each bag is simultaneously annotated with several semantic labels, and the labels are not mutually exclusive. Different pairs of labels may have different degrees of correlation, and label correlation can be leveraged to boost the performance of multi-label learning (?). To quantify label correlations, we adopt the widely used cosine similarity to construct a sub-network of labels, $\mathbf{R}_{33}$. Since instances and bags share the same label space, only one label sub-network is constructed. Let $\mathbf{c}_m \in \{0,1\}^{n}$ store the distribution of label $m$ across all the $n$ bags. The correlation between two labels $m$ and $l$ can be empirically estimated as follows:
$$\mathbf{R}_{33}(m, l) = \frac{\mathbf{c}_m^{\top} \mathbf{c}_l}{\|\mathbf{c}_m\| \, \|\mathbf{c}_l\|} \qquad (1)$$
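The cosine label correlation can be sketched as follows. The sketch treats the distribution of a label across the bags as the corresponding column of the bag-label matrix, which is our reading of the text; the function name is hypothetical.

```python
import math


def label_correlation(R13):
    """Cosine similarity between the label columns of a bag-label matrix.

    R13 is a list of rows (one per bag) of 0/1 label indicators.
    A label that never occurs gets zero correlation with every label.
    """
    q = len(R13[0])
    cols = [[row[m] for row in R13] for m in range(q)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in cols]
    R33 = [[0.0] * q for _ in range(q)]
    for m in range(q):
        for p in range(q):
            if norms[m] > 0 and norms[p] > 0:
                dot = sum(a * b for a, b in zip(cols[m], cols[p]))
                R33[m][p] = dot / (norms[m] * norms[p])
    return R33


# Three bags, three labels.
R33 = label_correlation([[1, 1, 0], [1, 0, 0], [0, 1, 1]])
```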
The specific distance metrics used to construct the three types of intra-relations in the sub-networks have been chosen for their simplicity and wide applicability; other distance metrics can be used as well.
There are three types of inter-relations between bags, instances, and labels. The bag-instance inter-relational data matrix $\mathbf{R}_{12}$ can be specified from the known bag-instance memberships, which are readily available in multi-instance data. The bag-label relational matrix $\mathbf{R}_{13}$ can be directly specified from the known labels of bags. For the instance-label relational data matrix $\mathbf{R}_{23}$, since the labels of instances are generally unknown in multi-instance learning, we initially set $\mathbf{R}_{23} = \mathbf{0}$. If the labels of instances are partially known, $\mathbf{R}_{23}$ can instead be specified from these known labels.
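The three inter-relational matrices can be assembled from bag memberships and bag labels as sketched below. The matrix names follow the bag (1), instance (2), and label (3) indexing suggested by the variant names used in the experiments (e.g., nR23); the consecutive numbering of instances and the function signature are assumptions made for illustration.

```python
def inter_relations(bag_sizes, bag_labels, q):
    """Build R12 (bags x instances), R13 (bags x labels), R23 (instances x labels).

    bag_sizes[i]  -- number of instances in bag i; instances are indexed
                     consecutively, bag by bag (assumption of this sketch)
    bag_labels[i] -- set of label indices annotated to bag i
    q             -- number of distinct labels
    """
    n_bags, n_inst = len(bag_sizes), sum(bag_sizes)
    R12 = [[0] * n_inst for _ in range(n_bags)]
    start = 0
    for i, size in enumerate(bag_sizes):
        for j in range(start, start + size):
            R12[i][j] = 1  # bag i contains instance j
        start += size
    R13 = [[1 if m in bag_labels[i] else 0 for m in range(q)]
           for i in range(n_bags)]
    R23 = [[0] * q for _ in range(n_inst)]  # instance labels unknown: all zeros
    return R12, R13, R23


# Bag 0 has two instances and label 0; bag 1 has one instance and labels 1, 2.
R12, R13, R23 = inter_relations([2, 1], [{0}, {1, 2}], q=3)
```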
As Table 1 shows, the heterogeneous network thus accounts for all six types of relations between bags, instances, and labels.
Collaborative Matrix Factorization
To combine the multiple bag-bag and instance-instance intra-relational data matrices $\{\mathbf{R}_{11}^{(v)}\}$ and $\{\mathbf{R}_{22}^{(v)}\}$, we could project all the data matrices onto a composite instance-instance (or bag-bag) intra-relational data matrix and then make predictions on that composite matrix. This projection idea has been used to integrate multiple interconnected sub-networks (?). However, such a projection may obscure the intrinsic structures of the different relational data matrices and compromise performance. Zitnik and Zupan (2015) introduced a data fusion framework (DFMF) based on matrix factorization. This framework does not need to map a heterogeneous network into a small homologous network, and it can leverage and preserve the intrinsic structures of multiple relational data matrices. Its objective function is as follows:
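$$\min_{\mathbf{G} \geq 0} \; \sum_{\mathbf{R}_{ij} \in \mathcal{R}} \big\| \mathbf{R}_{ij} - \mathbf{G}_i \mathbf{S}_{ij} \mathbf{G}_j^{\top} \big\|_F^2 + \sum_{t=1}^{\max_i t_i} \operatorname{tr}\big(\mathbf{G}^{\top} \boldsymbol{\Theta}^{(t)} \mathbf{G}\big) \qquad (2)$$
where $\|\cdot\|_F$ is the Frobenius norm and $\mathbf{R}_{ij} \in \mathbb{R}^{n_i \times n_j}$ stores the inter-relations between the $i$-th and the $j$-th object types. $\mathbf{G} = \operatorname{diag}(\mathbf{G}_1, \ldots, \mathbf{G}_r)$, $\mathbf{G}_i \in \mathbb{R}^{n_i \times k_i}$, and $\mathbf{S}_{ij} \in \mathbb{R}^{k_i \times k_j}$, where $\mathbf{G}_i$ is the low-rank representation of the $i$-th object type and $r$ is the number of object types. Suppose the $i$-th type of objects has $t_i$ data sources, represented by constraint matrices $\boldsymbol{\Theta}_i^{(t)}$ ($t = 1, \ldots, t_i$). Then $\boldsymbol{\Theta}^{(t)} = \operatorname{diag}(\boldsymbol{\Theta}_1^{(t)}, \ldots, \boldsymbol{\Theta}_r^{(t)})$ collectively stores all the block-diagonal constraint matrices.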
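To make the DFMF objective concrete, the sketch below evaluates it for given factors: the sum of squared Frobenius reconstruction errors $\|\mathbf{R}_{ij} - \mathbf{G}_i\mathbf{S}_{ij}\mathbf{G}_j^{\top}\|_F^2$ over the inter-relational matrices, plus the trace penalties $\operatorname{tr}(\mathbf{G}_i^{\top}\boldsymbol{\Theta}_i^{(t)}\mathbf{G}_i)$ of the constraint matrices. It is a plain-Python illustration, not code from DFMF or M3Lcmf, and the dictionary-based layout of the matrices is an assumption. Because the factor and constraint matrices are block diagonal, the trace term is computed block by block.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def dfmf_objective(R, G, S, Theta):
    """Value of the DFMF objective.

    R:     dict (i, j) -> inter-relational matrix R_ij
    G:     dict i -> low-rank factor G_i (non-negative)
    S:     dict (i, j) -> backbone matrix S_ij
    Theta: dict i -> list of constraint matrices Theta_i^(t)
    """
    loss = 0.0
    for (i, j), Rij in R.items():
        approx = matmul(matmul(G[i], S[(i, j)]), transpose(G[j]))
        loss += sum((r - a) ** 2 for rr, ar in zip(Rij, approx) for r, a in zip(rr, ar))
    for i, thetas in Theta.items():
        for Th in thetas:
            M = matmul(matmul(transpose(G[i]), Th), G[i])
            loss += sum(M[c][c] for c in range(len(M)))  # block trace
    return loss


G = {1: [[1.0], [0.0]], 2: [[1.0]]}
S = {(1, 2): [[2.0]]}
Theta = {1: [[[0.0, -1.0], [-1.0, 0.0]]]}  # one illustrative constraint matrix
```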
Based on the constructed heterogeneous network, and given the non-negativity of the inter- and intra-relational data matrices, we extend Eq. (2) and define the objective function of M3Lcmf as follows:
(3)  
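where $\mathbf{G}_1$, $\mathbf{G}_2$, and $\mathbf{G}_3$ are the low-rank representations of bags, instances, and labels, respectively. M3Lcmf has two prediction objectives. The first is to predict the instance-label associations by approximating the instance-label matrix $\mathbf{R}_{23}$ with $\mathbf{G}_2 \mathbf{S}_{23} \mathbf{G}_3^{\top}$. The second is to predict the labels of bags, i.e., to approximate the bag-label matrix $\mathbf{R}_{13}$. Instead of approximating $\mathbf{R}_{13}$ by $\mathbf{G}_1 \mathbf{S}_{13} \mathbf{G}_3^{\top}$, we add an aggregation term to Eq. (3) that aggregates the label information of instances to their originating bags through the bag-instance matrix $\mathbf{R}_{12}$ and a diagonal matrix $\mathbf{D}$. This aggregation term is also motivated by the multi-instance learning principle that the labels of a bag depend on the labels of its instances. Note that the aggregation term can reversely guide the pursuit of $\mathbf{G}_2$ and $\mathbf{G}_3$; as such, the labels of instances can also be learned from those of bags. The last term of Eq. (3) is the manifold regularization (Belkin, Niyogi, and Sindhwani 2006) on the low-rank representations.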
The intra-relations between bags, between instances, and between labels carry important contextual information, whose usage can improve the overall performance. Since $\mathbf{G}_1$, $\mathbf{G}_2$, and $\mathbf{G}_3$ can be viewed as latent low-dimensional representations of bags, instances, and labels, we follow the idea of manifold regularization, which enforces two data points with a high intra-association value to be close in the low-dimensional space, and formulate the last term of Eq. (3) as follows to use the three types of intra-associations:
(4)  
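where $\alpha_v$ and $\beta_v$ balance the importance of the $v$-th bag view and the $v$-th instance view, respectively. $\{\mathbf{D}_1^{(v)}\}$ and $\{\mathbf{D}_2^{(v)}\}$ are two series of diagonal matrices whose diagonal entries equal the row sums of the bag and instance intra-relational matrices $\mathbf{R}_{11}^{(v)}$ and $\mathbf{R}_{22}^{(v)}$, respectively, so that $\mathbf{L}_1^{(v)} = \mathbf{D}_1^{(v)} - \mathbf{R}_{11}^{(v)}$ and $\mathbf{L}_2^{(v)} = \mathbf{D}_2^{(v)} - \mathbf{R}_{22}^{(v)}$ are graph Laplacians; $\mathbf{L}_3$ is defined similarly from the label correlation matrix $\mathbf{R}_{33}$. $\operatorname{tr}(\mathbf{G}_1^{\top}\mathbf{L}_1^{(v)}\mathbf{G}_1)$ can be viewed as the smoothness loss on the $v$-th bag view. The regularization parameters $\lambda_1$ and $\lambda_2$ are introduced to avoid selecting a single view alone: without them, only the bag view and the instance view with the smallest loss would be selected. Our empirical study shows that the learned weights can indeed selectively integrate different views and reduce the impact of noisy views by assigning them smaller or zero weights. DFMF treats all the relational matrices equally and does not differentiate the degrees of relevance of the different intra-relational matrices to the prediction task. Moreover, unlike DFMF, which simply reverses the sign of the intra-relational data matrices to obtain the constraint matrices in Eq. (2), M3Lcmf uses graph Laplacian matrices to guide the approximation, which has a clear geometric interpretation.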
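The explicit solution for the view weights is given only in the Supplementary file, so the sketch below assumes one common formulation that reproduces the behavior described above: the weights of a group of views minimize $\sum_v \alpha_v h_v + \lambda \sum_v \alpha_v^2$ subject to $\alpha_v \geq 0$ and $\sum_v \alpha_v = 1$, where $h_v = \operatorname{tr}(\mathbf{G}^{\top}\mathbf{L}^{(v)}\mathbf{G})$ is the graph Laplacian smoothness loss of view $v$ and $\lambda > 0$ is the regularization parameter. Completing the square turns this into a Euclidean projection of $-\mathbf{h}/(2\lambda)$ onto the probability simplex. As $\lambda \to 0$ the view with the smallest loss takes all the weight, and views with a large loss can receive exactly zero weight. The formulation and the function names are assumptions, not the authors' derivation.

```python
def smoothness_loss(W, G):
    """tr(G^T L G) with L = D - W and D the diagonal matrix of row sums of W."""
    n, k = len(G), len(G[0])
    deg = [sum(row) for row in W]
    total = 0.0
    for c in range(k):
        g = [G[i][c] for i in range(n)]
        total += sum(deg[i] * g[i] * g[i] for i in range(n))
        total -= sum(W[i][j] * g[i] * g[j] for i in range(n) for j in range(n))
    return total


def project_to_simplex(u):
    """Euclidean projection of vector u onto {w : w >= 0, sum(w) = 1}."""
    s = sorted(u, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, val in enumerate(s, start=1):
        cumsum += val
        t = (cumsum - 1.0) / j
        if val - t > 0:
            theta = t  # the last index satisfying this fixes the threshold
    return [max(x - theta, 0.0) for x in u]


def view_weights(losses, lam):
    """Assumed closed form: argmin sum(a * h) + lam * ||a||^2 over the simplex."""
    return project_to_simplex([-h / (2.0 * lam) for h in losses])


# Two views: the second is far less smooth (e.g. a noisy view).
print(view_weights([0.0, 10.0], lam=1.0))    # small lam: all weight on view 1
print(view_weights([0.0, 10.0], lam=100.0))  # large lam: weights are spread
```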
From the above analysis, we conclude that M3Lcmf can predict labels of complex objects at both the instance level and the bag level, while simultaneously preserving multiple types of relations between bags and instances. Besides the aggregation term, another distinction between M3Lcmf and DFMF is that the former selectively combines multiple intra-relational data matrices, whereas the latter treats all the relational data matrices equally. As such, M3Lcmf can reduce the impact of noisy (or irrelevant) intra-relational data matrices on the target prediction task.
Following the idea of standard non-negative matrix factorization (Lee and Seung 2001) and the Alternating Direction Method of Multipliers (ADMM), we alternately optimize one variable among $\{\mathbf{G}_i\}$, $\{\mathbf{S}_{ij}\}$, and the view weights $\{\alpha_v\}$ and $\{\beta_v\}$ at a time, with the other variables fixed. Due to the page limit, the optimization procedures for these variables are provided in the Supplementary file.
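As background for the alternating updates, the sketch below implements the standard multiplicative updates of Lee and Seung (2001) for a single non-negative factorization $\mathbf{R} \approx \mathbf{W}\mathbf{H}$ under the squared Frobenius loss. It illustrates the building block only and is not the M3Lcmf optimization, whose update rules are given in the Supplementary file; the function names, the random initialization, and the small constant `eps` in the denominators are choices of this sketch.

```python
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def frobenius_loss(R, W, H):
    WH = matmul(W, H)
    return sum((r - a) ** 2 for rr, ar in zip(R, WH) for r, a in zip(rr, ar))


def nmf(R, k, iters=200, seed=0, eps=1e-12):
    """Lee-Seung multiplicative updates for R ~ W H with W, H >= 0.

    Each entry is rescaled by a ratio of non-negative terms, so the factors
    stay non-negative while the squared Frobenius loss does not increase.
    """
    rng = random.Random(seed)
    n, m = len(R), len(R[0])
    W = [[rng.random() for _ in range(k)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, R), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(hr, ar, br)]
             for hr, ar, br in zip(H, num, den)]
        Ht = transpose(H)
        num, den = matmul(R, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(wr, ar, br)]
             for wr, ar, br in zip(W, num, den)]
    return W, H


R = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
W0, H0 = nmf(R, k=2, iters=0)   # the random initialization
W, H = nmf(R, k=2, iters=200)   # after 200 rounds of updates
```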
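We then use the optimized $\{\mathbf{G}_i\}$ and $\{\mathbf{S}_{ij}\}$ to approximate the instance-label association matrix $\mathbf{R}_{23}$ as follows:
$$\mathbf{R}_{23}^{*} = \mathbf{G}_2 \mathbf{S}_{23} \mathbf{G}_3^{\top} \qquad (5)$$
To further map the labels of instances onto their corresponding bags, we approximate the bag-label association matrix $\mathbf{R}_{13}$ as follows:
$$\mathbf{R}_{13}^{*} = \mathbf{D} \mathbf{R}_{12} \mathbf{R}_{23}^{*} \qquad (6)$$
where $\mathbf{R}_{12}$ is the bag-instance matrix and $\mathbf{D}$ is the diagonal matrix of the aggregation term.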
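Assuming that instance-level scores are given by $\mathbf{G}_2\mathbf{S}_{23}\mathbf{G}_3^{\top}$ and bag-level scores by $\mathbf{D}\mathbf{R}_{12}$ times those instance-level scores, with $\mathbf{D}(i,i) = 1/\sum_j \mathbf{R}_{12}(i,j)$ (so that a bag's score for a label is the mean score of its instances), the prediction step can be sketched as follows. The normalization in $\mathbf{D}$ and the function name are our assumptions, not details confirmed by the text.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def predict(G2, S23, G3, R12):
    """Instance-level scores G2 S23 G3^T, then bag-level scores.

    Assumption: D is diagonal with D(i, i) = 1 / (number of instances in
    bag i), so each bag score is the mean score of its instances.
    """
    R23 = matmul(matmul(G2, S23), transpose(G3))
    summed = matmul(R12, R23)  # sum of instance scores per bag
    R13 = []
    for i, row in enumerate(summed):
        size = sum(R12[i])
        R13.append([v / size if size else 0.0 for v in row])
    return R23, R13


R23_star, R13_star = predict(
    G2=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],  # 3 instances, rank 2
    S23=[[1.0, 0.0], [0.0, 1.0]],
    G3=[[1.0, 0.0], [0.0, 1.0]],              # 2 labels
    R12=[[1, 1, 0], [0, 0, 1]],               # bag 0: instances 0, 1; bag 1: instance 2
)
```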
As such, M3Lcmf can make label prediction both at the instance and bag levels.
Experiments
Experimental Setup
We perform three experiments to investigate the performance of the proposed M3Lcmf. In the first experiment, six representative and related approaches, including four MIML methods (MIMLSVM (?), MIMLRBF (Zhang and Wang 2009), MIMLNN (?), and MIMLfast (Huang, Gao, and Zhou 2018)) and two M3L methods (MIMLmix (Nguyen et al. 2014) and MIL (Li et al. 2017)), are compared against M3Lcmf on both bag-level and instance-level prediction. In the second experiment, four variants of M3Lcmf are designed to quantify the contribution of different types of relations. The third experiment studies the parameter sensitivity of M3Lcmf.
Nine publicly available multi-instance multi-label datasets from different domains are used in the experiments; their details are given in Table 2. The first five datasets are collected from http://lamda.nju.edu.cn/CH.Data.ashx and http://github.com/hsoleimani/MLTM/tree/master/Data. They only have bag-level labels and are used to evaluate bag-level predictions. The original Delicious dataset includes 12234 bags with 223285 instances; to avoid an excessively heavy computational load, we randomly selected 1000 bags with 17613 instances from Delicious for the experiments. The last four datasets have instance-level labels (?; ?) and are used for instance-level prediction and evaluation (?; ?).
Dataset  #bags  #instances  #labels  avgBI (avg. instances per bag)  avgBL (avg. labels per bag) 

Haloarcula_marismortui  304  951  234  3.1  3.2 
Geobacter_sulfurreducens  379  1214  320  3.2  3.1 
Azotobacter_vinelandii  407  1251  340  3.1  4.0 
Pyrococcus_furiosus  425  1321  321  3.1  4.5 
Delicious  1000  17613  20  17.6  2.8 
Letter Frost  144  565  26  3.9  3.6 
Letter Carroll  166  717  26  4.3  3.9 
MSRC v2  591  1758  23  1.0  2.5 
Birds  548  10232  13  18.7  2.1 
To evaluate the effectiveness of M3Lcmf, four widely used multi-label evaluation metrics are adopted: Ranking Loss (RankLoss), macro-averaged Area Under the receiver operating characteristic Curve (macroAUC), Average Recall (AvgRecall), and Average F1-score (AvgF1). Due to space limitations, the formal definitions of these metrics are omitted here but can be found in (?; ?). A smaller RankLoss indicates better performance; to be consistent with the other metrics, we report 1-RankLoss instead. For all the reported metrics, larger values indicate better performance.
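The reported 1-RankLoss can be computed as sketched below, using the commonly used definition of ranking loss in multi-label learning: the fraction of (relevant, irrelevant) label pairs of a bag that are ordered wrongly, averaged over bags. Counting ties as errors and skipping bags whose labels are all relevant or all irrelevant are conventions of this sketch; the function name is hypothetical.

```python
def ranking_loss(scores, Y):
    """Average fraction of (relevant, irrelevant) label pairs ranked wrongly.

    scores[i][m] -- predicted score of label m for bag i
    Y[i][m]      -- 1 if bag i has label m, else 0
    A pair counts as an error when the relevant label's score is not
    strictly higher. Bags with all or no labels are skipped.
    """
    total, count = 0.0, 0
    for s, y in zip(scores, Y):
        pos = [m for m, v in enumerate(y) if v == 1]
        neg = [m for m, v in enumerate(y) if v == 0]
        if not pos or not neg:
            continue
        errors = sum(1 for p in pos for n in neg if s[p] <= s[n])
        total += errors / (len(pos) * len(neg))
        count += 1
    return total / count if count else 0.0


scores = [[0.9, 0.1, 0.5], [0.2, 0.8, 0.3]]
Y = [[1, 0, 0], [1, 0, 1]]
print(1 - ranking_loss(scores, Y))  # 1-RankLoss: prints 0.5
```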
Prediction Results at the BagLevel
Metric  MIMLNN  MIMLRBF  MIMLSVM  MIMLfast  MIMLmix  MIL  M3Lcmf 
Haloarcula_marismortui  
1RankLoss  
macroAUC  
AvgRecall  
AvgF1  
Azotobacter_vinelandii  
1RankLoss  
macroAUC  
AvgRecall  
AvgF1  
Geobacter_sulfurreducens  
1RankLoss  
macroAUC  
AvgRecall  
AvgF1  
Pyrococcus_furiosus  
1RankLoss  
macroAUC  
AvgRecall  
AvgF1  
Delicious  
1RankLoss  
macroAUC  
AvgRecall  
AvgF1 
We randomly partition the samples of each dataset into a training set (70%) and a testing set (30%), and independently run each algorithm on each partition. We report the average results and standard deviations over 10 random partitions in Table 3. Since there are no off-the-shelf multi-view datasets for multi-instance multi-label learning, for MIMLmix (Nguyen et al. 2014), MIL (Li et al. 2017), and the proposed M3Lcmf, we divide the original features of each bag into two views by randomly selecting half of the features for one view and the remaining features for the other view. We initialize the bag-instance matrix with $\mathbf{R}_{12}(i,j) = 1$ when the $i$-th bag encompasses the $j$-th instance, and $\mathbf{R}_{12}(i,j) = 0$ otherwise. We set the bag-label matrix to $\mathbf{R}_{13}(i,m) = 1$ when the $i$-th bag is annotated with the $m$-th label, and $\mathbf{R}_{13}(i,m) = 0$ otherwise. Both regularization parameters of the view weights, $\lambda_1$ and $\lambda_2$, are fixed to 1000, and the low-rank size $k$ of $\mathbf{G}_i$ is fixed to 140. The input parameters of the comparing methods are specified (or optimized) as suggested by their authors in their code or papers; the parameter settings of M3Lcmf are investigated later.
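The random view construction and the 70%/30% partition can be sketched as follows. The function names, the seeding, and the rounding of the split sizes are assumptions; the text specifies only the proportions and that a randomly chosen half of the features goes to each view.

```python
import random


def split_views(n_features, seed=0):
    """Randomly assign half of the feature indices to view 1, the rest to view 2."""
    rng = random.Random(seed)
    idx = list(range(n_features))
    rng.shuffle(idx)
    half = n_features // 2  # with an odd count, view 2 gets one extra feature
    return sorted(idx[:half]), sorted(idx[half:])


def train_test_split(n_bags, train_ratio=0.7, seed=0):
    """Random bag-level partition into training and testing indices."""
    rng = random.Random(seed)
    idx = list(range(n_bags))
    rng.shuffle(idx)
    cut = round(train_ratio * n_bags)
    return sorted(idx[:cut]), sorted(idx[cut:])


view1, view2 = split_views(10, seed=1)
train, test = train_test_split(304, seed=3)  # Haloarcula_marismortui has 304 bags
```

Calling `train_test_split` with ten different seeds yields ten random partitions; the same `view1`/`view2` feature split would presumably be applied to every instance so that each view is well defined.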
M3Lcmf generally outperforms the comparing methods across different datasets and metrics. We further used the Wilcoxon signed-rank test (Demšar 2006) to check the significance of the differences between M3Lcmf and these methods. All the p-values are smaller than 0.02, except for the comparison between M3Lcmf and MIMLRBF, whose p-value is 0.13. MIMLmix did not complete on the Delicious dataset within two weeks, so we could not report its results on this dataset. M3Lcmf, MIMLmix, and MIL are M3L methods, and M3Lcmf frequently outperforms the latter two, which only use limited types of relations between objects. This shows the importance of accounting for multiple types of relations in M3L. M3Lcmf has a lower 1-RankLoss but a higher AvgRecall and AvgF1 than MIMLmix; a possible reason is that MIMLmix captures label correlations by assuming that labels are sampled from a multinomial distribution and samples a label indicator for each instance, whereas M3Lcmf simply uses the cosine similarity to measure label correlation. M3Lcmf outperforms three MIML solutions (MIMLNN, MIMLfast, and MIMLSVM), which utilize far fewer relations between bags, instances, and labels than M3Lcmf does. This comparison again corroborates the advantage of leveraging multiple types of relations in M3L, and also suggests the importance of integrating multiple data views. Although MIMLRBF considers limited types of relations between bags and instances, it still obtains performance comparable to that of M3Lcmf. A possible cause is that MIMLRBF additionally uses an RBF neural network to learn an enhanced feature representation and a nonlinear classifier.
Prediction Results at the InstanceLevel
Metric  MIMLfast  MIMLmix  M3Lcmf 

Letter Frost  
1RankLoss  
AvgF1  
Letter Carroll  
1RankLoss  
AvgF1  
MSRC v2  
1RankLoss  
AvgF1  
Birds  
1RankLoss  
AvgF1 
To investigate the performance of M3Lcmf at the instance level, we conduct experiments on the last four datasets of Table 2, which have instance-level labels. MIMLfast, MIMLmix, and the proposed M3Lcmf are tested on these datasets under the same experimental protocol as in the bag-level experiments. The resulting 1-RankLoss and AvgF1 values are reported in Table 4.
M3Lcmf outperforms the comparing methods on different datasets in most cases, but loses to MIMLmix on the Birds dataset. Among the three methods, MIMLmix often ranks second and MIMLfast third. As summarized in Table 1, MIMLmix does not make use of the bag-bag and instance-instance relations. MIMLfast additionally does not make use of the instance-label relation, so it loses to MIMLmix, let alone M3Lcmf, which utilizes all six types of relations. These comparisons again demonstrate the effectiveness of leveraging multiple types of relations in M3L. In summary, M3Lcmf can accurately predict not only the labels of bags but also the labels of instances.
Contribution of Different Types of Relations
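To further analyze the contribution of the different relations used by M3Lcmf, we introduce four variants: (i) M3Lcmf (nR11) does not consider the relations between bags, i.e., the bag intra-relational matrices $\mathbf{R}_{11}^{(v)}$ are excluded; (ii) M3Lcmf (nR22) does not consider the relations between instances, i.e., the instance intra-relational matrices $\mathbf{R}_{22}^{(v)}$ are excluded; (iii) M3Lcmf (nR33) does not consider the relations between labels, i.e., the label correlation matrix $\mathbf{R}_{33}$ is excluded; (iv) M3Lcmf (nR23) does not consider the relations between instances and labels, i.e., it approximates the bag-label matrix $\mathbf{R}_{13}$ by $\mathbf{G}_1\mathbf{S}_{13}\mathbf{G}_3^{\top}$ instead of using the aggregation term. We follow the experimental protocol of the bag-level prediction and report the 1-RankLoss of M3Lcmf and its variants in Fig. 2.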
M3Lcmf significantly outperforms its variants, each of which disregards one type of relation. M3Lcmf often outperforms M3Lcmf (nR11) and M3Lcmf (nR22), which suggests that the relations between bags and between instances have an important effect on M3Lcmf. Besides, M3Lcmf (nR33) is outperformed by all the other variants, which shows the importance of considering label correlations. In addition, M3Lcmf (nR23) is outperformed by M3Lcmf; this observation not only confirms the effectiveness of the introduced aggregation term, but also shows the importance of instance-label relations in boosting prediction performance.
From these results, we can conclude that multiple types of relations between bags, instances, and labels should be simultaneously considered in M3L.
Parameter Sensitivity
Three parameters (the regularization parameters $\lambda_1$ and $\lambda_2$ of the view weights, and the low-rank size $k$ of $\mathbf{G}_i$) may affect the performance of M3Lcmf. We conduct additional experiments to investigate the sensitivity of M3Lcmf to these parameters. For brevity, we only report the results on Azotobacter vinelandii and MSRC v2; the results on the other datasets lead to similar conclusions.
From the explicit solution for the view weights given in the Supplementary file, once the values of the regularization parameters $\lambda_1$ and $\lambda_2$ are specified, the weights assigned to the bag and instance intra-relational data matrices can be computed from the reconstruction losses of those matrices. To investigate the sensitivity to these two parameters, we vary $\lambda_1$ and $\lambda_2$ over a range of values and report the average 1-RankLoss of M3Lcmf under different combinations of them in Fig. 3. M3Lcmf achieves stable performance under a wide range of combinations of $\lambda_1$ and $\lambda_2$. For Azotobacter vinelandii, M3Lcmf performs well unless $\lambda_1$ or $\lambda_2$ is set to a too small value, in which case its 1-RankLoss is significantly reduced. This is because the predictions are made and evaluated at the bag level, where the bag-level intra-relations play a more important role, but only one bag-level intra-relational data matrix is selected under this setting. Unlike on Azotobacter vinelandii, M3Lcmf holds a relatively stable performance on MSRC v2 under different combinations of $\lambda_1$ and $\lambda_2$. This is because Azotobacter vinelandii provides more structural and feature information for the intra-relational data matrices of bags (or instances) than MSRC v2. In particular, the former has more instances per bag than the latter, and a bag in MSRC v2 generally has one instance. Besides, the feature dimensionality of instances in Azotobacter vinelandii is much larger than that of MSRC v2. This investigation suggests the importance of the structural information of bags (or instances) in M3L. From these results, we conclude that an effective combination of $\lambda_1$ and $\lambda_2$ can be easily found.
The low-rank size $k$ of $\mathbf{G}_i$ is an essential parameter of M3Lcmf. Fig. 4 shows the results of M3Lcmf under different values of $k$ on Azotobacter vinelandii and MSRC v2, with $\lambda_1$ and $\lambda_2$ fixed. We observe an increasing trend of 1-RankLoss as $k$ grows, and an overall good performance for larger values of $k$. M3Lcmf does not achieve a high 1-RankLoss when a small $k$ is adopted, because a too small $k$ cannot sufficiently encode the latent feature information of bags, instances, and labels. Nevertheless, an effective value of $k$ can be easily selected.
Contributions of Weighting IntraRelational Data
To investigate the contribution of weighting intra-relational data and the capability of M3Lcmf to discard noisy intra-relational data matrices, we added 10 synthetic noisy intra-relational data matrices of bags to the Azotobacter vinelandii dataset. In particular, the 10 noisy data matrices are obtained by randomly shuffling the nonzero entries of each row of the two valid matrices, which are constructed in the same way as in the first experiment. For reference, we also applied MIMLNN to the same dataset with the same 10 noisy data matrices, and report the results in Fig. 5(a).
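The construction of the noisy matrices can be sketched as below. The sketch reads "shuffling the nonzero entries of each row" as randomly permuting each row, so that the row's values move to random columns while its multiset of values (and row sum) is kept; this reading, the function name, and the seeding are assumptions.

```python
import random


def shuffled_noise_matrix(W, seed=0):
    """Noisy copy of an intra-relational matrix: each row is randomly permuted.

    Every row keeps its values but loses its structure; the result is
    generally not symmetric.
    """
    rng = random.Random(seed)
    noisy = []
    for row in W:
        new_row = list(row)  # copy so the valid matrix is left untouched
        rng.shuffle(new_row)
        noisy.append(new_row)
    return noisy


valid = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.0]]
noisy_views = [shuffled_noise_matrix(valid, seed=s) for s in range(10)]
```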
Even with 10 noisy data matrices, M3Lcmf does not show decreased performance, whereas the performance of MIMLNN clearly drops (by 2%). This is because M3Lcmf explicitly considers the different relevances of the intra-relational data matrices and can selectively integrate them. In contrast, MIMLNN does not account for the different relevances of these matrices and is therefore more affected by the noisy ones.
To further investigate the underlying reason for the robust performance of M3Lcmf, we plot the weights assigned to these 12 (2 valid and 10 noisy) intra-relational data matrices of bags in Fig. 5(b). All 10 noisy data matrices are assigned zero weights; namely, M3Lcmf discards these noisy data matrices during the collaborative matrix factorization. This investigation justifies our motivation to account for the different relevances of multiple intra-relational data matrices.
Conclusion
In this paper, we proposed a collaborative matrix factorization based multi-view multi-instance multi-label learning approach called M3Lcmf. M3Lcmf utilizes a heterogeneous network to capture different types of relations in M3L, and collaboratively factorizes the relational data matrices of the network to explore the intrinsic relations between bags, instances, and labels. Extensive experimental results on different datasets corroborate our hypothesis that multiple types of relations can boost the performance of M3L, and that their joint usage contributes to a significantly improved performance of M3Lcmf over competitive approaches. The Supplementary file and code of M3Lcmf are available at http://mlda.swu.edu.cn/codes.php?name=M3Lcmf.
Acknowledgments
The authors appreciate the reviewers for their helpful comments on improving our work. This work is supported by NSFC (61872300, 61741217, 61873214 and 61871020), NSF of CQ CSTC (cstc2018jcyjAX0228, cstc2016jcyjA0351 and CSTC2016SHMSZX0824), the Open Research Project of Hubei Key Laboratory of Intelligent GeoInformation Processing (KLIGIP2017A05), the National Science and Technology Support Program (2015BAK41B03 and 2015BAK41B04), and Fundamental Research Funds for the Central Universities of China (XDJK2019D019 and XDJK2019B024).
References
 [Belkin, Niyogi, and Sindhwani 2006] Belkin, M.; Niyogi, P.; and Sindhwani, V. 2006. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. JMLR 7(11):2399–2434.
 [Blei, Ng, and Jordan 2003] Blei, D. M.; Ng, A. Y.; and Jordan, M. I. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research 3:993–1022.
 [Briggs, Fern, and Raich 2012] Briggs, F.; Fern, X. Z.; and Raich, R. 2012. Rank-loss support instance machines for MIML instance annotation. In KDD, 534–542.
 [Chen et al. 2018] Chen, X.; Yu, G.; Domeniconi, C.; Wang, J.; Li, Z.; and Zhang, Z. 2018. Cost effective multi-label active learning via querying subexamples. In ICDM, 1–6.
 [Demšar 2006] Demšar, J. 2006. Statistical comparisons of classifiers over multiple data sets. JMLR 7(1):1–30.
 [Feng and Zhou 2017] Feng, J., and Zhou, Z.-H. 2017. Deep MIML network. In AAAI, 1884–1890.
 [Gibaja and Ventura 2015] Gibaja, E., and Ventura, S. 2015. A tutorial on multi-label learning. ACM Computing Surveys 47(3):52.
 [Gligorijević and Pržulj 2015] Gligorijević, V., and Pržulj, N. 2015. Methods for biological data integration: perspectives and challenges. Journal of the Royal Society Interface 12(112):20150571.
 [He et al. 2016] He, J.; Du, C.; Zhuang, F.; Yin, X.; He, Q.; and Long, G. 2016. Online Bayesian max-margin subspace multi-view learning. In IJCAI, 1555–1561.
 [Huang, Gao, and Chen 2017] Huang, S.-J.; Gao, N.; and Chen, S. 2017. Multi-instance multi-label active learning. In IJCAI, 1886–1892.
 [Huang, Gao, and Zhou 2018] Huang, S.-J.; Gao, W.; and Zhou, Z.-H. 2018. Fast multi-instance multi-label learning. TPAMI 99(1):1–14.
 [Lee and Seung 2001] Lee, D. D., and Seung, H. S. 2001. Algorithms for non-negative matrix factorization. In NIPS, 556–562.
 [Li et al. 2017] Li, B.; Yuan, C.; Xiong, W.; Hu, W.; Peng, H.; Ding, X.; and Maybank, S. 2017. Multi-view multi-instance learning based on joint sparse representation and multi-view dictionary learning. TPAMI 39(12):2554–2560.
 [Nguyen et al. 2014] Nguyen, C. T.; Wang, X.; Liu, J.; and Zhou, Z. H. 2014. Labeling complicated objects: multi-view multi-instance multi-label learning. In AAAI, 2013–2019.
 [Nguyen, Zhan, and Zhou 2013] Nguyen, C. T.; Zhan, D. C.; and Zhou, Z. H. 2013. Multi-modal image annotation with multi-instance multi-label LDA. In IJCAI, 1558–1564.
 [Shao et al. 2016] Shao, W.; Zhang, J.; He, L.; and Philip, S. Y. 2016. Multi-source multi-view clustering via discrepancy penalty. In IJCNN, 2714–2721.
 [Tan et al. 2018] Tan, Q.; Yu, G.; Domeniconi, C.; Wang, J.; and Zhang, Z. 2018. Incomplete multi-view weak-label learning. In IJCAI, 2703–2709.
 [Villani 2008] Villani, C. 2008. Optimal transport: old and new, volume 338. Springer Science & Business Media.
 [Winn, Criminisi, and Minka 2005] Winn, J.; Criminisi, A.; and Minka, T. 2005. Object categorization by learned universal visual dictionary. In ICCV, 1800–1807.
 [Xu, Tao, and Xu 2013] Xu, C.; Tao, D.; and Xu, C. 2013. A survey on multi-view learning. arXiv preprint arXiv:1304.5634.
 [Yang et al. 2018] Yang, Y.; Wu, Y.-F.; Zhan, D.-C.; Liu, Z.-B.; and Jiang, Y. 2018. Complex object classification: A multi-modal multi-instance multi-label deep network with optimal transport. In KDD, 2594–2603.
 [Zhang and Wang 2009] Zhang, M. L., and Wang, Z. J. 2009. MIMLRBF: RBF neural networks for multi-instance multi-label learning. Neurocomputing 72(16–18):3951–3956.
 [Zhang and Zhou 2009] Zhang, M.-L., and Zhou, Z.-H. 2009. Multi-instance clustering with applications to multi-instance prediction. Applied Intelligence 31(1):47–68.
 [Zhang and Zhou 2014] Zhang, M., and Zhou, Z. 2014. A review on multi-label learning algorithms. TKDE 26(8):1819–1837.
 [Zhou et al. 2008] Zhou, Z. H.; Zhang, M. L.; Huang, S. J.; and Li, Y. F. 2008. MIML: A framework for learning with ambiguous objects. Corr Abs 2012.
 [Zhou et al. 2012] Zhou, Z.-H.; Zhang, M.-L.; Huang, S.-J.; and Li, Y.-F. 2012. Multi-instance multi-label learning. Artificial Intelligence 176(1):2291–2320.
 [Zhu, Ting, and Zhou 2017] Zhu, Y.; Ting, K. M.; and Zhou, Z.-H. 2017. Discover multiple novel labels in multi-instance multi-label learning. In AAAI, 2977–2984.
 [Zitnik and Zupan 2015] Zitnik, M., and Zupan, B. 2015. Data fusion by matrix factorization. TPAMI 37(1):41–53.