Weighting Scheme for a Pairwise Multilabel Classifier Based on the Fuzzy Confusion Matrix.
Abstract
In this work, we address the issue of applying a stochastic classifier and a local, fuzzy confusion matrix under the framework of multi-label classification. We propose a novel solution to the problem of correcting label-pairwise ensembles. The main step of the correction procedure is to compute classifier-specific competence and cross-competence measures, which estimate the error pattern of the underlying classifier. At the fusion phase, we employ two weighting approaches based on information theory. The classifier weights promote base classifiers that are the most susceptible to the correction based on the fuzzy confusion matrix. During the experimental study, the proposed approach was compared against two reference methods in terms of six different quality criteria. The conducted experiments reveal that the proposed approach eliminates one of the main drawbacks of the original FCM-based approach, namely its vulnerability to imbalanced class/label distributions. What is more, the obtained results show that the introduced method achieves satisfying classification quality under all considered quality criteria. Additionally, the impact of fluctuations of data set characteristics is reduced.
keywords:
multi-label classification, label-pairwise transformation, random reference classifier, confusion matrix, information theory, entropy
1 Introduction
Many real-world datasets describe objects that are assigned to multiple categories at the same time. All of these concepts together constitute a full description of the object, and omitting one of these tags induces a loss of information. The classification process in which such data is involved is called multi-label classification (Gibaja2014).
Unfortunately, single-label classification methods are not able to solve the aforementioned task directly. The main reason is that single-label classifiers are built under the assumption that an object is assigned to only one class. A solution to this issue is to provide dedicated multi-label classification procedures that are able to handle multi-label data directly.
This study is conducted with the aim of assessing the results of applying an information-theory-based competence measure to the task of improving the classification quality obtained by label-pairwise (LPW) multi-label classifiers. In particular, the focus is put on investigating the impact of the aforementioned quality criterion on a classifier that is corrected using a procedure based on the fuzzy confusion matrix and the Random Reference Classifier (RRC). The procedure corrects the predictions of the classifiers constituting the LPW ensemble. The outcome of each LPW member is individually modified according to the confusion pattern obtained during the validation stage, and the modified outcomes are then combined using a fusion method driven by an information-theoretic competence measure.
This paper is organized as follows. The next section reviews the work related to the issues considered in this paper. Section 3 provides the formal notation used throughout this article and introduces the FCM correction algorithm and its weighted version. Section 4 contains a description of the experimental setup. In Section 5, the experimental results are presented and discussed. Section 6 concludes the paper.
2 Related Work
Multi-label classification algorithms can be broadly partitioned into two main groups, i.e. set transformation algorithms and algorithm adaptation approaches (Gibaja2014).
A method that belongs to the group of algorithm adaptation approaches provides a generalisation of an existing multi-class algorithm, so that the generalised algorithm is able to solve a multi-label classification problem in a direct way. Among others, the best-known approaches from this group are the multi-label KNN algorithm (Jiang2012), the ML Hoeffding trees (Read2012), the structured SVM approach (Diez2014) and deep-learning-based algorithms (Wei2015).
On the other hand, methods from the former group decompose a multi-label problem into a set of single-label classification tasks. During the inference phase, the outputs of the underlying single-label classifiers are combined in order to create a multi-label prediction.
The decomposition technique studied in depth throughout this article is the label-pairwise (LPW) scheme. Under this framework, a binary classifier is assigned to each pair of labels, and its outcome is interpreted as an expression of pairwise preference in a label ranking (Hllermeier2010).
The concept of the fuzzy confusion matrix (FCM) was first introduced in studies related to the task of hand gesture recognition (Kurzynski2015; Trajdos2016). The proposed system uses two main advantages of the FCM approach: its ability to correct the output of a classifier that makes systematic errors, and its ability to handle imprecise class assignments.
The above-mentioned approach was also employed under the multi-label classification framework (Trajdos2015). Namely, it was used to improve the quality of Binary Relevance classifiers. Experiments confirmed the validity of its use, but also showed sensitivity to unbalanced class distributions in the underlying binary problems. In this study, we focus on addressing this issue by employing the LPW technique, which produces more balanced single-label problems than the BR approach.
During the prediction phase, we decided to employ a weight function based on information theory. The main motivation is that information-theoretic measures hold a few properties which make them very reliable indicators of the competence of an FCM-corrected classifier. To be more precise, previously conducted research showed that although the FCM model is able to correct a randomly guessing classifier, the correction is most effective when the underlying base classifier makes a systematic error (Trajdos2016). The information-theoretic competence criterion allows us to detect such situations and to assign more weight to classifiers whose correction ability is higher.
3 Proposed method
3.1 Preliminaries
Under the multi-label formalism, an object x is assigned to a set of labels indicated by a binary vector y = [y_1, y_2, …, y_L] of length L, where L denotes the number of labels. Each element of the vector is related to a single label. In this study we suppose that the multi-label classifier is built in a supervised learning procedure using a training set containing pairs of feature vectors x and corresponding label vectors y.
Additionally, throughout this paper we follow the statistical classification framework, so the vectors x and y are treated as realisations of random vectors X and Y, respectively.
3.2 Pairwise Transformation
The label-pairwise (LPW) transformation builds the multi-label classifier using an ensemble of binary classifiers, with a single binary classifier ψ_{i,j} assigned to each pair of labels:

  Ψ = { ψ_{i,j} : i, j ∈ {1, 2, …, L}, i < j }.   (1)

During the training phase of a binary classifier ψ_{i,j}, only learning objects belonging to either the i-th or the j-th label are used. Examples that appear in both classes are ignored. Instances assigned to other labels are also ignored, because they hold no information that can be used by the binary classifier (Hllermeier2010).
During the inference stage, at the continuous-valued output level, the classifier ψ_{i,j} produces a two-dimensional vector of label supports [s_i^{(i,j)}(x), s_j^{(i,j)}(x)], whose values are interpreted as the supports for the hypotheses that the i-th and the j-th labels, respectively, are relevant for the object x. Without loss of generality, we assume that the output vector is normalised, that is, s_i^{(i,j)}(x) + s_j^{(i,j)}(x) = 1.
All binary classifiers in the LPW ensemble contribute to the final decision through combining their continuous-valued outputs. That is, the final support for the i-th label is calculated as a weighted average of the soft outputs of the adequate binary classifiers:

  s_i(x) = ( Σ_{j ≠ i} w_{i,j}(x) s_i^{(i,j)}(x) ) / ( Σ_{j ≠ i} w_{i,j}(x) ),   (2)

where w_{i,j}(x) is a weight, calculated in a dynamic way for an input vector x, that is assigned to the pair-specific binary classifier ψ_{i,j}.
The final multi-label classification, i.e. the response of the multi-label classifier, is obtained as a result of a thresholding procedure applied to the soft outputs of the above-defined multi-label classifier:

  h_i(x) = [[ s_i(x) > τ ]],   (3)

where [[·]] is the Iverson bracket and the threshold τ is usually set to 0.5.
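The LPW inference procedure described above can be sketched as follows. This is a minimal illustration with invented function and variable names (not code from the paper); the pairwise supports and dynamic weights are assumed to be given.

```python
def lpw_predict(pairwise_supports, weights, n_labels, tau=0.5):
    """Combine pairwise supports into a multi-label prediction.

    pairwise_supports[(i, j)] = (s_i, s_j): normalised supports (s_i + s_j == 1)
    weights[(i, j)]: dynamic weight of the binary classifier for pair (i, j)
    Returns the binary prediction and the per-label soft supports.
    """
    supports = []
    for i in range(n_labels):
        num, den = 0.0, 0.0
        for j in range(n_labels):
            if i == j:
                continue
            key = (min(i, j), max(i, j))
            w = weights[key]
            # pick the support component that refers to label i
            s = pairwise_supports[key][0 if i < j else 1]
            num += w * s
            den += w
        supports.append(num / den)  # weighted average of pairwise supports
    return [1 if s > tau else 0 for s in supports], supports
```

With three labels, supports favouring label 0 in both of its pairs, and unit weights, the sketch returns the prediction [1, 0, 0].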
3.3 Proposed Correction Method
The proposed correction method is based on an assessment of the probability that the binary classifier ψ_{i,j} classifies an object into a given class. Such an approach requires a probabilistic model which assumes that the result of the classification of an object by the binary classifier, its true label and its feature vector are observed values of random variables. The randomness of the label and of the feature vector is a simple consequence of the probabilistic model presented in the previous subsection.
The randomness of the classification result means that, for a given x, the binary classifier is a randomized classifier defined by the conditional probabilities of assigning each of the two classes (Berger1985).
The Bayesian model allows us to define the posterior probability of the label m ∈ {i, j} as:

  P(m | x) = Σ_{u ∈ {i,j}} P(ψ_{i,j}(x) = u) P(m | ψ_{i,j}(x) = u, x),   (4)

where P(m | ψ_{i,j}(x) = u, x) denotes the probability that an object belongs to the class m given that the classifier has assigned it to the class u.
Unfortunately, at the core of the proposed method we put the rather impractical assumption that the classifier assigns a label in a stochastic manner. We dealt with this issue by harnessing deterministic binary classifiers whose statistical properties were modelled using the RRC procedure (Woloszynski2011). The RRC model calculates the probability that the underlying classifier assigns an instance to a given class.
3.4 Confusion Matrix
During the inference process of the proposed approach, this probability is estimated using a local, fuzzy confusion matrix. An example of such a matrix for a binary classification task is given in Table 1. The rows of the matrix correspond to the ground-truth classes, whereas the columns match the outcome of the classifier. The fuzzy nature of the confusion matrix arises directly from the fact that a stochastic model has been employed: we expressed the decision regions of the random classifier in terms of the fuzzy set formalism (Zadeh1965). To provide an accurate estimation, we have also defined our confusion matrix as local, which means that the matrix is built using the neighbouring points of the instance x.
The local fuzzy confusion matrix is estimated using a validation set:

  V = { (x^(1), y^(1)), (x^(2), y^(2)), … },   (5)

where x^(k) and y^(k) denote the description of an instance and the corresponding vector indicating its label assignment, respectively. On the basis of this set, we define the pairwise subsets of the validation set, the fuzzy decision regions of ψ_{i,j} and the set of neighbours of x, respectively:

  V_m = { (x^(k), y^(k)) ∈ V : y_m^(k) = 1 },  m ∈ {i, j},   (6)
  D_u = { (x^(k), μ_{D_u}(x^(k))) : x^(k) ∈ V },  μ_{D_u}(x^(k)) = P(ψ_{i,j}(x^(k)) = u),   (7)
  N(x) = { (x^(k), μ_{N(x)}(x^(k))) : x^(k) ∈ V },   (8)

where each pair in (7) defines the fuzzy membership value of an instance and indicates the fuzzy decision region of the stochastic classifier. Additionally, N(x) denotes the fuzzy neighbourhood of the instance x. The membership function μ_{N(x)} of the neighbourhood was defined using a Gaussian potential function.
The above-defined fuzzy sets are employed to approximate the entries of the local confusion matrix:

  ε_{m,u}(x) = | V_m ∩ D_u ∩ N(x) | / | V ∩ N(x) |,   (9)

where |·| is the cardinality of a fuzzy set (Dhar2013). Finally, the approximation of the probability P(m | ψ_{i,j}(x) = u, x) is calculated as follows:

  P(m | ψ_{i,j}(x) = u, x) ≈ ε_{m,u}(x) / Σ_{n ∈ {i,j}} ε_{n,u}(x).   (10)
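A single entry of the local fuzzy confusion matrix can be estimated as in the sketch below. This is our own illustration, under the assumptions that the fuzzy intersection is realised with the min t-norm, the fuzzy cardinality is the sigma-count (sum of memberships), and the neighbourhood membership is a Gaussian potential function; the function and variable names are invented, not from the paper.

```python
import math

def fcm_entry(val_points, true_label, predicted_label, x, beta=1.0):
    """Estimate one entry of the local fuzzy confusion matrix.

    val_points: list of (features, y, mu_D), where y is the true class of the
      validation point and mu_D[u] is the RRC probability that the classifier
      assigns the point to class u (its fuzzy decision-region membership).
    """
    num, den = 0.0, 0.0
    for feats, y, mu_D in val_points:
        dist2 = sum((a - b) ** 2 for a, b in zip(feats, x))
        mu_N = math.exp(-beta * dist2)  # Gaussian potential neighbourhood
        mu_true = 1.0 if y == true_label else 0.0
        # min t-norm intersection of crisp class set, decision region, neighbourhood
        num += min(mu_true, mu_D[predicted_label], mu_N)
        den += mu_N
    return num / den if den > 0 else 0.0
```

Normalising the entries of one column (one predicted class) then yields the corrected posterior over the true classes.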
Table 1: The fuzzy confusion matrix for a binary classification task; rows correspond to the true classes and columns to the estimated classes.
3.5 Weighting Scheme
In this section, we define a weighting approach that is used during the prediction phase to promote the base classifiers for which the correction ability of the FCM model is most effective.
We compute the mutual information I and the joint entropy H of the random variables corresponding to the randomized classifier prediction Ψ and the true label assignment Y. Finally, the classifier-specific weight is defined as a normalised mutual information (Cahill2010):

  w = I(Y; Ψ) / H(Y, Ψ).   (11)
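A minimal sketch of the normalised-mutual-information weight, computed from a joint probability table over the true class and the randomized prediction (the function and variable names are ours, not from the paper):

```python
import math

def nmi_weight(joint):
    """Normalised mutual information I(Y; Psi) / H(Y, Psi), where
    joint[y][p] is the joint probability of true class y and prediction p."""
    py = [sum(row) for row in joint]           # marginal of the true class
    pp = [sum(col) for col in zip(*joint)]     # marginal of the prediction
    mi, joint_h = 0.0, 0.0
    for y, row in enumerate(joint):
        for p, pyp in enumerate(row):
            if pyp > 0:
                mi += pyp * math.log2(pyp / (py[y] * pp[p]))
                joint_h -= pyp * math.log2(pyp)
    return mi / joint_h if joint_h > 0 else 0.0
```

Note that a classifier that systematically swaps the two classes (joint table [[0, 0.5], [0.5, 0]]) receives the same maximal weight as a perfect one, while a random guesser (independent joint) receives weight 0. This matches the motivation given earlier: systematic errors are exactly the ones the FCM correction can repair.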
4 Experimental Setup
The conducted experimental study provides an empirical evaluation of the classification quality of the proposed method and compares it against reference methods. Namely, we conducted our experiments using the following algorithms:

1. the unmodified LPW classifier (Hllermeier2010);

2. the LPW classifier corrected using a confusion matrix specific to balanced label distributions;

3. the LPW classifier corrected using the FCM, with fusion performed using the information-theoretic weight.

In the following sections of this paper, we refer to the investigated algorithms using the above numbers.
All base single-label classifiers were implemented using the Naïve Bayes classifier (Hand2001) combined with the Random Subspace technique (TinKamHo1998). We utilized the Naïve Bayes implementation from the WEKA framework (Hall2009), with the classifier parameters set to their defaults. For the Random Subspace technique, the number of selected attributes was set to a fixed fraction of the original number of attributes, and a fixed number of repetitions was used. All multi-label algorithms were implemented using the MULAN framework (Tsoumakas2011_mulan).
The experiments were conducted using 29 multi-label benchmark sets. The main characteristics of the datasets are summarized in Table 2.
The extraction of training and test datasets was performed using cross-validation, with the proportion of the training set fixed at a constant share of the original training set. Some of the employed sets needed preprocessing. That is, the multi-label regression sets (No. 9, 10, 28) were binarised using a thresholding procedure: when the value of the output variable, for a given object, is greater than zero, the corresponding label is set to be relevant to the object. We also used multi-label multi-instance sets (No. 2, 4, 5, 12, 13, 18, 20, 21), which were transformed into single-instance multi-label datasets according to the suggestion made by Zhou et al. (Zhou2012). Two of the used datasets are synthetic ones (No. 23, 24); they were generated using the algorithm described in (Tomas2014). To reduce the computational burden, we used only two subsets of each of the IMDB and Tmc2007 sets.
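The regression-to-label binarisation described above amounts to a one-line threshold (a trivial sketch; the function name is ours):

```python
def binarise_targets(regression_targets):
    """Turn multi-target regression outputs into binary label vectors:
    a label is relevant iff its output value is greater than zero."""
    return [[1 if v > 0 else 0 for v in row] for row in regression_targets]
```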
We used datasets from the sources abbreviated as follows: M–Tsoumakas2011_mulan (); W–Wu2014 (); X–Xu2013 (); Me–meka (); Z–Zhou2012 (); T–Tomas2014 ()
The algorithms were compared in terms of six different quality criteria coming from three groups: ranking-based, instance-based and label-based (the latter including micro-averaged and macro-averaged variants) (Luaces2012).
Statistical evaluation of the results was performed using the Wilcoxon signed-rank test (demsar2006), and the family-wise error rates were controlled using the Bergmann–Hommel procedure (demsar2006). The same significance level was used for all statistical tests.
To provide a more detailed look at the properties of the proposed approach, the relations between the classification quality obtained by the investigated algorithms and chosen dataset characteristics were also analysed. This assessment allows us to determine how the investigated classifiers respond to changes in vital properties of the datasets. In order to assess the relations in a quantitative way, we used the Spearman correlation coefficient (Spearman1904). The significance of the obtained correlations was tested using a two-tailed t-test (Hollander_2013_book). The same significance level as in the experiments related to classification quality was used, and we employed the Holm method to adjust p-values (demsar2006).
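For reference, the Spearman coefficient used here is simply the Pearson correlation computed on ranks; a dependency-free sketch (the helper names are ours):

```python
import math

def rank(values):
    """Average ranks (1-based); ties share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = rank(xs), rank(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)
```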
Name  Sr  No.  N  d  L  CD  LD  avIR  AVs 

Arts  M  1  7484  1759  26  1.654  .064  94.738  .059 
Azotobacter  W  2  407  33  13  1.469  .113  2.225  .010 
Birds  M  3  645  279  19  1.014  .053  5.407  .033 
Caenorhabditis  W  4  2512  41  21  2.419  .115  2.347  .010 
Drosophila  W  5  2605  42  22  2.656  .121  1.744  .004 
Emotions  M  6  593  78  6  1.868  .311  1.478  .011 
Enron  M  7  1702  1054  53  3.378  .064  73.953  .303 
Flags  M  8  194  50  7  3.392  .485  2.255  .061 
Flare1  M  9  323  28  3  0.232  .077  2.423  .005 
Flare2  M  10  1066  30  3  0.209  .070  14.152  .006 
Genbase  M  11  662  1213  27  1.252  .046  37.315  .029 
Geobacter  W  12  379  31  11  1.264  .115  2.750  .014 
Haloarcula  W  13  304  33  13  1.602  .123  2.419  .016 
Human  X  14  3106  454  14  1.185  .085  15.289  .020 
IMDB0  Me  15  3042  1029  28  1.987  .071  24.611  .109 
IMDB1  Me  16  3044  1029  28  1.987  .071  24.585  .106 
Medical  M  17  978  1494  45  1.245  .028  89.501  .047 
MimlImg  Z  18  2000  140  5  1.236  .247  1.193  .001 
Plant  X  19  978  452  12  1.079  .090  6.690  .006 
Pyrococcus  W  20  425  38  18  2.136  .119  2.421  .015 
Saccharomyces  W  21  3509  47  27  2.275  .084  2.077  .005 
Scene  M  22  2407  300  6  1.074  .179  1.254  .000 
SimpleHC  T  23  3000  40  10  1.900  .190  1.138  .001 
SimpleHS  T  24  3000  40  10  2.307  .231  2.622  .050 
Slashdot  Me  25  3782  1101  22  1.181  .054  17.693  .013 
Tmc2007_0  M  26  2857  522  22  2.222  .101  17.153  .195 
Tmc2007_1  M  27  2834  522  22  2.242  .102  17.123  .191 
Waterquality  M  28  1060  30  14  5.073  .362  1.767  .037 
Yeast  M  29  2417  117  14  4.237  .303  7.197  .104 
5 Results and Discussion
This section presents the results obtained during the conducted experimental study. The following subsections provide a detailed description of the outcome related to classification quality and of the dependencies between the results obtained by the investigated algorithms and the dataset characteristics, respectively.
5.1 Classification quality
The summarised results related to classification quality, analysed from different points of view using appropriate quality criteria, are presented in Table 3 and Figure 1. Additionally, the full results are presented in Table 4.
First of all, it is worth noting that the results reveal that the proposed algorithm does not perform significantly worse than the reference methods in terms of any quality criterion. What is more, the weighted algorithm outperforms the unweighted FCM approach in terms of the macro-averaged measure. Although the weighted FCM does not provide a significant improvement over the original label-pairwise ensemble, this result indicates that the proposed weighting scheme allows the FCM classifier to achieve better performance for rare labels. This phenomenon can be explained by the fact that the weighting scheme assigns lower weights to the FCM classifiers that are biased towards the majority class, since those classifiers cannot be successfully corrected using the FCM approach. As a consequence, the outcome for a given label is produced using base classifiers that were built for more balanced binary subproblems. The reported property reduces the tendency of the original FCM algorithm to increase the bias towards the majority class (Trajdos2015) and allows the FCM-based algorithms to be successfully employed in the task of imbalanced multi-label classification.
What is more, the classification quality expressed using the micro-averaged criterion does not differ significantly between the FCM and its weighted version. This demonstrates that the increase in classification quality for rare labels is not followed by a deterioration of classification quality for frequent labels. The weighting procedure also causes no significant loss of classification quality under the example-based loss. Moreover, in the case of the micro-averaged and example-based measures, the approaches based on the idea of the fuzzy confusion matrix significantly outperform the base label-pairwise algorithm. On the other hand, the lack of significant improvement for frequent labels shows that the proposed method offers almost no improvement when the LPW ensemble is built using label-balanced datasets, since for those datasets the base binary classifiers are rather competent. However, those competent classifiers tend to commit systematic errors. As a consequence, the utilisation of the FCM-based approach allows classification quality to be improved for frequent labels, in comparison with the uncorrected label-pairwise ensemble.
The proposed algorithm significantly improves upon the unweighted one in terms of the zero-one quality criterion. The significant improvement under this criterion shows that the proposed method achieves the greatest number of exact-match results among the investigated procedures. Combining these results with the performance achieved under the macro-averaged loss, we can conclude that the increase in the perfect-match ratio is a consequence of the improved classification of rare labels. However, the increase in the perfect-match ratio is not followed by an overall improvement in classification.
The experiments show that the assessed classifiers do not differ in a significant way when we consider their ability to produce a label ranking instead of a simple binary response.
Table 3: p-values of the pairwise comparisons between the algorithms, and their average ranks, for each quality criterion.

          Hamming               Zero-one              Ranking
1 vs 2/3  0.507  0.507          0.721  0.265          0.815  0.570
2 vs 3    0.507                 0.082                 0.570
Avg rank  2.172  1.828  2.000   2.224  2.121  1.655   2.034  1.759  2.207

          Macro                 Micro                 Example
1 vs 2/3  0.198  0.932          0.073  0.073          0.012  0.072
2 vs 3    0.026                 0.609                 0.733
Avg rank  1.966  2.310  1.724   2.414  1.793  1.793   2.448  1.724  1.828
Table 4: Full results for each dataset; for each quality criterion, the three columns give the results of algorithms 1, 2 and 3.

No.   Hamming          Zero-one         Ranking          Macro            Micro            Example
      1    2    3      1    2    3      1    2    3      1    2    3      1    2    3      1    2    3
1     .478 .453 .560   1.00 1.00 1.00   .254 .152 .148   .831 .849 .827   .829 .794 .822   .835 .799 .825
2     .167 .392 .419   .995 .998 .998   .217 .158 .180   .971 .808 .797   .877 .750 .758   .943 .807 .810
3     .470 .469 .484   1.00 1.00 .998   .125 .092 .111   .844 .837 .840   .837 .832 .836   .857 .851 .852
4     .145 .274 .233   .941 1.00 1.00   .207 .199 .188   .980 .879 .851   .924 .797 .781   .910 .882 .874
5     .144 .388 .339   .895 .997 .998   .162 .155 .165   .987 .780 .777   .929 .738 .731   .853 .823 .822
6     .304 .302 .329   .900 .913 .930   .173 .171 .175   .362 .354 .375   .363 .357 .378   .378 .374 .392
7     .457 .370 .122   1.00 .982 .981   .150 .319 .266   .858 .875 .825   .805 .826 .785   .808 .774 .824
8     .273 .282 .331   .894 .882 .953   .190 .205 .239   .414 .426 .421   .272 .282 .336   .282 .290 .351
9     .557 .474 .408   .972 .947 .932   .042 .046 .051   .865 .854 .844   .806 .816 .816   .879 .893 .892
10    .580 .588 .494   .961 .982 .933   .015 .013 .017   .823 .783 .789   .823 .816 .799   .870 .872 .860
11    .410 .536 .097   1.00 .968 .894   .095 .208 .190   .609 .787 .686   .854 .863 .738   .856 .851 .707
12    .431 .422 .464   1.00 1.00 .997   .114 .112 .134   .748 .766 .740   .707 .706 .712   .759 .758 .756
13    .352 .259 .237   .929 .761 .761   .173 .263 .262   .777 .941 .945   .753 .930 .927   .716 .699 .697
14    .457 .454 .462   1.00 1.00 1.00   .247 .159 .161   .829 .846 .836   .764 .743 .747   .765 .745 .747
15    .502 .383 .271   1.00 1.00 .998   .190 .297 .292   .862 .887 .893   .796 .803 .785   .801 .797 .801
16    .504 .400 .291   1.00 1.00 .999   .190 .308 .304   .862 .888 .885   .797 .808 .795   .802 .805 .818
17    .485 .174 .182   1.00 .998 .992   .092 .080 .090   .577 .690 .625   .901 .774 .787   .902 .751 .769
18    .350 .354 .357   .962 .962 .946   .229 .206 .209   .473 .455 .462   .477 .464 .471   .477 .464 .467
19    .453 .428 .476   1.00 1.00 .999   .246 .197 .211   .796 .819 .792   .757 .734 .745   .756 .732 .738
20    .332 .304 .303   .951 1.00 1.00   .195 .295 .298   .822 .915 .914   .766 .833 .833   .766 .880 .882
21    .202 .215 .203   .946 1.00 1.00   .224 .224 .226   .962 .926 .926   .897 .856 .855   .899 .927 .927
22    .325 .312 .316   .994 .968 .931   .103 .084 .096   .471 .456 .470   .490 .476 .484   .487 .466 .463
23    .370 .344 .262   1.00 .999 .979   .190 .189 .189   .581 .569 .524   .580 .569 .523   .583 .570 .512
24    .370 .292 .310   .990 .983 .984   .325 .340 .340   .669 .804 .796   .575 .585 .586   .589 .606 .607
25    .074 .702 .687   .990 1.00 1.00   .282 .176 .213   .861 .792 .786   .981 .868 .866   .986 .869 .866
26    .406 .312 .189   1.00 .977 .927   .056 .174 .153   .731 .772 .747   .673 .657 .579   .680 .624 .568
27    .395 .313 .190   1.00 .969 .926   .055 .177 .157   .732 .765 .747   .665 .657 .582   .672 .625 .572
28    .384 .372 .392   .999 .998 1.00   .308 .290 .288   .503 .566 .496   .438 .435 .422   .459 .454 .444
29    .299 .283 .305   .990 .990 .994   .184 .174 .264   .533 .589 .624   .378 .367 .461   .391 .381 .468
5.2 Impact of dataset properties
In this section, we assess the relations between the classification quality obtained by a classifier employed on a given multi-label set and the properties of this set. At the beginning of the correlation analysis, it is worth mentioning that, in general, the lack of a significant correlation between multi-label set characteristics and the classification quality obtained by an algorithm can, under specific circumstances, be interpreted as an advantage of the classifier. That is, the algorithm is more flexible, as it can be employed to solve multi-label classification problems for data sets which significantly differ in characteristics. However, the classifier can be said to be flexible only when it offers acceptable classification quality for a wide range of data sets. Achieving satisfactory quality is an important condition, since it is easy to build a classifier which is completely independent of set characteristics but achieves low classification quality.
In general, we can observe that when the label density (LD) increases, the classification quality increases. What is more, in most cases the correlations are significant. This strong correlation is a result of employing the label-pairwise decomposition of the multi-label task. That is, when LD is high, the instances are better utilised during the training and validation phases. In other words, an instance that is relevant to many categories simultaneously more often becomes a member of the training or validation set. As a consequence, each of the underlying binary classifiers is built using a larger number of training instances. The main exception to this rule is the ranking loss criterion. This result shows that the considered classification methods cannot produce a more relevant label ranking even if the base classifiers are more competent.
It can also be seen that, in general, the classification quality decreases when the imbalance ratio increases. This is a widely known observation in machine learning in general (Lopez2013) and under the multi-label classification framework in particular (Charte2014). Exceptions to this trend are the results obtained in terms of the ranking loss and the Hamming loss; however, for those loss functions the change of the correlation sign cannot be considered significant.
On the other hand, no consistent tendency for the average SCUMBLE measure can be observed. What is more, for quality criteria other than the zero-one loss, the obtained correlations are not significant.
Now, let us investigate each classification quality criterion separately.
First of all, we analyse the macro-averaged measure. For this quality criterion, only the introduced algorithm does not demonstrate a significant correlation with label density, although the corresponding p-value is very close to the assumed significance level. This observation supports the formerly made claim that the proposed weighting approach can eliminate from the ensemble those classifiers that offer no possibility of successful correction using the FCM approach, including classifiers that are built using too low a number of training instances. As a consequence, the relation between the classification quality for rare labels and the label density can be interpreted as insignificant.
On the other hand, the correlations between LD and the classification quality measured in terms of the micro-averaged and example-based measures are significant. This shows that although the proposed approach can reduce the impact of label density for rare labels, the classification quality for frequent labels is still affected by LD. This result clearly shows that the classification quality improves when the number of instances grows; however, for rare labels, the proposed method prevents it from dropping too low.
In contrast to the results related to the macro-averaged measure, for the Hamming loss the correlation between the classification quality of the proposed method and the label density is far from significant, whereas the correlations obtained for the remaining methods are significant. A possible explanation of this result is the impact of the classification quality of rare labels, which is described above.
Although the considered algorithms are rather insensitive to changes in the SCUMBLE value, under the zero-one loss the original label-pairwise ensemble shows a significant correlation with the SCUMBLE coefficient: its classification quality decreases when SCUMBLE increases. What is more, the proposed algorithm achieves the highest rank in terms of this measure. These results show that the FCM-based correction eliminates the quality loss observed when rare labels co-occur with frequent ones.
Hamming  Hamming pval  Zeroone  Zeroone pval  

1  2  3  1  2  3  1  2  3  1  2  3  
LD  0.561  0.515  0.007  0.012  0.030  1.000  0.583  0.385  0.150  0.007  0.236  1.000 
avIR  0.597  0.330  0.128  0.006  0.414  1.000  0.670  0.304  0.078  0.001  0.547  1.000 
AVs  0.342  0.104  0.339  0.414  1.000  0.414  0.475  0.071  0.016  0.065  1.000  1.000 
Ranking  Ranking pval  Macro  Macro pval  
LD  0.278  0.213  0.309  0.863  1.000  0.717  0.505  0.486  0.446  0.047  0.060  0.108 
avIR  0.195  0.018  0.028  1.000  1.000  1.000  0.220  0.351  0.295  1.000  0.373  0.601 
AVs  0.074  0.357  0.326  1.000  0.515  0.674  0.078  0.130  0.110  1.000  1.000  1.000 
Micro  Micro pval  Example  Example pval  
LD  0.751  0.734  0.706  0.000  0.000  0.000  0.770  0.670  0.656  0.000  0.001  0.001 
avIR  0.366  0.433  0.408  0.203  0.114  0.139  0.375  0.272  0.315  0.269  0.611  0.481 
AVs  0.145  0.007  0.018  1.000  1.000  1.000  0.144  0.205  0.131  0.910  0.859  0.910 
6 Conclusion
In this study, we successfully tackled the issue of eliminating the drawbacks of the previously proposed correction algorithm based on the fuzzy confusion matrix. To reach this goal, we proposed an information-theoretic competence measure that assesses whether the base binary classifier can benefit from the correction based on the FCM model.
During the experimental study, we obtained interesting results. That is, the proposed approach is able to improve classification quality for rare labels (the macro-averaged loss) and under the zero-one loss. What is more, the proposed weighting scheme does not achieve significantly lower quality in terms of any criterion. In addition, the approach reduces the impact of changing set-specific characteristics. As a consequence, the improved version of the FCM-based algorithm is recommended for use instead of the original one.
Since the obtained results are promising, we are willing to continue the development of FCMbased algorithms.
Acknowledgements
The work was supported by the statutory funds of the Department of Systems and Computer Networks, Wroclaw University of Science and Technology. Computational resources were provided by PLGrid Infrastructure.
References
 (1) E. Gibaja, S. Ventura, Multilabel learning: a review of the state of the art and ongoing research, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (6) (2014) 411–444. doi:10.1002/widm.1139.
 (2) J.-Y. Jiang, S.-C. Tsai, S.-J. Lee, FSKNN: multi-label text categorization based on fuzzy similarity and k nearest neighbors, Expert Systems with Applications 39 (3) (2012) 2813–2821. doi:10.1016/j.eswa.2011.08.141.
 (3) J. Read, A. Bifet, G. Holmes, B. Pfahringer, Scalable and efficient multilabel classification for evolving data streams, Machine Learning 88 (12) (2012) 243–272. doi:10.1007/s1099401252796.
 (4) J. Díez, O. Luaces, J. J. del Coz, A. Bahamonde, Optimizing different loss functions in multilabel classifications, Progress in Artificial Intelligence 3 (2) (2014) 107–118. doi:10.1007/s1374801400607.
 (5) Y. Wei, W. Xia, M. Lin, J. Huang, B. Ni, J. Dong, Y. Zhao, S. Yan, Hcp: A flexible cnn framework for multilabel image classification, IEEE Transactions on Pattern Analysis and Machine Intelligence 38 (9) (2016) 1901–1907. doi:10.1109/tpami.2015.2491929.
 (6) E. Hüllermeier, J. Fürnkranz, On predictive accuracy and risk minimization in pairwise label ranking, Journal of Computer and System Sciences 76 (1) (2010) 49–62. doi:10.1016/j.jcss.2009.05.005.
 (7) M. Kurzynski, M. Krysmann, P. Trajdos, A. Wolczowski, Multiclassifier system with hybrid learning applied to the control of bioprosthetic hand, Computers in Biology and Medicine 69 (2016) 286–297. doi:10.1016/j.compbiomed.2015.04.023.
 (8) P. Trajdos, M. Kurzynski, A dynamic model of classifier competence based on the local fuzzy confusion matrix and the random reference classifier, International Journal of Applied Mathematics and Computer Science 26 (1). doi:10.1515/amcs20160012.
 (9) P. Trajdos, M. Kurzynski, An extension of multilabel binary relevance models based on randomized reference classifier and local fuzzy confusion matrix, in: Intelligent Data Engineering and Automated Learning – IDEAL 2015, Springer International Publishing, 2015, pp. 69–76. doi:10.1007/9783319248349_9.
 (10) J. O. Berger, Statistical Decision Theory and Bayesian Analysis, Springer New York, 1985. doi:10.1007/9781475742862.
 (11) T. Woloszynski, M. Kurzynski, A probabilistic model of classifier competence for dynamic ensemble selection, Pattern Recognition 44 (1011) (2011) 2656–2668. doi:10.1016/j.patcog.2011.03.020.
 (12) L. Zadeh, Fuzzy sets, Information and Control 8 (3) (1965) 338–353. doi:10.1016/s00199958(65)90241x.
 (13) M. Dhar, On cardinality of fuzzy sets, International Journal of Intelligent Systems and Applications 5 (6) (2013) 47–52. doi:10.5815/ijisa.2013.06.06.
 (14) N. D. Cahill, Normalized measures of mutual information with general definitions of entropy for multimodal image registration, in: Biomedical Image Registration, Springer Berlin Heidelberg, 2010, pp. 258–268. doi:10.1007/9783642143663_23.
 (15) D. J. Hand, K. Yu, Idiot’s bayes: Not so stupid after all?, International Statistical Review / Revue Internationale de Statistique 69 (3) (2001) 385. doi:10.2307/1403452.
 (16) The random subspace method for constructing decision forests, IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (8) (1998) 832–844. doi:10.1109/34.709601.
 (17) M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, I. H. Witten, The weka data mining software, ACM SIGKDD Explorations Newsletter 11 (1) (2009) 10. doi:10.1145/1656274.1656278.
 (18) E. SpyromitrosXioufis, G. Tsoumakas, W. Groves, I. Vlahavas, Multitarget regression via input space expansion: treating targets as inputs, Machine Learning 104 (1) (2016) 55–98. doi:10.1007/s109940165546z.
 (19) Z.H. Zhou, M.L. Zhang, S.J. Huang, Y.F. Li, Multiinstance multilabel learning, Artificial Intelligence 176 (1) (2012) 2291–2320. doi:10.1016/j.artint.2011.10.002.
 (20) J. T. Tomás, N. Spolaôr, E. A. Cherman, M. C. Monard, A framework to generate synthetic multilabel datasets, Electronic Notes in Theoretical Computer Science 302 (2014) 155–176. doi:10.1016/j.entcs.2014.01.025.
 (21) J.S. Wu, S.J. Huang, Z.H. Zhou, Genomewide protein function prediction through multiinstance multilabel learning, IEEE/ACM Transactions on Computational Biology and Bioinformatics 11 (5) (2014) 891–902. doi:10.1109/tcbb.2014.2323058.
 (22) J. Xu, Fast multilabel core vector machine, Pattern Recognition 46 (3) (2013) 885–898. doi:10.1016/j.patcog.2012.09.003.

 (23) J. Read, P. Reutemann, MEKA: A multi-label extension to WEKA (2017). URL http://meka.sourceforge.net/
 (24) O. Luaces, J. Díez, J. Barranquero, J. J. del Coz, A. Bahamonde, Binary relevance efficacy for multilabel classification, Progress in Artificial Intelligence 1 (4) (2012) 303–313. doi:10.1007/s13748-012-0030-x.
 (25) J. Demšar, Statistical comparisons of classifiers over multiple data sets, The Journal of Machine Learning Research 7 (2006) 1–30.
 (26) C. Spearman, The proof and measurement of association between two things, The American Journal of Psychology 15 (1) (1904) 72. doi:10.2307/1412159.
 (27) M. Hollander, D. A. Wolfe, E. Chicken, Nonparametric Statistical Methods, John Wiley & Sons, Inc., 2015. doi:10.1002/9781119196037.
 (28) F. Charte, A. Rivera, M. J. del Jesus, F. Herrera, Concurrence among imbalanced labels and its influence on multilabel resampling algorithms, in: Lecture Notes in Computer Science, Springer International Publishing, 2014, pp. 110–121. doi:10.1007/978-3-319-07617-1_10.
 (29) V. López, A. Fernández, S. García, V. Palade, F. Herrera, An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics, Information Sciences 250 (2013) 113–141. doi:10.1016/j.ins.2013.07.007.