A novel multivariate performance optimization method based on sparse coding and hyperpredictor learning
Abstract
In this paper, we investigate the problem of optimizing multivariate performance measures and propose a novel algorithm for it. Unlike traditional machine learning methods, which optimize simple loss functions to learn a prediction function, the problem studied in this paper is how to learn an effective hyperpredictor for a tuple of data points, so that a complex loss function corresponding to a multivariate performance measure can be minimized. We propose to represent the tuple of data points as a tuple of sparse codes via a dictionary, and then apply a linear function to compare a sparse code against a given candidate class label. To learn the dictionary, the sparse codes, and the parameter of the linear function, we propose a joint optimization problem. In this problem, the reconstruction error and the sparsity of the sparse codes, together with the upper bound of the complex loss function, are minimized. Moreover, the upper bound of the loss function is approximated by the sparse codes and the linear function parameter. To optimize this problem, we develop an iterative algorithm based on gradient descent methods to learn the sparse codes and the hyperpredictor parameter alternately. Experimental results on benchmark data sets show the advantage of the proposed method over other state-of-the-art algorithms.
keywords:
Pattern classification, Loss function, Multivariate performance measures, Sparse coding, Joint learning, Alternate optimization
1 Introduction
In traditional machine learning methods, we usually use a loss function to compare the true class label of a data point against its predicted class label. By optimizing the loss function over the whole training set, we seek an optimal prediction function, called a classifier Micheloni201281 ; Roy2013113 ; Kang201439 ; Bhuyan2011 ; Wang2012Multiple . For example, in the support vector machine (SVM), a hinge loss function is minimized, and in logistic regression (LR), a logistic loss function is used Pragidis201522 ; Couellan20154284 ; Siray2015217 ; Patil2015349 . However, when we evaluate the performance of a class label predictor, we usually consider a tuple of data points and use a complex multivariate performance measure over that tuple, which differs significantly from the loss functions used in the training procedure Joachims2005377 ; Walker201155 ; Zhang2011814 ; Zhang20123623 ; Mao20132051 . For example, we may use the area under the receiver operating characteristic curve (AUC) as a multivariate performance measure to evaluate the classification performance of an SVM. Because the SVM class label predictor is trained by minimizing a loss function over individual training data points, it is not guaranteed to minimize the loss function corresponding to AUC. Many other multivariate performance measures are also defined to compare the true class label tuple of a data point tuple against its predicted class label tuple, and they can be used in different machine learning applications. Examples include the F-score Zemmoudj2014371 ; Gao2014 , the precision-recall break-even point (PRBEP) Boyd2013451 ; Lopes2014322 , and the Matthews correlation coefficient (MCC) Kumari2015175 ; Shepperd2015106 . To obtain the optimal multivariate performance measure on a given tuple of data points, the problem of multivariate performance measure optimization has recently been proposed.
This problem is defined as a problem of learning a hyperpredictor for a tuple of data points to predict a tuple of class labels. The hyperpredictor is learned so that a multivariate performance measure used to compare the true class label tuple and the predicted class label tuple can be optimized directly.
1.1 Related works
Several methods have been proposed to solve the problem of multivariate performance measure optimization. For example,

Joachims Joachims2005377 proposed an SVM method to optimize multivariate nonlinear performance measures, including the F-score, AUC, etc. This method uses a multivariate predictor, and gives an algorithm that trains a multivariate SVM in polynomial time for large classes of potentially nonlinear performance measures. Moreover, the traditional SVM with hinge loss function can be treated as a special case of this method.

Zhang et al. Zhang2011814 proposed a smoothing strategy for multivariate performance score optimization, in particular for PRBEP and AUC. The proposed method combines Nesterov’s accelerated gradient algorithm with the smoothing strategy to obtain an optimization algorithm. This algorithm converges to a solution of a given accuracy in a limited number of iterations that depends on that accuracy.

Mao and Tsang Mao20132051 proposed a generalized sparse regularizer for multivariate performance measure optimization. Based on this regularizer, a unified framework for feature selection and general loss function optimization is developed. The resulting problem is solved by a two-layer cutting plane algorithm, and its convergence is analyzed. Moreover, it can also be used to optimize the multivariate measures of multiple-instance learning problems.

Li et al. Li20131370 proposed to learn a nonlinear classifier for the optimization of nonlinear and nonsmooth performance measures by a novel two-step approach. First, nonlinear auxiliary classifiers are trained with existing learning methods, and then they are adapted for specific performance measures. The classifier adaptation can be reduced to a quadratic programming problem, similar to the method introduced in Joachims2005377 .
1.2 Contributions
In this paper, we investigate the use of sparse coding in the problem of multivariate performance optimization. Our work is inspired by the multivariate performance optimization method using multiple kernel learning proposed by Wang et al. wang2015multiple . The work in wang2015multiple is an original contribution of major significance because, for the first time, it proposed to map the data into another space and to learn a more effective predictor in the new space for multivariate performance measure optimization. Specifically, it uses multiple kernel learning wang2014effective to map the input data to a new space, and then learns a new predictor to optimize the desired multivariate performance measure. Our work follows this strategy, but uses sparse coding instead of multiple kernel learning to map the original input data to a new sparse code space. Our method also learns a new predictor in the new space to optimize the multivariate performance measure. Sparse coding is an important and popular data representation method, which represents a given data point by reconstructing it with regard to a dictionary Li20151254 ; AlShedivat20141665 ; Wang20141630 ; Wang20133249 ; wang2015representing . The reconstruction coefficients are constrained to be sparse and are used as a new representation of the data point. Sparse coding has been used widely in both the machine learning and computer vision communities for pattern classification problems. For example, Mairal et al. Mairal20091033 proposed to learn the sparse codes and a classifier jointly on a training set. However, the loss function used in this method is a traditional logistic loss. In this paper, we ask the following question: How can we learn the sparse codes and their corresponding class prediction function to optimize a multivariate performance measure? To answer this question, we propose a novel multivariate performance optimization method.
In this method, we learn sparse codes from the tuple of training data points, and apply a linear function to match the sparse code tuple against a candidate class label tuple. Based on the linear function, we design a hyperpredictor to predict the optimal class label tuple. Moreover, the loss function of the desired multivariate performance measure is used to compare the prediction of the hyperpredictor against the true class label tuple, and it is minimized to optimize the multivariate performance measure. The contributions of this paper are twofold:

We propose a joint model of sparse coding and multivariate performance measure optimization. We learn both the sparse codes and the hyperpredictor to optimize the desired multivariate performance measure. The input of the hyperpredictor is the tuple of sparse codes, and the output is a class label tuple, which is then compared to the true class label tuple by a multivariate performance measure. A joint optimization problem is constructed for this purpose. In its objective function, both the reconstruction error and the sparsity of the sparse codes are considered. Simultaneously, the multivariate loss function of the multivariate performance measure is also included in the objective. The multivariate loss function may be very complex and may not even have a closed form, so it is difficult to optimize directly. We therefore seek its upper bound and approximate it as a linear function of the hyperpredictor function.

We propose a novel iterative algorithm to solve the proposed problem. We adopt the alternate optimization strategy, and optimize the sparse codes, the dictionary, and the hyperpredictor function alternately in an iterative algorithm. Both the sparse codes and the hyperpredictor parameters are learned by gradient descent methods, and the dictionary is learned by the Lagrange multiplier method.
1.3 Paper organization
This paper is organized as follows. In Section 2, we introduce the proposed multivariate performance measure optimization method. In Section 3, the proposed method is evaluated experimentally and compared to state-of-the-art multivariate performance measure optimization methods. In Section 4, the paper is concluded and future work is discussed.
2 Proposed method
In this section, we introduce the proposed method. We first model the problem with an optimization problem, then solve it with an iterative optimization strategy, and finally develop an iterative algorithm based on the optimization results.
2.1 Problem formulation
Suppose we have a tuple of $n$ training data points, $\overline{x} = (x_1, \ldots, x_n)$, and its corresponding class label tuple is denoted as $\overline{y} = (y_1, \ldots, y_n)$, where $x_i \in \mathbb{R}^d$ is the $d$-dimensional feature vector of the $i$-th training data point, and $y_i \in \{+1, -1\}$ is the binary label of the $i$-th training data point. We can use a machine learning method to predict the class label tuple, $\overline{y}^* = (y_1^*, \ldots, y_n^*)$, where $y_i^*$ is the predicted class label of the $i$-th data point. A multivariate performance measure, $\Delta(\overline{y}^*, \overline{y})$, is defined to compare a predicted class label tuple $\overline{y}^*$ of a data point tuple against its true class label tuple $\overline{y}$. To learn a hyperpredictor that maps a data point tuple $\overline{x}$ to an optimal class label tuple $\overline{y}^*$, we should learn it to minimize the loss of a desired predefined multivariate performance measure, $\Delta(\overline{y}^*, \overline{y})$. The proposed learning framework is shown in the flowchart in Fig. 1.
We propose to represent the data points by their sparse codes using a sparse coding method, and then use a linear hyperpredictor to predict the class label tuple. We consider the following problems in the learning procedure,

Sparse coding of data tuple: To represent the data points in the data tuple, we propose to reconstruct each data point in the data tuple by using a dictionary,
$$x_i \approx \sum_{k=1}^{m} d_k s_{ik} = D s_i, \qquad (1)$$
where $d_k \in \mathbb{R}^d$ is the $k$-th dictionary element of the dictionary, $D = [d_1, \ldots, d_m] \in \mathbb{R}^{d \times m}$ is the dictionary matrix with its $k$-th column as the $k$-th dictionary element, and $m$ is the number of dictionary elements. $s_{ik}$ is the coefficient of the $k$-th dictionary element for the reconstruction of the $i$-th data point, and $s_i = [s_{i1}, \ldots, s_{im}]^\top \in \mathbb{R}^m$ is the coefficient vector for the reconstruction of the $i$-th data point. We assume that for each data point only a few dictionary elements are used, so its coefficient vector should be sparse, and we also call it the sparse code of the data point. To learn the dictionary and the sparse codes of the data tuple, we propose to minimize the reconstruction error and encourage the sparsity of the sparse codes, and the following optimization problem is obtained over the data tuple,
$$\min_{D, \{s_i\}_{i=1}^{n}} \sum_{i=1}^{n} \left( \|x_i - D s_i\|_2^2 + \alpha \|s_i\|_1 \right). \qquad (2)$$
In the objective function, the first part of each term is the reconstruction error measured by the squared $\ell_2$ norm, and the second part is the sparsity measured by the $\ell_1$ norm of $s_i$. $\alpha$ is a tradeoff parameter to control the sparsity of $s_i$. With a larger value of $\alpha$, the learned $s_i$ will be more sparse. The optimal value of this parameter can be selected by linear search or cross-validation.
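As a small illustration of the objective described above (squared reconstruction error plus a weighted $\ell_1$ penalty for each data point), the following sketch evaluates it on toy data. The function names, the list-based matrix layout, and the toy values are assumptions made for this example only.

```python
def reconstruct(D, s):
    """Return the product D s, with D stored as d rows of m columns."""
    return [sum(row[k] * s[k] for k in range(len(s))) for row in D]


def sparse_coding_objective(X, D, S, alpha):
    """Sum over data points of squared l2 reconstruction error + alpha * l1 norm."""
    total = 0.0
    for x, s in zip(X, S):
        x_hat = reconstruct(D, s)
        error = sum((xi - xh) ** 2 for xi, xh in zip(x, x_hat))
        sparsity = sum(abs(v) for v in s)
        total += error + alpha * sparsity
    return total


# Toy data (hypothetical): two 2-dimensional points, three dictionary elements.
D = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
X = [[1.0, 0.0], [2.0, 2.0]]
S_sparse = [[1.0, 0.0, 0.0], [0.0, 0.0, 2.0]]  # one active element per point
S_dense = [[1.0, 0.0, 0.0], [2.0, 2.0, 0.0]]   # same reconstruction, less sparse

print(sparse_coding_objective(X, D, S_sparse, alpha=0.5))
print(sparse_coding_objective(X, D, S_dense, alpha=0.5))
```

Both code tuples reconstruct the data exactly, so the difference between the two objective values comes only from the sparsity term, which is what the tradeoff parameter weights.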

Learning of hyperpredictor: We apply a linear function, $g(\overline{x}, \overline{y}')$, to compare the tuple of sparse codes of the data tuple, $(s_1, \ldots, s_n)$, against a candidate class label tuple, $\overline{y}' = (y'_1, \ldots, y'_n)$,
$$g(\overline{x}, \overline{y}') = w^\top \sum_{i=1}^{n} y'_i s_i, \qquad (3)$$
where $w \in \mathbb{R}^m$ is the parameter vector of the function. The candidate class label tuple which achieves the largest response of $g$ is then output as the optimal class label tuple,
$$\overline{y}^* = \arg\max_{\overline{y}' \in \mathcal{Y}} g(\overline{x}, \overline{y}'), \qquad (4)$$
where $\mathcal{Y}$ is the hyperspace of candidate class label tuples. To learn the linear function parameter vector w for the hyperpredictor and the sparse codes, we propose to minimize a loss function of a predefined multivariate performance measure, $\Delta(\overline{y}^*, \overline{y})$. To reduce the complexity of the linear function, we also propose to minimize the squared $\ell_2$ norm of the linear function parameter w. Thus we propose the following optimization problem to learn w,
$$\min_{w} \; \beta \|w\|_2^2 + \gamma \Delta(\overline{y}^*, \overline{y}), \qquad (5)$$
where $\beta$ and $\gamma$ are further tradeoff parameters. $\beta$ is the weight of the model complexity penalty term, and a larger $\beta$ leads to a simpler model. $\gamma$ is the weight of the loss function over the training data points, and a larger value of $\gamma$ leads the model to fit the training set better. The values of $\beta$ and $\gamma$ can be selected by linear search or cross-validation. Direct minimization of $\Delta(\overline{y}^*, \overline{y})$ is difficult, so we seek its upper bound and minimize that upper bound instead.
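The prediction rule above can be made concrete with a short sketch. It assumes that the linear function has the form $w^\top \sum_i y'_i s_i$, the joint feature map used by Joachims2005377; this form, the function names, and the toy values are assumptions for illustration. Under that assumption the maximization over all candidate label tuples decomposes, so the exhaustive search and a per-point sign rule return the same tuple.

```python
from itertools import product


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def response(w, S, labels):
    """Assumed linear response: w^T sum_i y_i s_i."""
    return sum(y * dot(w, s) for s, y in zip(S, labels))


def hyperpredict_bruteforce(w, S):
    """Scan every candidate label tuple in {+1, -1}^n and keep the best one."""
    candidates = product([1, -1], repeat=len(S))
    return max(candidates, key=lambda labels: response(w, S, labels))


def hyperpredict_fast(w, S):
    """Equivalent rule under the assumed form: label each code by sign(w^T s_i)."""
    return tuple(1 if dot(w, s) >= 0 else -1 for s in S)


# Hypothetical parameter vector and sparse codes.
w = [1.0, -2.0, 0.5]
S = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.0], [1.0, 1.0, 0.0]]

print(hyperpredict_bruteforce(w, S))
print(hyperpredict_fast(w, S))
```

The exhaustive version costs $2^n$ evaluations and is only usable for tiny tuples; it is shown here to make the definition of the hyperpredictor explicit.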
 Theorem 1

The upper bound of can be obtained as follows,
(6) where
(7) and
(8)
 Proof

According to (4), since achieves a maximum , we have
(9) Substituting (3) into the left-hand side of (9), and according to the definition of the function in (7), we have
(10) Thus (9) can be rewritten as
(11) To find the upper bound of , we scan all the candidate class label tuples , and seek the one or more candidates which achieve the maximum , and we can see that this maximum is an upper bound of ,
(12) Moreover, we also define an indicator for each to indicate whether achieves the maximum , as in (8). In this way, we can rewrite the left-hand side of (12) as follows,
(13) Thus we have
(14)
(15)
(16)  
In this problem, we learn the dictionary, sparse codes, and the hyperpredictor parameter jointly.
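The idea of bounding the multivariate loss by a maximum over candidate label tuples can be checked numerically. The sketch below uses the standard margin-rescaling bound from structured prediction, $\Delta(\overline{y}^*, \overline{y}) \le \max_{\overline{y}'} [\Delta(\overline{y}', \overline{y}) + g(\overline{x}, \overline{y}') - g(\overline{x}, \overline{y})]$, which may differ in detail from the bound of Theorem 1. The loss $1 - F_1$, the assumed linear response, and all names and values are illustrative assumptions.

```python
from itertools import product


def f1_loss(pred, true):
    """Loss 1 - F1 between two label tuples with labels in {+1, -1}."""
    tp = sum(1 for p, t in zip(pred, true) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, true) if p == 1 and t == -1)
    fn = sum(1 for p, t in zip(pred, true) if p == -1 and t == 1)
    denom = 2 * tp + fp + fn
    # Convention (an assumption): F1 = 1 when there are no positives at all.
    return 1.0 - (2 * tp / denom if denom else 1.0)


def response(w, S, labels):
    """Assumed linear response: w^T sum_i y_i s_i."""
    return sum(y * sum(wk * sk for wk, sk in zip(w, s)) for s, y in zip(S, labels))


def predicted_loss(w, S, true):
    """Loss of the tuple that maximizes the response."""
    best = max(product([1, -1], repeat=len(S)), key=lambda c: response(w, S, c))
    return f1_loss(best, true)


def margin_upper_bound(w, S, true):
    """Maximum over candidates of loss plus response gap to the true tuple."""
    g_true = response(w, S, true)
    return max(f1_loss(c, true) + response(w, S, c) - g_true
               for c in product([1, -1], repeat=len(S)))


# Hypothetical parameter, sparse codes and true labels.
w = [1.0, -1.0]
S = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.5], [2.0, 0.5]]
true = (1, -1, -1, 1)

loss = predicted_loss(w, S, true)
bound = margin_upper_bound(w, S, true)
print(loss, bound)
```

The bound holds because the predicted tuple is itself one of the candidates in the maximum, and its response is at least the response of the true tuple.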
2.2 Problem optimization
To optimize the problem in (16), we use the alternate optimization strategy. In an iterative algorithm, the variables are updated in turn. When the sparse codes are optimized, the linear function parameter and the dictionary are fixed. When the linear function parameter is optimized, the sparse codes and the dictionary are fixed. When the dictionary is optimized, the sparse codes and the linear function parameter are fixed. This strategy is shown in the flowchart in Fig. 2.
2.2.1 Optimization of sparse codes
When we try to optimize the sparse codes, we fix the dictionary and the linear function parameter, and optimize the sparse codes one by one, i.e., when one sparse code is considered, other sparse codes are fixed. Thus we turn the problem in (16) to the following optimization problem by only considering , and removing terms irrelevant to ,
(17)  
We rewrite the sparsity term in (17), , as follows,
(18) 
where is a diagonal matrix with its th diagonal element as . To make the objective function a smooth function, we fix the sparse code elements in the diagonal matrix as the elements of the previous iteration. Moreover, we note that is also a function of as shown in (8). We also first calculate by using sparse codes solved in the previous iteration, and then fix it when we consider in the current iteration. In this way, (17) is changed to
(19)  
where is the th element of solved in the previous iteration, and is calculated using the previously solved and w. To minimize this objective, we update by moving it against the gradient of the objective ,
(20) 
where is the sparse code updated in current iteration, is the sparse code solved in previous iteration, is the descent step, and is the gradient of , which is defined as
(21)  
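The sparse code update can be sketched as follows. For brevity the sketch keeps only the reconstruction and sparsity terms and omits the loss-related term of the full objective. It assumes that the sparsity term is rewritten as a quadratic form whose diagonal weights are the reciprocals of the absolute code values from the previous iteration; the small constant `eps`, the function names, and the toy values are assumptions added for this example.

```python
def local_objective(x, D, s, alpha):
    """Squared reconstruction error plus alpha times the l1 norm of s."""
    residual = [xi - sum(row[k] * s[k] for k in range(len(s)))
                for xi, row in zip(x, D)]
    return sum(r * r for r in residual) + alpha * sum(abs(v) for v in s)


def sparse_code_step(x, D, s_prev, alpha, step, eps=1e-8):
    """One gradient descent step on ||x - D s||^2 + alpha * s^T Lambda s.

    Lambda is diagonal with entries 1 / |s_prev[k]|, held fixed at the
    values of the previous iteration (eps guards against division by zero).
    """
    m = len(s_prev)
    residual = [xi - sum(row[k] * s_prev[k] for k in range(m))
                for xi, row in zip(x, D)]
    s_new = []
    for k in range(m):
        recon_grad = -2.0 * sum(D[r][k] * residual[r] for r in range(len(x)))
        sparsity_grad = 2.0 * alpha * s_prev[k] / max(abs(s_prev[k]), eps)
        s_new.append(s_prev[k] - step * (recon_grad + sparsity_grad))
    return s_new


# Hypothetical toy problem with an identity dictionary.
x = [1.0, 0.0]
D = [[1.0, 0.0], [0.0, 1.0]]
s0 = [0.5, 0.5]
s1 = sparse_code_step(x, D, s0, alpha=0.1, step=0.1)
print(s1, local_objective(x, D, s0, 0.1), local_objective(x, D, s1, 0.1))
```

Fixing the weights at the previous iterate is what makes the sparsity term smooth, so an ordinary gradient step can be applied.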
2.2.2 Optimization of linear function parameter
By only considering w in (16), fixing sparse codes, dictionary, and as results of previous iteration, and removing the terms irrelevant to w, we turn (16) to
(22) 
To minimize this objective function, we update w by moving it against the gradient of ,
(23) 
where is the gradient of , which is defined as
(24) 
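A gradient step for the linear function parameter can be sketched in the same spirit. The sketch assumes an objective of the form $\beta \|w\|_2^2 + \gamma\, w^\top (\Psi(\overline{y}') - \Psi(\overline{y}))$ plus terms that do not depend on w, where $\Psi(\overline{y}) = \sum_i y_i s_i$ and $\overline{y}'$ is a candidate tuple held fixed from the previous iteration. This form, the names `joint_feature` and `w_step`, and the toy values are assumptions, not the exact objective of (22).

```python
def joint_feature(S, labels):
    """Assumed joint feature map: sum_i y_i s_i."""
    m = len(S[0])
    return [sum(y * s[k] for s, y in zip(S, labels)) for k in range(m)]


def w_step(w, S, true, violated, beta, gamma, step):
    """One gradient descent step on the assumed objective in w."""
    psi_true = joint_feature(S, true)
    psi_violated = joint_feature(S, violated)
    grad = [2.0 * beta * wk + gamma * (pv - pt)
            for wk, pv, pt in zip(w, psi_violated, psi_true)]
    return [wk - step * g for wk, g in zip(w, grad)]


# Hypothetical sparse codes, true labels and a fixed violating candidate.
S = [[1.0, 0.0], [0.0, 1.0]]
true = (1, -1)
violated = (-1, 1)
w0 = [0.0, 0.0]
w1 = w_step(w0, S, true, violated, beta=0.5, gamma=1.0, step=0.1)
print(w1)
```

After the step, the response of the true tuple exceeds that of the violating candidate, which is the intended effect of descending on the loss-related term.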
2.2.3 Optimization of dictionary
To optimize the dictionary matrix , we remove the terms irrelevant to from the objective, fix the other variables, and obtain the following optimization problem,
(25)  
The dual optimization problem for this problem is
(26)  
where is the Lagrange multiplier for the constraint , is a diagonal matrix with its diagonal elements as , and is the Lagrange function. To minimize the Lagrange function with regard to , we set its gradient with regard to to zero, and we have
(27)  
To solve the Lagrange multiplier variables, we use the gradient ascent algorithm to obtain in each iteration. After we obtain , we can obtain according to (27).
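The dictionary step can be illustrated with the closed form that is standard in Lagrange dual dictionary learning: for given multipliers, setting the gradient of the Lagrange function to zero gives a dictionary that satisfies $D (S S^\top + \Lambda) = X S^\top$, where $\Lambda$ is the diagonal matrix of multipliers. The sketch assumes a norm constraint on each dictionary element and takes the multipliers as given instead of finding them by gradient ascent; the helper names and toy data are assumptions.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def dictionary_from_multipliers(X, S, lam):
    """Return D (d rows, m columns) solving D (S S^T + diag(lam)) = X S^T."""
    d, m = len(X[0]), len(S[0])
    A = [[sum(s[j] * s[k] for s in S) + (lam[j] if j == k else 0.0)
          for k in range(m)] for j in range(m)]
    B = [[sum(x[r] * s[k] for x, s in zip(X, S)) for k in range(m)]
         for r in range(d)]
    # A is symmetric, so each row of D solves A d_row = b_row.
    return [solve(A, B[r]) for r in range(d)]


# Hypothetical data generated exactly from D_true = [[1, 0], [0, 2]].
S = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
X = [[1.0, 0.0], [0.0, 2.0], [1.0, 2.0]]
D_free = dictionary_from_multipliers(X, S, lam=[0.0, 0.0])
D_shrunk = dictionary_from_multipliers(X, S, lam=[1.0, 1.0])
print(D_free)
print(D_shrunk)
```

With zero multipliers the least-squares dictionary is recovered; positive multipliers shrink the dictionary elements, which is how the norm constraint is enforced.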
2.3 Iterative algorithm
Based on the optimization results, we develop a novel iterative algorithm, named JSCHP. The algorithm is described in Algorithm 1. As we can see from the algorithm, the iterations are repeated times, and in each iteration, the variables are updated sequentially. The flowchart of the proposed iterative algorithm is given in Fig. 3.
The novelty of this algorithm is threefold:

This algorithm is the first algorithm to learn the sparse codes, dictionary and hyperpredictor jointly.

This algorithm is the first algorithm to use the gradient descent principle to update the hyperpredictor parameters. Traditional hyperpredictor parameter learning methods for multivariate performance optimization are based on solving a quadratic programming problem in each iteration, which is time-consuming. Our algorithm avoids the quadratic programming problem and instead uses a simple gradient descent rule to update the parameters efficiently.

This algorithm is also the first algorithm to solve for the sparse codes using the gradient descent rule. Traditional sparse coding algorithms solve for the sparse codes by optimizing an $\ell_1$ norm regularized problem directly, which is not smooth and is time-consuming. We convert the $\ell_1$ norm regularization to a weighted $\ell_2$ norm regularization, which is smooth and convex once the weights are fixed, and can therefore be easily solved by gradient descent.
Please note that the input of the iterative algorithm requires the parameters $\alpha$, $\beta$ and $\gamma$. $\alpha$ is the weight of the sparsity term of the sparse code, $\beta$ is the weight of the model complexity term, and $\gamma$ is the weight of the losses over the training set.
3 Experiments
In this section, we evaluate the proposed algorithm and compare it against state-of-the-art multivariate performance optimization methods.
3.1 Data sets
In the experiment, we used the following three data sets.

VANET misbehavior data set: The first data set is for the problem of detecting misbehaving network nodes in Vehicular Ad Hoc Networks (VANETs) Grover2011644 . To construct this data set, we used the NCTUns 5.0 simulator to conduct simulations, and collected data of 1395 nodes. These nodes belong to two classes: honest nodes and misbehaving nodes. The number of honest nodes is 837, and the number of misbehaving nodes is 558. Given a candidate node, the problem of misbehavior detection is to predict whether it is an honest node or a misbehaving node. Thus this is a binary classification problem. To extract the features of each node, we calculate multifarious features, including the speed deviation of the node, received signal strength (RSS), number of packets delivered, number of dropped packets, etc.

Profile injection attacks data set: The second data set is for the problem of detecting profile injection attacks in collaborative recommender systems Zhang201496 . It is well known that collaborative recommender systems are vulnerable to profile injection attacks. In an injection attack, malicious users insert fake profiles into the rating database to bias the system's output. To construct the data set, we randomly selected 1000 genuine user profiles from the Movielens 1M dataset as positive data points, and randomly generated 300 fake attack profiles as negative data points. The problem of profile injection attack detection is to classify a candidate user profile as a genuine user or a fake user. To extract features from each user profile, we first calculate its rating series based on the novelty and popularity of items, then use empirical mode decomposition (EMD) to decompose the rating series, and finally extract Hilbert spectrum based features.

UTkinect 3D action data set: The third data set is for the problem of recognizing human actions from 3D body data. In this data set, there are 200 3D body data samples, and each sample is treated as a data point. These data points belong to 10 different action classes, with 20 data points per class. The 10 classes are: walk, sit down, stand up, pick up, carry, throw, push, pull, wave and clap hands xia2012view . To extract features from each data point, we calculate the histogram of its 3D joints.
3.2 Experiment setup
To perform the experiments, we used 10-fold cross-validation. Each data set was split into 10 folds randomly. Each fold was used as a test set in turn, while the remaining 9 folds were combined and used as a training set. Given a desired multivariate performance measure, we ran the proposed algorithm on the training set to learn the dictionary and the classifier parameter. Then we used the learned dictionary and classifier to classify the test data points. Finally, we compared the classification results of the test data points against the true class labels using the given multivariate performance measure.
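The fold construction described above can be sketched with the standard library alone. The generator name `k_fold_indices`, the fixed random seed, and the round-robin assignment of shuffled indices to folds are assumptions of this example; the data set size 1395 is the number of nodes in the VANET data set.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)       # random split into folds
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(1395, k=10, seed=0))
for train, test in splits[:3]:
    print(len(train), len(test))
```

Each data point appears in exactly one test fold, so every point is classified once by a model that did not see it during training.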
The following multivariate performance measures were used.

F1 score: The first multivariate performance measure is the F1 score, and it is defined as
$$F_1 = \frac{2\,TP}{2\,TP + FP + FN}, \qquad (31)$$
where $TP$, $FP$ and $FN$ are the numbers of true positives, false positives and false negatives, respectively.

PRBEP: The second multivariate performance measure is PRBEP, the precision-recall break-even point. It is defined as the point where the precision and recall values are equal to each other. The precision-recall curve is obtained by plotting precision against recall. Precision and recall are defined as
$$\text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN}. \qquad (32)$$
We can generate different pairs of precision and recall values, and plot the precisions against the corresponding recalls to obtain the precision-recall curve. The point on the curve where the precision is equal to the recall is defined as PRBEP.

AUC: The third multivariate performance measure is the AUC, the area under the receiver operating characteristic curve. The receiver operating characteristic curve is obtained by plotting the true positive rate against the false positive rate. The true positive rate and false positive rate are defined as
$$TPR = \frac{TP}{TP + FN}, \qquad FPR = \frac{FP}{FP + TN}, \qquad (33)$$
where $TN$ is the number of true negatives. By changing a threshold parameter of the classifier, we obtain different pairs of true positive rates and false positive rates. Plotting the true positive rates against the corresponding false positive rates gives the receiver operating characteristic curve.
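The three measures can be computed from true labels and classifier outputs as in the sketch below. It assumes labels in $\{+1, -1\}$, computes PRBEP by predicting as positive exactly as many top-scored points as there are true positives (at which point precision equals recall), and computes AUC as the fraction of correctly ordered positive-negative pairs, with ties counted as one half. The function names and toy values are assumptions for illustration.

```python
def f1_score(true, pred):
    """F1 = 2 TP / (2 TP + FP + FN) for labels in {+1, -1}."""
    tp = sum(1 for t, p in zip(true, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(true, pred) if t == -1 and p == 1)
    fn = sum(1 for t, p in zip(true, pred) if t == 1 and p == -1)
    return 2 * tp / (2 * tp + fp + fn)


def prbep(true, scores):
    """Precision (= recall) when the top n_pos scored points are predicted positive."""
    n_pos = sum(1 for t in true if t == 1)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = sum(1 for i in ranked[:n_pos] if true[i] == 1)
    return tp / n_pos


def auc(true, scores):
    """Fraction of (positive, negative) pairs ranked in the correct order."""
    pos = [s for s, t in zip(scores, true) if t == 1]
    neg = [s for s, t in zip(scores, true) if t == -1]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical true labels and classifier scores.
true = [1, 1, -1, 1, -1, -1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
pred = [1 if s >= 0.5 else -1 for s in scores]

print(f1_score(true, pred), prbep(true, scores), auc(true, scores))
```

All three measures depend on the whole tuple of predictions at once, which is why they cannot be written as a sum of per-point losses.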
3.3 Experiment results
3.3.1 Comparison to state-of-the-art methods
In this experiment, we first compared the proposed algorithm, JSCHP, to several state-of-the-art machine learning algorithms for multivariate performance optimization, including cutting-plane subspace pursuit (CPSP) Joachims2005377 , multivariate performance measure smoothing (MPMS) Zhang20123623 , feature selection based multivariate performance measure optimization (FSMPM) Mao20132051 , and classifier adaptation based multivariate performance measure optimization (CAMPM) Li20131370 . The boxplots of the different performance measures of the 10-fold cross-validation over the different data sets are given in Figs. 4, 5 and 6. From these figures, we can see that the proposed algorithm JSCHP outperforms the compared algorithms in most cases. For example, in the experiments over the VANET misbehavior data set, when PRBEP performance is considered, only the JSCHP algorithm achieves a median value higher than 0.6, while the median values of all other algorithms are lower than 0.6. Moreover, in the experiments over the UTkinect 3D action data set, the median value of the scores of JSCHP is even higher than the 75th percentile values of the other algorithms. These results are strong evidence that the proposed algorithm is more effective than the compared algorithms for the problem of optimizing multivariate performance measures. It is also interesting to see that AUC seems to be an easier multivariate performance measure to optimize than the F1 score and PRBEP. In all the experiments over the three data sets, the observed AUC values are higher than the corresponding F1 scores and PRBEP values. The results of CAMPM, FSMPM and MPMS are comparable to each other, and better than those of CPSP.
3.3.2 Parameter sensitivity
We are also interested in the sensitivity of the proposed algorithm to the three tradeoff parameters $\alpha$, $\beta$ and $\gamma$. Thus we varied the three tradeoff parameters simultaneously to measure the sensitivity of the algorithm to them. The average scores of the proposed algorithm for combinations of different values of these parameters are given in Fig. 7. From Fig. 7(a) and Fig. 7(b), we can see that when $\alpha$ increases, the performance also improves. $\alpha$ is the weight of the sparsity term of the sparse code, and from the experimental results, we can conclude that a larger sparsity penalty leads to better performance. This means that a sparse representation is important for learning a hyperpredictor to optimize multivariate performance measures. It is well known that sparse representation can benefit the learning of a good classifier using common and simple performance measures. However, it was previously unknown whether such a sparse representation can also benefit the learning of a hyperpredictor for complex multivariate performance measure optimization. Our experiments answer this question: the sparsity of the representation is also important for the optimization of complex multivariate performance measures, just as it is for simple performance measure optimization. From Fig. 7(a) and Fig. 7(c), we can see that the performance does not clearly improve as $\beta$ changes; however, the performance is stable across different values of $\beta$. This parameter is the weight of the complexity of the hyperpredictor parameter. From the results, we cannot conclude that a simpler predictor optimizes the multivariate performance measure better than a complex predictor. From Fig. 7(b) and Fig. 7(c), we can see that a larger $\gamma$ can also improve the performance. This is because $\gamma$ is the weight of the upper bound of the corresponding loss function. A larger $\gamma$ can lead to a better solution for the minimization of the loss function, and thus to a better performance measure.
3.3.3 Running time
We are also interested in the running time of the proposed algorithm and the compared algorithms. The boxplots of the running times of the different algorithms in the 10-fold cross-validation over the UTkinect 3D action data set are given in Fig. 8. The proposed algorithm has a shorter running time than the other algorithms. A possible reason is that the other algorithms are based on the cutting-plane algorithm. In that algorithm, an active set is maintained in each iteration, and a quadratic programming problem is solved over this active set, which is time-consuming. Moreover, to update the active set, a maximization over all possible class label tuples is needed. In contrast, our algorithm only seeks a maximization in the class label tuple space to approximate the upper bound; no quadratic programming problem is solved, and only a gradient descent update is conducted.
4 Conclusion and future works
In this paper, we proposed a novel method for the problem of multivariate performance measure optimization. This method is based on the joint learning of the sparse codes of a data point tuple and a hyperpredictor that predicts the class label tuple. In this way, the sparse code learning is guided by the minimization of the multivariate loss function corresponding to the desired multivariate performance measure. Moreover, we also proposed a novel upper bound approximation of the multivariate loss function. We model the learning problem as a minimization problem and solve it by developing an iterative algorithm based on the gradient descent method. The proposed algorithm is compared to state-of-the-art multivariate performance measure optimization algorithms, and the results show its advantage. In the future, we will consider extending the proposed framework to structured label prediction problems, since they are similar to multivariate performance measure optimization. We will also apply the proposed algorithm to computer vision applications wang2015supervised ; wang2015image .
References
 (1) C. Micheloni, A. Rani, S. Kumar, G. Foresti, A balanced neural tree for pattern classification, Neural Networks 27 (2012) 81–90.
 (2) A. Roy, P. Mackin, S. Mukhopadhyay, Methods for pattern selection, class-specific feature selection and classification for automated learning, Neural Networks 41 (2013) 113–129.
 (3) H. Kang, S. Choi, Bayesian common spatial patterns for multi-subject EEG classification, Neural Networks 57 (2014) 39–50.
 (4) M. Bhuyan, X. Gao, A protein-dependent side-chain rotamer library, BMC Bioinformatics 12 Suppl 14 (2011) S10.
 (5) J. J.-Y. Wang, H. Bensmail, X. Gao, Multiple graph regularized protein domain ranking, BMC Bioinformatics 13 (1) (2012) 307.
 (6) I. Pragidis, P. Gogas, V. Plakandaras, T. Papadimitriou, Fiscal shocks and asymmetric effects: A comparative analysis, Journal of Economic Asymmetries 12 (1) (2015) 22–33.
 (7) N. Couellan, S. Jan, T. Jorquera, J.-P. Georgé, Self-adaptive support vector machine: A multi-agent optimization perspective, Expert Systems with Applications 42 (9) (2015) 4284–4298.
 (8) G. Ü. Şiray, S. Toker, S. Kaçiranlar, On the restricted Liu estimator in the logistic regression model, Communications in Statistics: Simulation and Computation 44 (1) (2015) 217–232.
 (9) P. Patil, Y. Fatangare, P. Kulkarni, Semi-supervised learning algorithm for online electricity data streams, Advances in Intelligent Systems and Computing 324 (2015) 349–358.
 (10) T. Joachims, A support vector method for multivariate performance measures, in: ICML 2005  Proceedings of the 22nd International Conference on Machine Learning, 2005, pp. 377–384.
 (11) D. Walker, G. Wiseman, B. Belcher, S. Dewi, J. Campbell, R. Baydack, Multivariate performance measures for evaluating speckle suppression filters for multi-temporal multi-incident SAR imagery, Canadian Journal of Remote Sensing 37 (1) (2011) 55–68.
 (12) X. Zhang, A. Saha, S. Vishwanathan, Smoothing multivariate performance measures, in: Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence, UAI 2011, 2011, pp. 814–821.
 (13) X. Zhang, A. Saha, S. Vishwanathan, Smoothing multivariate performance measures, Journal of Machine Learning Research 13 (2012) 3623–3680.
 (14) Q. Mao, I.H. Tsang, A feature selection method for multivariate performance measures, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (9) (2013) 2051–2063.
 (15) S. Zemmoudj, A. Kemmouche, Y. Chibani, Feature selection and classification for urban data using improved F-score with support vector machine, in: 6th International Conference on Soft Computing and Pattern Recognition, SoCPaR 2014, 2014, pp. 371–375.
 (16) J. Gao, H. Tian, Y. Yang, X. Yu, C. Li, N. Rao, A novel algorithm to enhance P300 in single trials: Application to lie detection using F-score and SVM, PLoS ONE 9 (11).
 (17) K. Boyd, K. Eng, C. Page, Area under the precision-recall curve: Point estimates and confidence intervals, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8190 LNAI (PART 3) (2013) 451–466.
 (18) M. Lopes, G. Bontempi, On the null distribution of the precision and recall curve, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8725 LNAI (PART 2) (2014) 322–337.
 (19) P. Kumari, A. Nath, R. Chaube, Identification of human drug targets using machine-learning algorithms, Computers in Biology and Medicine 56 (2015) 175–181.
 (20) M. Shepperd, How do I know whether to trust a research result?, IEEE Software 32 (1) (2015) 106–109.
 (21) N. Li, I. Tsang, Z.-H. Zhou, Efficient optimization of performance measures by classifier adaptation, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (6) (2013) 1370–1382.
 (22) J. Wang, H. Wang, Y. Zhou, N. McDonald, Multiple kernel multivariate performance learning using cutting plane algorithm, in: Systems, Man and Cybernetics (SMC), 2015 IEEE International Conference on, IEEE, 2015.
 (23) H. Wang, J. Wang, An effective image representation method using kernel classification, in: Tools with Artificial Intelligence (ICTAI), 2014 IEEE 26th International Conference on, IEEE, 2014, pp. 853–858.
 (24) Y. Li, Robust content fingerprinting algorithm based on sparse coding, IEEE Signal Processing Letters 22 (9) (2015) 1254–1258.
 (25) M. Al-Shedivat, J. J.-Y. Wang, M. Alzahrani, J. Huang, X. Gao, Supervised transfer sparse coding, in: Proceedings of the National Conference on Artificial Intelligence, Vol. 3, 2014, pp. 1665–1672.
 (26) J. J.-Y. Wang, X. Gao, Semi-supervised sparse coding, in: Proceedings of the International Joint Conference on Neural Networks, 2014, pp. 1630–1637.
 (27) J. J.-Y. Wang, H. Bensmail, X. Gao, Joint learning and weighting of visual vocabulary for bag-of-feature based tissue classification, Pattern Recognition 46 (12) (2013) 3249–3255.
 (28) J. Wang, Y. Zhou, M. Yin, S. Chen, B. Edwards, Representing data by sparse combination of contextual data points for classification, in: Advances in Neural Networks–ISNN 2015, Springer, 2015.
 (29) J. Mairal, F. Bach, J. Ponce, G. Sapiro, A. Zisserman, Supervised dictionary learning, in: Advances in Neural Information Processing Systems 21  Proceedings of the 2008 Conference, 2009, pp. 1033–1040.
 (30) J. Grover, N. Prajapati, V. Laxmi, M. Gaur, Machine learning approach for multiple misbehavior detection in VANET, Communications in Computer and Information Science 192 CCIS (PART 3) (2011) 644–653.
 (31) F. Zhang, Q. Zhou, HHT-SVM: An online method for detecting profile injection attacks in collaborative recommender systems, Knowledge-Based Systems 65 (2014) 96–105.
 (32) L. Xia, C.-C. Chen, J. Aggarwal, View invariant human action recognition using histograms of 3D joints, in: Computer Vision and Pattern Recognition Workshops (CVPRW), 2012 IEEE Computer Society Conference on, IEEE, 2012, pp. 20–27.
 (33) J. Wang, Y. Zhou, K. Duan, J. J.-Y. Wang, H. Bensmail, Supervised cross-modal factor analysis for multiple modal data classification, in: Systems, Man and Cybernetics (SMC), 2015 IEEE International Conference on, IEEE, 2015.
 (34) J. Wang, Y. Zhou, H. Wang, X. Yang, F. Yang, A. Peterson, Image tag completion by local learning, in: Advances in Neural Networks–ISNN 2015, Springer, 2015.