Principal Model Analysis Based on Partial Least Squares
Abstract
Motivated by the Bagging Partial Least Squares (PLS) and Principal Component Analysis (PCA) algorithms, we propose a Principal Model Analysis (PMA) method that combines PCA and PLS. Multiple PLS models are trained on sub-training sets drawn from the original training set by random sampling with replacement. The regression coefficients of all the PLS sub-models are fused in a joint regression coefficient matrix, and the final projection direction is estimated by performing PCA on this matrix. The proposed PMA method is compared with other traditional dimension reduction methods, namely PLS, Bagging PLS, linear discriminant analysis (LDA) and PLS-LDA. Experimental results on six public datasets show that the proposed method achieves better classification performance and is usually more stable.
Keywords:
Principal model analysis, partial least squares, principal component analysis, dimension reduction, ensemble learning
1 Introduction
For qualitative analysis, high-dimensional datasets provide rich information, but in many cases not all the measured variables are useful for the qualitative model. In addition, traditional statistical methods require the number of variables to be smaller than the number of samples; otherwise they suffer from the curse of dimensionality [1]. To solve these problems, we need to reduce the dimensionality of the dataset before qualitative analysis. Dimension reduction methods such as PCA [2, 3, 4], LDA [5] and PLS [6, 7] are often used. These methods reduce or eliminate the statistical redundancy and noise among the components of high-dimensional vector data, yielding a lower-dimensional representation without significant loss of information.
In unsupervised data analysis, PCA is a good tool for dimension reduction. Its main idea is to reduce the dimensionality of a dataset containing a large number of interrelated variables while retaining as much as possible of the variation present in the dataset [8]. However, PCA is unsupervised and cannot exploit sample labels; once labels are available, supervised methods are needed to analyze the dataset. LDA is a well-known supervised method for feature extraction and dimension reduction that achieves maximum discrimination by maximizing the ratio of between-class to within-class distance [9]. An intrinsic limitation of classical LDA is the so-called small sample size problem [5], and different methods have been proposed to solve it [10, 11, 12]. One of the most successful approaches is subspace LDA, which applies an intermediate dimension reduction stage before LDA. Among the subspace LDA methods, PCA plus LDA (PCA-LDA [13]) and PLS plus LDA (PLS-LDA [14]) have received significant attention. Other approaches use PLS-based algorithms for dimension reduction.
The PLS algorithm can overcome both the dimensionality and the collinearity problems at the same time [15, 16], and has exhibited excellent performance on small-sample-size problems [17]. However, PLS also leaves open questions, such as how to obtain more useful information, how to enhance the robustness of the model, and how to eliminate redundancy and noise more accurately. One solution is ensemble learning, which originates from the field of machine learning [18] and can be used for both classification and regression problems. In this study, we are more interested in dimensionality reduction and classification. Compared with a single model, ensemble models, including boosting [20, 21], bagging [22] and stacked regression [23, 31], offer increased robustness and accuracy [19] and have been successfully applied in the last several years. To overcome the overfitting problem, Zhang et al. used the idea of boosting to combine a set of shrunken PLS models, each with only one PLS component, and called the result boosting PLS [20]. Building on boosting PLS, some scholars have modified and applied it to spectroscopic quantitative analysis [25, 26]. With the Bagging strategy [27, 28], many training sets are generated from the original dataset, Bagging PLS trains a model on each of those training sets, and the final model is obtained by averaging the coefficient vectors of the sub-models. To overcome the disadvantages of MWPLS and iPLS, Xu et al. presented a stacked PLS method using Monte Carlo cross-validation [29]. Ni et al. proposed two stacked PLS methods that build PLS models on all intervals of a set of spectra, exploiting information from the whole spectrum by combining parallel models in a way that emphasizes the intervals highly related to the target property [23].
After the PLS sub-models are established, various ensemble rules are available for fusing them into the final model, mainly average weighting, cross-validation error weighting and minimum square error weighting. In this paper, following the Bagging training scheme, a number of sub-training sets are drawn from the dataset, and a PLS model is fitted on each of them. Subsequently, the coefficient vectors of all the PLS sub-models are fused in a joint matrix $\mathbf{B}$, from which a symmetric positive semidefinite matrix $\mathbf{B}\mathbf{B}^{T}$ is formed. Finally, PCA is applied through an eigenvalue decomposition of this matrix, and the direction of largest variance is taken as the final projection model. The proposed method is termed Principal Model Analysis (PMA). In the subsequent sections, we discuss the relationship between the model parameters (the number of latent variables, the number of sub-models and the number of retained dimensions) and the classification accuracy. Theory and experiments show that PMA increases the robustness and the generalization ability of the PLS algorithm. PMA also suggests a way of using the PLS algorithm for semi-supervised dimensionality reduction.
2 Background
2.1 Notation
Boldface uppercase and lowercase letters are used to denote matrices and vectors, respectively. Lowercase italic letters denote the scalars. The detailed notations are as follows:
X matrix of samples
y vector of sample label
C covariance matrix of PCA
w vector of the PCA loading
B matrix of PLS regression coefficients
b vector of PLS1 regression coefficients
λ the eigenvalue of PCA
n number of samples
p number of sample features
k number of components
2.2 Overview of PLS and Bagging-based PLS
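PLS projects the high-dimensional predictor variables onto a smaller set of latent variables that have maximal covariance with the responses. Given a training set $\{\mathbf{X}, \mathbf{y}\}$, the PLS decomposition is as follows [36, 37, 38, 40]:
$$\mathbf{X} = \sum_{h=1}^{k} \mathbf{t}_h \mathbf{p}_h^{T} + \mathbf{E}, \qquad \mathbf{y} = \sum_{h=1}^{k} \mathbf{u}_h \mathbf{q}_h^{T} + \mathbf{F},$$
where $\mathbf{t}_h$ and $\mathbf{u}_h$ are score vectors, and $\mathbf{p}_h$ and $\mathbf{q}_h$ are loading vectors of X and y, respectively. E and F are residual matrices, and $k$ is the number of extracted feature vectors (latent variables). The PLS inner relation between the projected score vectors is:
$$\hat{\mathbf{u}}_h = d_h \mathbf{t}_h .$$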
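The detailed procedure of the PLS algorithm is as follows:
1. Initialization: $\mathbf{u} = \mathbf{y}$.
2. Computing the weight vector: $\mathbf{w} = \mathbf{X}^{T}\mathbf{u} / (\mathbf{u}^{T}\mathbf{u})$, and normalizing $\mathbf{w} \leftarrow \mathbf{w} / \|\mathbf{w}\|$.
3. Computing the input's score vector: $\mathbf{t} = \mathbf{X}\mathbf{w}$, and its loading vector: $\mathbf{p} = \mathbf{X}^{T}\mathbf{t} / (\mathbf{t}^{T}\mathbf{t})$.
4. Computing the output's loading vector: $\mathbf{q} = \mathbf{y}^{T}\mathbf{t} / (\mathbf{t}^{T}\mathbf{t})$, and normalizing $\mathbf{q} \leftarrow \mathbf{q} / \|\mathbf{q}\|$.
5. Computing the output's score vector: $\mathbf{u} = \mathbf{y}\mathbf{q} / (\mathbf{q}^{T}\mathbf{q})$.
6. Computing the internal regression coefficient: $d = \mathbf{u}^{T}\mathbf{t} / (\mathbf{t}^{T}\mathbf{t})$.
7. Computing the residual matrices: $\mathbf{E} = \mathbf{X} - \mathbf{t}\mathbf{p}^{T}$ and $\mathbf{F} = \mathbf{y} - d\,\mathbf{t}\mathbf{q}^{T}$.
8. Updating $\mathbf{X}$ to $\mathbf{E}$ and $\mathbf{y}$ to $\mathbf{F}$, then going back to step 2 until the expected number of latent variables is reached.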
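The iteration above can be sketched in Python for a single response vector. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function name `nipals_pls1`, the mean-centering of the inputs, and the exact update formulas (taken from the standard NIPALS form of PLS) are assumptions, and the final line uses the usual closed form that maps weights and loadings to a regression vector.

```python
import numpy as np

def nipals_pls1(X, y, n_components):
    """NIPALS-style PLS for one response; returns the regression vector
    expressed in mean-centred coordinates (hypothetical helper)."""
    E = X - X.mean(axis=0)                 # centred predictors, deflated in place of X
    f = y - y.mean()                       # centred response, deflated in place of y
    n_features = E.shape[1]
    W = np.zeros((n_features, n_components))   # weight vectors
    P = np.zeros((n_features, n_components))   # X loading vectors
    c = np.zeros(n_components)                 # inner coefficient times y loading
    for h in range(n_components):
        u = f.copy()                       # step 1: start from the current response
        w = E.T @ u / (u @ u)              # step 2: weight vector ...
        w /= np.linalg.norm(w)             # ... normalised to unit length
        t = E @ w                          # step 3: X score ...
        p_h = E.T @ t / (t @ t)            # ... and X loading
        q = (f @ t) / (t @ t)              # step 4: y loading (a scalar here) ...
        q /= abs(q)                        # ... normalised to +/-1
        u = f * q / (q * q)                # step 5: y score
        d = (u @ t) / (t @ t)              # step 6: inner regression coefficient
        E = E - np.outer(t, p_h)           # step 7: deflate X ...
        f = f - d * t * q                  # ... and y
        W[:, h], P[:, h], c[h] = w, p_h, d * q
    # Standard closed form for the PLS regression vector.
    return W @ np.linalg.solve(P.T @ W, c)
```

A prediction for a new sample `x` would then be `(x - X.mean(axis=0)) @ b + y.mean()`. When the number of components equals the number of (linearly independent) features, the result coincides with ordinary least squares, which is a convenient sanity check.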
The ensemble learning method aims to improve the accuracy and robustness of traditional algorithms by combining the results of multiple submodels. Bagging is a simple ensemble learning strategy and is widely used for the classification and regression problems, such as bagging SVM and bagging PLS.
The general PLS method often gives poor or unstable results on data with a very large number of collinear x-variables or with very few training samples. With the bagging strategy, the bagging PLS model can reduce the variance of the original unstable model without increasing the bias. Therefore, bagging PLS can usually achieve more accurate and stable results than the traditional PLS method.
Bagging-based PLS first generates several sub-training sets from the original training set by random sampling with replacement, then trains a PLS model on each sub-training set separately, and finally averages the regression coefficients of all PLS sub-models and uses the averaged regression coefficient for prediction. In detail, suppose that $M$ sub-training sets are generated by random sampling with replacement, and that the PLS regression coefficient vector corresponding to the $i$-th sub-training set is $\mathbf{b}_i$, $i = 1, \dots, M$. The final regression coefficient of bagging-based PLS can be formulated as:
$$\mathbf{b}_{\mathrm{bag}} = \frac{1}{M} \sum_{i=1}^{M} \mathbf{b}_i \qquad (1)$$
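The averaging rule in (1) can be illustrated with a short Python sketch. The helper names `pls1_coefficients` and `bagging_pls`, the compact PLS1 routine, and the choice of drawing each bootstrap sample with the same size as the training set are assumptions made for illustration only.

```python
import numpy as np

def pls1_coefficients(X, y, n_components):
    """Compact PLS1 fit; returns the regression vector in centred coordinates."""
    E = X - X.mean(axis=0)
    f = y - y.mean()
    W, P, c = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)             # unit-length weight vector
        t = E @ w                          # score vector
        p_h = E.T @ t / (t @ t)            # X loading
        c_h = (f @ t) / (t @ t)            # regression of y on the score
        E = E - np.outer(t, p_h)           # deflate X
        f = f - c_h * t                    # deflate y
        W.append(w); P.append(p_h); c.append(c_h)
    W, P = np.column_stack(W), np.column_stack(P)
    return W @ np.linalg.solve(P.T @ W, np.array(c))

def bagging_pls(X, y, n_components, n_models, seed=0):
    """Average PLS coefficient vectors over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    coefs = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)   # random sampling with replacement
        coefs.append(pls1_coefficients(X[idx], y[idx], n_components))
    B = np.column_stack(coefs)             # one column per sub-model
    return B.mean(axis=1), B               # averaged coefficient vector and all sub-models
```

The second return value keeps the individual sub-model vectors, which is the matrix that a fusion rule other than averaging would operate on.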
2.3 Overview of PCA
Principal component analysis (PCA) uses an orthogonal transformation to convert a number of possibly correlated variables into a smaller number of uncorrelated variables, called principal components, while preserving as much of the data variance as possible. Given a data matrix X with covariance matrix C, the projection directions of PCA are obtained by solving:
$$\max_{\mathbf{w}} \; \mathbf{w}^{T}\mathbf{C}\mathbf{w} \quad \text{s.t.} \quad \mathbf{w}^{T}\mathbf{w} = 1 \qquad (2)$$
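The above problem can be easily solved by eigendecomposition methods, such as the singular value decomposition (SVD) algorithm. The detailed algorithmic process of PCA is as follows [41, 42, 43, 44]:
1. Data standardization: $\tilde{x}_{ij} = (x_{ij} - \bar{x}_j) / s_j$, where $\mathbf{X} \in \mathbb{R}^{n \times p}$ is the data matrix with $n$ samples and $p$ variables, and $\bar{x}_j$ and $s_j$ are the mean and standard deviation of the $j$-th variable.
2. Computing the covariance matrix: $\mathbf{C} = \frac{1}{n-1}\tilde{\mathbf{X}}^{T}\tilde{\mathbf{X}}$.
3. Eigendecomposition: $\mathbf{C}\mathbf{w}_i = \lambda_i \mathbf{w}_i$.
4. Denote the first $k$ eigenvalues as $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_k$; their corresponding eigenvectors, $\mathbf{w}_1, \dots, \mathbf{w}_k$, are the principal components. The number of principal components can be decided by the cumulative contribution rate of the principal components, i.e., choosing the smallest $k$ such that
$$\frac{\sum_{i=1}^{k}\lambda_i}{\sum_{i=1}^{p}\lambda_i} \ge \eta,$$
where $\eta \in (0, 1)$ is the required cumulative contribution rate.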
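The PCA procedure above can be sketched as follows. The function name `pca_directions`, the default threshold of 0.9, and the use of the sample standard deviation and the $1/(n-1)$ covariance normalization are assumptions chosen for illustration; the source does not fix these values.

```python
import numpy as np

def pca_directions(X, threshold=0.9):
    """Return the leading eigenvectors whose cumulative contribution rate
    first reaches `threshold`, all eigenvalues (descending), and k."""
    # Standardise each variable (assumes no constant columns).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    C = Z.T @ Z / (Z.shape[0] - 1)         # covariance of the standardised data
    eigvals, eigvecs = np.linalg.eigh(C)   # eigh: C is symmetric
    order = np.argsort(eigvals)[::-1]      # sort eigenvalues in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    # Smallest k with cumulative contribution rate >= threshold.
    k = min(int(np.searchsorted(ratio, threshold)) + 1, len(eigvals))
    return eigvecs[:, :k], eigvals, k
```

Projecting the standardized data onto the returned columns gives the reduced representation; `numpy.linalg.eigh` is used rather than a general eigen-solver because the covariance matrix is symmetric.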
3 Principal Model Analysis
3.1 Theory and Algorithm
Combining the bagging strategy and PLS, we propose a principal model analysis (PMA) method in this section. PMA contains two steps. The first step, as in bagging PLS, generates $M$ sub-training sets from the original training set by random sampling with replacement; the PLS regression coefficient vector of the $i$-th sub-training set is denoted by $\mathbf{b}_i$. Unlike bagging PLS, which simply averages the PLS sub-models, the second step uses the PLS sub-models as the input of the PCA algorithm to generate the final PMA model. The motivation is that PCA can effectively find the “major” elements, remove noise, and reveal the essential structure hidden behind complex data.
The original PCA algorithm decomposes the covariance matrix C, which is symmetric and positive semidefinite. However, the joint regression coefficient matrix $\mathbf{B} = [\mathbf{b}_1, \mathbf{b}_2, \dots, \mathbf{b}_M]$ in the PMA algorithm is in general neither square nor symmetric. We therefore replace B by the symmetric positive semidefinite matrix $\mathbf{B}\mathbf{B}^{T}$ for the eigenvalue decomposition, and take the most representative models, called principal models, as the final PMA model. The optimization problem of PMA can be expressed as:
$$\max_{\mathbf{w}} \; \mathbf{w}^{T}\mathbf{B}\mathbf{B}^{T}\mathbf{w} \quad \text{s.t.} \quad \mathbf{w}^{T}\mathbf{w} = 1 \qquad (3)$$
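The above problem can be easily solved by the singular value decomposition (SVD) algorithm:
1. Eigendecomposition: $\mathbf{B}\mathbf{B}^{T}\mathbf{w}_i = \lambda_i \mathbf{w}_i$.
2. Denote the first $k$ eigenvalues as $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_k$; their corresponding eigenvectors, $\mathbf{w}_1, \dots, \mathbf{w}_k$, are the principal components. The number of principal components can be decided by the cumulative contribution rate of the principal components, i.e., choosing the smallest $k$ such that
$$\frac{\sum_{i=1}^{k}\lambda_i}{\sum_{i=1}^{p}\lambda_i} \ge \eta,$$
where $\eta \in (0, 1)$ is the required cumulative contribution rate.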
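The link between the eigenvalue problem and the SVD can be made explicit with a short derivation. It assumes that the joint matrix stacks the sub-model coefficient vectors as columns, so that it has one row per feature and one column per sub-model; the symbols $\mathbf{U}$, $\boldsymbol{\Sigma}$, $\mathbf{V}$ and $\sigma_i$ are introduced here only for illustration.

```latex
% Thin SVD of the joint coefficient matrix (columns = sub-models)
\mathbf{B} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{T},
\qquad \mathbf{U}^{T}\mathbf{U} = \mathbf{I}, \quad \mathbf{V}^{T}\mathbf{V} = \mathbf{I}

% Substituting into the Gram matrix
\mathbf{B}\mathbf{B}^{T}
  = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{T}\mathbf{V}\boldsymbol{\Sigma}\mathbf{U}^{T}
  = \mathbf{U}\boldsymbol{\Sigma}^{2}\mathbf{U}^{T}

% Hence the eigenpairs of B B^T are read off the SVD
\mathbf{B}\mathbf{B}^{T}\mathbf{u}_i = \sigma_i^{2}\,\mathbf{u}_i
\quad\Longrightarrow\quad
\mathbf{w}_i = \mathbf{u}_i, \qquad \lambda_i = \sigma_i^{2}
```

Under this assumption the principal models are the left singular vectors of the joint matrix, and at most as many eigenvalues are nonzero as there are sub-models or features, whichever is smaller.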
The detailed process of the PMA method is as follows.
Algorithm PMA
Input: Training set $\mathbf{X}$ and corresponding label vector $\mathbf{y}$, the number of PLS latent variables, the number of sub-models $M$, the number of principal models (dim).
Output: The projection direction of PMA.
1. Preprocess the training set $\mathbf{X}$.
2. Draw $M$ sub-training sets from $\mathbf{X}$ by random sampling with replacement and generate the PLS sub-models $\mathbf{b}_1, \dots, \mathbf{b}_M$.
3. Denote $\mathbf{B} = [\mathbf{b}_1, \dots, \mathbf{b}_M]$, perform the eigenvalue decomposition in (3), sort the eigenvalues in descending order and rearrange their corresponding eigenvectors.
4. Denote the rearranged eigenvector matrix as W, and output the first dim columns of W as the final PMA model.
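A minimal Python sketch of Algorithm PMA is given below. It is an illustration under assumptions rather than the authors' code: the helper names, the compact PLS1 routine, the use of mean-centering as the only preprocessing, and the column-wise stacking of sub-model vectors are all choices made here, and the sub-model selection of Section 3.3 is omitted.

```python
import numpy as np

def pls1_coefficients(X, y, n_components):
    """Compact PLS1 fit; returns the regression vector in centred coordinates."""
    E = X - X.mean(axis=0)
    f = y - y.mean()
    W, P, c = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)
        t = E @ w
        p_h = E.T @ t / (t @ t)
        c_h = (f @ t) / (t @ t)
        E = E - np.outer(t, p_h)
        f = f - c_h * t
        W.append(w); P.append(p_h); c.append(c_h)
    W, P = np.column_stack(W), np.column_stack(P)
    return W @ np.linalg.solve(P.T @ W, np.array(c))

def pma_directions(X, y, n_latent, n_models, dim=1, seed=0):
    """Fit bootstrap PLS sub-models and return the leading eigenvectors
    of the Gram matrix built from their coefficient vectors."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    coefs = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)           # sampling with replacement
        coefs.append(pls1_coefficients(X[idx], y[idx], n_latent))
    B = np.column_stack(coefs)                     # features x sub-models
    eigvals, eigvecs = np.linalg.eigh(B @ B.T)     # symmetric PSD matrix
    order = np.argsort(eigvals)[::-1]              # descending eigenvalues
    return eigvecs[:, order[:dim]], eigvals[order], B
```

New samples would be reduced by centering them with the training mean and multiplying by the returned direction matrix. The sign of each eigenvector is arbitrary, so scores may be flipped relative to those of an averaged bagging model.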
3.2 Determination of the number of Latent variables
The number of latent variables is an important parameter in the PLS model. There are many approaches to determining it, such as genetic algorithms, the F-test and cross-validation methods. Cross-validation methods include K-fold cross-validation (k-CV), leave-one-out cross-validation (LOOCV), Monte Carlo cross-validation (MCCV) and so on. In this paper, we use 10-fold cross-validation to determine the number of latent variables.
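One possible form of the 10-fold selection is sketched below. The selection criterion is not stated in the text, so the sketch assumes the cross-validated mean squared prediction error; the function names and the compact PLS1 routine are likewise hypothetical.

```python
import numpy as np

def pls1_coefficients(X, y, n_components):
    """Compact PLS1 fit; returns the regression vector in centred coordinates."""
    E = X - X.mean(axis=0)
    f = y - y.mean()
    W, P, c = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)
        t = E @ w
        p_h = E.T @ t / (t @ t)
        c_h = (f @ t) / (t @ t)
        E = E - np.outer(t, p_h)
        f = f - c_h * t
        W.append(w); P.append(p_h); c.append(c_h)
    W, P = np.column_stack(W), np.column_stack(P)
    return W @ np.linalg.solve(P.T @ W, np.array(c))

def select_n_latent(X, y, max_latent, n_folds=10, seed=0):
    """Pick the number of latent variables with the lowest K-fold CV error."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    sq_err = np.zeros(max_latent)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False                    # hold out the current fold
        x_mean, y_mean = X[train].mean(axis=0), y[train].mean()
        for k in range(1, max_latent + 1):
            b = pls1_coefficients(X[train], y[train], k)
            pred = (X[test_idx] - x_mean) @ b + y_mean
            sq_err[k - 1] += np.sum((y[test_idx] - pred) ** 2)
    cv_mse = sq_err / n                            # every sample is held out once
    return int(np.argmin(cv_mse)) + 1, cv_mse
```

For a classification task, the squared error could be replaced by the misclassification rate of a classifier applied to the predictions; that variant is equally compatible with the description above.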
3.3 The sub-model selection rule
In an ensemble strategy, the better-performing sub-models, or a subset of the sub-models, can usually provide more diversity [27]. Zhou et al. therefore suggested that using part of the sub-models may be a better choice than using all of them [32]. Herein, the original data set is randomly divided into three parts: a calibration set, a validation set and a prediction set [45]. We establish 100 PLS sub-models by subsampling and reweighting the existing calibration samples. This approach directly constructs diverse models from virtual samples produced from the original calibration samples, which increases the ensemble diversity when the calibration samples are insufficient [45]. Applying these 100 sub-models to the validation set gives 100 classification accuracies. The sub-models are then sorted by classification accuracy in descending order, and those with the highest accuracy take part in the final ensemble.
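The final ranking step can be sketched in a few lines of Python. The function name `select_top_models` and the example accuracies are hypothetical; only the rule of keeping the sub-models with the highest validation accuracy comes from the text.

```python
def select_top_models(accuracies, n_keep):
    """Return the indices of the n_keep sub-models with the highest
    validation accuracy, best first (ties keep their original order)."""
    order = sorted(range(len(accuracies)),
                   key=lambda i: accuracies[i],
                   reverse=True)
    return order[:n_keep]

# Hypothetical validation accuracies of five sub-models.
example_accuracies = [0.70, 0.90, 0.80, 0.95, 0.60]
kept = select_top_models(example_accuracies, n_keep=2)
```

The returned indices would then be used to pick the corresponding coefficient vectors before they are fused.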
3.4 Determination of Dimensions
PCA performs an eigenvalue decomposition (EIG) or SVD on a matrix and produces a matrix of eigenvalues. To select the principal components, only the first few eigenvalues are kept. To decide how many eigenvalues to keep, the cumulative contribution rate is usually used, which retains the useful eigenvalues automatically.
Using PMA for dimension reduction means obtaining scores by projecting new samples onto the directions of the principal models, so the number of final dimensions equals the number of selected principal models. In the experiments, if the fixed-dimension parameter, which is one of the inputs, is greater than or equal to 1, that fixed number of principal models is selected. Otherwise, if the parameter is greater than 0 and less than 1, it is interpreted as a cumulative contribution rate, which is used to select the principal models.
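The two-way rule for the dimension parameter can be written as a small helper. The function name `n_principal_models` and the example eigenvalues are hypothetical; the behaviour for values of at least 1 and for values strictly between 0 and 1 follows the description above, and rejecting non-positive values is an added assumption.

```python
def n_principal_models(eigenvalues, dim):
    """Number of principal models to keep.

    dim >= 1     : use it as a fixed number of dimensions.
    0 < dim < 1  : use it as a cumulative contribution rate.
    """
    vals = sorted(eigenvalues, reverse=True)
    if dim >= 1:
        return min(int(dim), len(vals))
    if dim <= 0:
        raise ValueError("dim must be positive")
    total = sum(vals)
    running = 0.0
    for k, value in enumerate(vals, start=1):
        running += value
        if running / total >= dim:     # smallest k reaching the requested rate
            return k
    return len(vals)

# Hypothetical eigenvalues of the fused coefficient matrix.
example_eigenvalues = [5.0, 3.0, 1.5, 0.5]
```

With these example values, a fixed setting of 2 keeps two principal models, while a rate of 0.9 keeps three because the first two eigenvalues account for only 80% of the total.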
The sub-model selection rule implies that only a few principal models are needed. Because the selected sub-models have almost the same classification ability, a single principal model retains almost all of their classification ability. Therefore, in practical applications, we take only the eigenvector with the largest eigenvalue as the principal model.
4 Experimental results
4.1 Data Sets
In order to evaluate the performance of the proposed PMA method, we compare it with the PLS, LDA, PLS-LDA and Bagging PLS methods on three types of data sets:

Four UCI datasets, i.e., Breast data, Spambase data, Gas data, Musk data (Version 1) (obtained from http://archive.ics.uci.edu/ml/datasets.html);

Small data and Imbalanced data;

Raman spectral data (Raman).
The details of these datasets are shown in Table 1. Before using the datasets, we remove the non-numerical inputs and the inputs with missing values, and convert the class labels to a numeric type.
Table 1: Details of the data sets.
Data Set          Number of Examples  Number of Attributes  Class label  Year
Breast            569                 30                    1 and 2      1995
Spambase          4601                57                    0 and 1      1999
Gas               4782                128                   5 and 6      2012
Musk (Version 1)  168                 476                   0 and 1      1994
small             300                 476                   0 and 1      1994
imbalanced        7074                476                   0 and 1      1994
Raman             925                 101                   0 and 4      N/A
The data sets “small” and “imbalanced” are randomly sampled from the data set “Musk (Version 1)”. The data set “small” is a typical data set with high dimensionality and small samples, where the number of positive and negative samples are the same. The data set “imbalanced” is an imbalanced data set, where the ratio of positive and negative samples is 6:1.
The spectral data set “Raman” was obtained with a standard Raman spectroscope (HR LabRam invers, Jobin-Yvon-Horiba). The excitation wavelength of the laser used (Nd:YAG, frequency doubled) is centered at 532 nm. There are 2545 spectra for 20 different strains available [34]. Herein we select two classes (B. subtilis DSM 10 and M. luteus DSM 348) and use the spectra in the region 1100–1200 in the calculations [35].
4.2 Calculation
Five dimension reduction methods, i.e., PLS, LDA, PLS-LDA, Bagging PLS and PMA, are compared in our experiments. For the Bagging PLS algorithm, fifteen models are generated by random sampling with replacement, and the final model is obtained by averaging these fifteen sub-models. For the PMA method, 100 sub-models are generated from the validation set, and the fifteen sub-models with the highest accuracies are chosen for model fusion. Except for LDA, the number of latent variables in PLS, PLS-LDA, Bagging PLS and PMA is determined by 10-fold cross-validation. In the experiments, the dimensionality of the original data is reduced to 1. For a fair comparison, the linear Naive Bayes classifier is used to evaluate the results of the different dimension reduction methods.
For each data set, we randomly choose 49%, 30% and 21% of the samples to form the training set, test set and validation set, respectively. The experiments are repeated 20 times with random splits, and the averaged results are recorded.
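The repeated random split can be sketched as follows, assuming that the figures 49, 30 and 21 are percentages of the available samples (they sum to 100). The function name `split_indices`, the rounding of the subset sizes and the use of the run index as the random seed are illustrative assumptions.

```python
import random

def split_indices(n_samples, fractions=(0.49, 0.30, 0.21), seed=0):
    """Randomly partition sample indices into training, test and validation sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)           # reproducible shuffle
    n_train = round(fractions[0] * n_samples)
    n_test = round(fractions[1] * n_samples)
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    validation = indices[n_train + n_test:]        # remainder goes to validation
    return train, test, validation

# Twenty random repetitions, one seed per run (seed choice is an assumption).
runs = [split_indices(100, seed=run) for run in range(20)]
```

Averaging the accuracy obtained on each of the twenty test subsets then gives one entry of the kind reported in the result tables.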
5 Results and Discussion
5.1 Classification performance of different algorithms
This section investigates the classification performance of the various algorithms. We report the results on both the training and testing sets; the classification accuracies are given in Table 2 and Table 3, respectively.
Table 2: Classification accuracies on the training sets.
Data Set          PLS     LDA      PLS-LDA  Bagging PLS  PMA
Breast            0.9183  0.6750   0.8910   0.9307       0.9632*
Spambase          0.8043  0.7763   0.6816   0.8543       0.9070*
Gas               0.9704  0.9715   0.7920   0.9703       0.9740*
Musk (Version 1)  0.9119  0.7176   0.9039   0.9126       0.9176*
small             0.9249  0.9037   0.9299   0.9481       0.9858*
imbalanced        0.9726  1.0000*  0.9619   0.9769       0.9899
Raman             0.9515  0.7654   0.8508   0.9538       0.9579*
An asterisk (*) marks the maximum accuracy among the different methods.
Table 3: Classification accuracies on the testing sets.
Data Set          PLS     LDA     PLS-LDA  Bagging PLS  PMA
Breast            0.9143  0.6428  0.8891   0.9265       0.9545*
Spambase          0.8025  0.7607  0.6820   0.8522       0.9034*
Gas               0.9694  0.9623  0.7920   0.9694       0.9730*
Musk (Version 1)  0.9052  0.7003  0.8980   0.9059       0.9108*
small             0.7108  0.6136  0.7128   0.7215       0.7220*
imbalanced        0.9003  0.6492  0.8821   0.9091       0.9097*
Raman             0.9345  0.6589  0.8400   0.9367       0.9419*
An asterisk (*) marks the maximum accuracy among the different methods.
The small-sample-size problem is often encountered in the field of pattern recognition. It may lead to singularity of the within-class scatter matrix in LDA, so the LDA algorithm gives poor results on the data sets “small” and “imbalanced”. PLS shows good overall classification performance. The PLS-LDA algorithm first removes redundancy and noise from the data set with the PLS method, and then performs LDA on the PLS-reduced features. PLS-LDA gives better results than LDA except on the data sets “Spambase” and “Gas”, but it still appears to overfit on the data sets “small” and “imbalanced”. Because the PLS dimension reduction step may lose some information, the results of PLS-LDA are worse than those of PLS. Bagging PLS achieves better results than PLS on the data sets “Breast”, “Spambase”, “Raman” and “Musk (Version 1)”. Although many sub-models in Bagging PLS perform better than PLS, the improvement of Bagging PLS over PLS is not significant because of the averaging strategy. As Tables 2 and 3 show, all algorithms overfit on the data set “small”. Nevertheless, the proposed PMA algorithm achieves the best results on both the training and testing sets, except for the data set “imbalanced”. Its advantage is most pronounced on the data sets “Breast” and “Spambase”.
The figures above show box plots of the accuracy of the different algorithms. For the data sets “Breast”, “Spambase”, “Raman” and “Musk (Version 1)”, the results of the LDA algorithm are clearly much worse than those of the other methods. The results of PLS are also unstable on the data set “Spambase”. The results of PLS-LDA are worse than the others on the data sets “Spambase” and “Gas”. On the data set “small”, all algorithms overfit. Except on the data set “small”, the PMA algorithm gives more stable results than the other methods.
5.2 Investigation on the number of submodels
As shown in Figure 8, the PMA model is not very sensitive to the number of sub-models. In general, the classification accuracy on each data set decreases as the number of sub-models increases, which indicates that not all of the sub-models are useful and that classification performance can likely be improved by choosing a subset of good sub-models. For the data sets “Breast” and “Raman”, the number of sub-models affects the classification results considerably. The number of sub-models can be determined empirically by cross-validation.
5.3 Impact of PMA dimensions
To investigate the effect of the PMA dimensionality, we show the classification results for dimensionalities ranging from 1 to 30. As can be seen from Figure 9, the classification accuracy does not improve on any data set as the dimensionality increases. A possible explanation is that the first principal component already contains most of the information in the data. The results on the data sets “Gas” and “Musk (Version 1)” are relatively stable.
5.4 Discussions of the proposed method
The proposed PMA algorithm extends the original Bagging PLS for qualitative analysis. The results on the six data sets show that the PMA algorithm can improve the classification accuracy to a certain extent. Model ensembles have many advantages, such as enhanced robustness. However, the number of sub-models and the number of dimensions must be chosen carefully.
6 Conclusions
In this paper, we have proposed a PMA method for classification. Using an ensemble strategy, the proposed PMA method fuses the results of PLS sub-models and finds the principal model by performing PCA on the joint coefficient matrix of all sub-models. Experimental results demonstrate that the proposed PMA method can achieve better classification performance than the original PLS and Bagging PLS. Our future work will focus on finding more comprehensive evaluation criteria for the selection of sub-models. In addition, we will apply PMA to semi-supervised problems by adding a large amount of unlabeled data.
References
 (1) Afara, I., Singh, S., Oloyede, A.: Application of near infrared (NIR) spectroscopy for determining the thickness of articular cartilage. Medical Engineering and Physics 35(1), 88–95 (2013)
 (2) Barker, M., Rayens, W.: Partial least squares for discrimination. Journal of Chemometrics 30(3), 446–452 (2012)
 (3) Bellman, R.: Adaptive Control Processes: A Guided Tour. The University Press (1961)
 (4) Bi, Y., Xie, Q., Peng, S., Tang, L., Hu, Y., Tan, J., Zhao, Y., Li, C.: Dual stacked partial least squares for analysis of near-infrared spectra. Analytica Chimica Acta 792(16), 19–27 (2013)
 (5) Bian, X., Li, S., Shao, X., Liu, P.: Variable space boosting partial least squares for multivariate calibration of near-infrared spectroscopy. Chemometrics and Intelligent Laboratory Systems 158 (2016)
 (6) Boulesteix, A.L.: PLS dimension reduction for classification with microarray data. Statistical Applications in Genetics and Molecular Biology 3(1), 392 (2004)
 (7) Breiman, L.: Bagging predictors. Machine Learning 24(2), 123–140 (1996)
 (8) Chen, L.F., Liao, H.Y.M., Ko, M.T., Lin, J.C., Yu, G.J.: A new LDA-based face recognition system which can solve the small sample size problem. Pattern Recognition 33(10), 1713–1726 (2000)
 (9) Chiang, K.Y., Hsieh, C.J., Dhillon, I.S.: Robust principal component analysis with side information. In: International Conference on Machine Learning, pp. 2291–2299 (2016)
 (10) Efron, B.: An introduction to the bootstrap. Chapman and Hall (1995)
 (11) Ferrari, A.C., Robertson, J.: Interpretation of Raman spectra of disordered and amorphous carbon. Physical Review B 61(20), 14095–14107 (2000)
 (12) Folch-Fortuny, A., Arteaga, F., Ferrer, A.: PLS model building with missing data: New algorithms and a comparative study. Journal of Chemometrics 31(12) (2017)
 (13) Ginkel, J.R.V., Kroonenberg, P.M.: Using generalized procrustes analysis for multiple imputation in principal component analysis. Journal of Classification 31(2), 242–269 (2014)
 (14) Goodhue, D.L., Lewis, W., Thompson, R.: Does PLS have advantages for small sample size or non-normal data? MIS Quarterly 36(3), 981–1001 (2012)
 (15) Hu, Y., Peng, S., Peng, J., Wei, J.: An improved ensemble partial least squares for analysis of near-infrared spectra. Talanta 94, 301–307 (2012)
 (16) Huang, R., Liu, Q., Lu, H., Ma, S.: Solving the small sample size problem of LDA 3, 29–32 (2002)
 (17) Jolliffe, I.T., Cadima, J.: Principal component analysis: a review and recent developments. Philosophical Transactions 374(2065), 20150202 (2016)
 (18) Kambhatla, N., Leen, T.: Dimension reduction by local principal component analysis. Neural Computation 9(7), 1493–1516 (1997)
 (19) Liu, Y., Rayens, W.: PLS and dimension reduction for classification. Computational Statistics 22(2), 189–208 (2007)
 (20) Long, C., Guizeng, W.: Soft sensing based on PLS with iterated bagging method. Journal of Tsinghua University 48, 86–90 (2008)
 (21) Maclin, R., Opitz, D.: Popular ensemble methods: An empirical study. Journal of Artificial Intelligence Research 11, 169–198 (2011)
 (22) Marigheto, N.A., Kemsley, E.K., Defernez, M., Wilson, R.H.: A comparison of mid-infrared and Raman spectroscopies for the authentication of edible oils. Journal of the American Oil Chemists’ Society 75(8), 987–992 (1998)
 (23) Martinez, A.M., Kak, A.C.: PCA versus LDA. IEEE Transactions on Pattern Analysis and Machine Intelligence 23(2), 228–233 (2001)
 (24) Mendes-Moreira, J., Soares, C., Jorge, A.M.: Ensemble approaches for regression: A survey. ACM Computing Surveys 45(1), 10 (2011)
 (25) Montanari, A.: Linear discriminant analysis and transvariation. Journal of Classification 21(1), 71–88 (2004)
 (26) Ni, W., Brown, S.D., Man, R.: Stacked partial least squares regression analysis for spectral calibration and prediction. Journal of Chemometrics 23(10), 505–517 (2010)
 (27) Peschke, K.D., Haasdonk, B., Ronneberger, O., Burkhardt, H., Harz, M.: Using transformation knowledge for the classification of Raman spectra of biological samples. In: Proceedings of the 4th IASTED International Conference on Biomedical Engineering, pp. 288–293 (2006)
 (28) Qin, X., Gao, F., Chen, G.: Wastewater quality monitoring system using sensor fusion and machine learning techniques. Water Research 46(4), 1133–1144 (2012)
 (29) Ren, D., Qu, F., Lv, K., Zhang, Z., Xu, H., Wang, X.: A gradient descent boosting spectrum modeling method based on back interval partial least squares. Neurocomputing 171(C), 1038–1046 (2012)
 (30) Shao, X., Bian, X., Cai, W.: An improved boosting partial least squares method for near-infrared spectroscopic quantitative analysis. Analytica Chimica Acta 666(12), 32–37 (2010)
 (31) Gu, S.H., Wang, Y.S., Wang, G.X.: Application of principal component analysis model in data processing. Journal of Surveying and Mapping 24(5), 387–390 (2007)
 (32) Tan, C., Wang, J., Wu, T., Qin, X., Li, M.: Determination of nicotine in tobacco samples by near-infrared spectroscopy and boosting partial least squares. Vibrational Spectroscopy 54(1), 35–41 (2010)
 (33) Trendafilov, N.T., Unkel, S., Krzanowski, W.: Exploratory factor and principal component analyses: some new aspects. Kluwer Academic Publishers (2013)
 (34) Wold, S., Esbensen, K., Geladi, P.: Principal component analysis. Chemometrics and Intelligent Laboratory Systems 2(1), 37–52 (1987)
 (35) Wold, S., Sjostrom, M., Eriksson, L.: PLS-regression: a basic tool of chemometrics. Chemometrics and Intelligent Laboratory Systems 58(2), 109–130 (2001)
 (36) Xu, L., Jiang, J.H., Zhou, Y.P., Wu, H.L., Shen, G.L., Yu, R.Q.: MCCV stacked regression for model combination and fast spectral interval selection in multivariate calibration. Chemometrics and Intelligent Laboratory Systems 87(2), 226–230 (2007)
 (37) Ye, J.: Least squares linear discriminant analysis. In: Proceedings of the 24th International Conference on Machine Learning, pp. 1087–1093 (2007)
 (38) Ye, J., Janardan, R., Li, Q.: Two-dimensional linear discriminant analysis. Advances in Neural Information Processing Systems pp. 1431–1441 (2005)
 (39) Zhang, M.H., Xu, Q.S., Massart, D.L.: Boosting partial least squares. Analytical Chemistry 77(5), 1423–1431 (2005)
 (40) Zheng, W., Zhao, L., Zou, C.: An efficient algorithm to solve the small sample size problem for LDA. Pattern Recognition 37(5), 1077–1079 (2004)
 (41) Zhou, Z.H., Wu, J., Tang, W.: Ensembling neural networks: Many could be better than all. Artificial Intelligence 137(1), 239–263 (2002)