Random Subspace Two-Dimensional LDA for Face Recognition*
Abstract
In this paper, a novel technique named random subspace two-dimensional LDA (RS2DLDA) is developed for face recognition. This approach offers a number of improvements over the random subspace two-dimensional PCA (RS2DPCA) framework introduced by Nguyen et al. [5]. Firstly, the eigenvectors from 2DLDA have more discriminative power than those from 2DPCA, resulting in higher accuracy for the RS2DLDA method over RS2DPCA. Various distance metrics are evaluated, and a weighting scheme is developed to further boost accuracy. A series of experiments on the MORPH-II and ORL datasets are conducted to demonstrate the effectiveness of this approach.
I Introduction
Face recognition has numerous applications in surveillance and authentication systems, yet it remains a difficult problem. Faces of different people can appear very similar, while images of one person are often quite different. Many approaches have been tested for face recognition. In recent years, two-dimensional variants of well-known feature extraction methods such as principal component analysis (PCA) and linear discriminant analysis (LDA) have received growing attention. They generally achieve higher accuracy and are computationally efficient because they require fewer coefficients for image representation. Encouraged by the work in [5], in which the random subspace method is applied to two-dimensional PCA, we propose a new algorithm and evaluate its performance on the MORPH-II and ORL datasets.
The remainder of this paper is organized as follows: Section 2 gives a summary of 2DPCA and its variants; Section 3 does the same for 2DLDA; in Section 4 we review the random subspace method and its previous application to 2DPCA; Section 5 is dedicated to our new approach, RS2DLDA; experiments are conducted in Section 6; we conclude in Section 7.
II Two-Dimensional Principal Component Analysis

II-A Introduction

Principal component analysis (PCA) is a widely used feature extraction and dimension reduction technique. In PCA-based face recognition, the two-dimensional image matrices must first be transformed into one-dimensional vectors. The vectorized images are usually of high dimension. This makes it difficult to calculate the covariance matrix accurately when there is a relatively small number of training samples. In [10], Yang et al. proposed two-dimensional PCA (2DPCA), which differs from PCA in that the image matrices are not transformed into vectors. Instead, an image covariance matrix is constructed from the original image matrices. The main advantage of 2DPCA over PCA is that the size of the image covariance matrix is much smaller. As a result, it is easier to evaluate it accurately, and it is less computationally expensive to determine the corresponding eigenvectors.
II-B Right 2DPCA
Right 2DPCA (R2DPCA) is simply 2DPCA as originally proposed by Yang et al. in [10]. We refer to it as Right 2DPCA to distinguish it from other generalizations of 2DPCA.
II-B1 Algorithm

Let $\{A_i\}_{i=1}^{M}$ be a collection of image matrices of dimension $m \times n$, where $A_i$ represents the $i$th image matrix, $i = 1, \dots, M$. We wish to project each image matrix $A$ onto an $n$-dimensional unit column vector $X$, resulting in an $m$-dimensional projected feature vector $Y$:

(1) $Y = AX$

We choose $X$ such that the scatter of all projected feature vectors is maximized. Equivalently, we seek the $X$ that maximizes the trace of the covariance matrix $S_X$ of the projected feature vectors $Y$. Thus, we wish to maximize

(2) $J(X) = \mathrm{tr}(S_X),$

where $\mathrm{tr}(S_X)$ denotes the trace of $S_X$. Expanding the covariance matrix,

(3) $S_X = E\big[(Y - EY)(Y - EY)^T\big]$

(4) $= E\big[(AX - E(AX))(AX - E(AX))^T\big]$

(5) $= E\big[\big((A - EA)X\big)\big((A - EA)X\big)^T\big];$

therefore,

(6) $\mathrm{tr}(S_X) = X^T E\big[(A - EA)^T (A - EA)\big] X.$

Define the image covariance matrix $G_t$ as

(7) $G_t = E\big[(A - EA)^T (A - EA)\big].$

From its definition, we know that $G_t$ is an $n \times n$ positive semidefinite matrix. It can be evaluated directly using the training image matrices. Let the average image of all training images be denoted by $\bar{A}$, so that

(8) $\bar{A} = \frac{1}{M} \sum_{i=1}^{M} A_i.$

We can then approximate $G_t$ by $\hat{G}_t$, where

(9) $\hat{G}_t = \frac{1}{M} \sum_{i=1}^{M} (A_i - \bar{A})^T (A_i - \bar{A}).$

Equation (2) can instead be expressed as

(10) $J(X) = X^T G_t X.$

It has been shown that the vector $X$ which maximizes (10) is the eigenvector of $G_t$ corresponding to the largest eigenvalue. In general it is not enough to select just one vector for projection. Normally an orthonormal set of vectors $X_1, \dots, X_d$ is chosen. These are the eigenvectors of $G_t$ corresponding to the $d$ largest eigenvalues.
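The estimation of $\hat{G}_t$ and its leading eigenvectors can be sketched in a few lines of NumPy. This is a minimal illustration under assumed toy image shapes, not the paper's implementation; the helper name `r2dpca_eigenvectors` is our own.

```python
import numpy as np

def r2dpca_eigenvectors(images, d):
    """Estimate the image covariance matrix G_t of R2DPCA and return the
    d eigenvectors with the largest eigenvalues as columns of an n x d matrix."""
    A_bar = np.mean(images, axis=0)                 # average image, m x n
    n = images.shape[2]
    G_t = np.zeros((n, n))
    for A in images:                                # G_t is n x n
        D = A - A_bar
        G_t += D.T @ D
    G_t /= len(images)
    evals, evecs = np.linalg.eigh(G_t)              # G_t is symmetric PSD
    order = np.argsort(evals)[::-1]                 # sort eigenvalues descending
    return evecs[:, order[:d]]

# Toy data: 8 random "images" of size 5 x 4 (shapes chosen for illustration)
rng = np.random.default_rng(0)
imgs = rng.standard_normal((8, 5, 4))
X = r2dpca_eigenvectors(imgs, d=2)   # n x d projection matrix
V = imgs[0] @ X                      # m x d feature matrix of the first image
```

Because `eigh` returns orthonormal eigenvectors, the columns of `X` form the orthonormal set described above.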
II-B2 Feature Extraction

The optimal projection vectors $X_1, \dots, X_d$ can be used for feature extraction. Let $X = [X_1, \dots, X_d]$ be an $n \times d$ matrix. Then for a given image $A$, let

(11) $Y_k = A X_k, \quad k = 1, \dots, d.$

The set of projected feature vectors $Y_1, \dots, Y_d$ are called the principal component vectors of the image $A$. In 2DPCA the principal components are vectors, not scalars. The principal component vectors can be used to form an $m \times d$ feature matrix $V = [Y_1, \dots, Y_d] = AX$.
II-B3 Classification

Given two arbitrary feature matrices $V_i$ and $V_j$, the distance between them can be calculated using the Frobenius norm $\lVert V_i - V_j \rVert_F$. Other norms can also be considered. Once all pairwise distances between feature matrices have been calculated, a $k$-nearest neighbor (KNN) algorithm is used for classification.
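This classification step can be sketched as follows. The feature matrices and the helper names (`frob_dist`, `knn_predict`) are our own illustrative choices, not the paper's code.

```python
import numpy as np

def frob_dist(V1, V2):
    # Frobenius norm of the difference of two feature matrices
    return np.linalg.norm(V1 - V2, ord="fro")

def knn_predict(test_feat, train_feats, train_labels, k=1):
    """KNN on feature matrices: majority class among the k nearest neighbors."""
    dists = np.array([frob_dist(test_feat, V) for V in train_feats])
    nearest = np.argsort(dists)[:k]
    votes = train_labels[nearest]
    classes, counts = np.unique(votes, return_counts=True)
    return classes[np.argmax(counts)]

# Two toy training feature matrices with obvious nearest neighbors
train_feats = [np.zeros((5, 2)), 10 * np.ones((5, 2))]
train_labels = np.array([0, 1])
pred = knn_predict(np.ones((5, 2)), train_feats, train_labels, k=1)
```

Swapping `frob_dist` for a cosine-based distance gives the cosine variant evaluated later in the paper.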
II-B4 Image Reconstruction

Because the chosen eigenvectors $X_1, \dots, X_d$ are orthonormal, an image can be approximately reconstructed from its principal component vectors:

(12) $\tilde{A} = V X^T = \sum_{k=1}^{d} Y_k X_k^T.$

When $d = n$ the reconstruction is exact; otherwise $\tilde{A}$ is an approximation of $A$ that improves as $d$ grows.
II-C Left 2DPCA

Kong et al. showed in [4] that 2DPCA is equivalent to PCA if each row of an image matrix is considered as a computational unit. A natural extension is then to consider each column of an image matrix as a computational unit. This is called Left 2DPCA (L2DPCA) because the images are projected by a left matrix multiplication, as opposed to a right matrix multiplication in conventional 2DPCA. The algorithm is formulated in [11]. It is important to consider both Right and Left 2DPCA because the rows of an image may contain vital discriminatory information that is lacking in the columns, and vice versa. The algorithm for L2DPCA largely mimics that of R2DPCA, with a few small changes.
II-C1 Algorithm

Let $\{A_i\}_{i=1}^{M}$ be a collection of image matrices of dimension $m \times n$, where $A_i$ represents the $i$th image matrix, $i = 1, \dots, M$. We wish to project each image matrix $A$ onto an $m$-dimensional unit column vector $Z$, resulting in an $n$-dimensional projected feature vector $B$:

(13) $B = Z^T A$

We choose $Z$ such that the scatter of all projected feature vectors is maximized. Equivalently, we seek the $Z$ that maximizes the trace of the covariance matrix $S_Z$ of the projected feature vectors $B$. Thus, we wish to maximize

(14) $J(Z) = \mathrm{tr}(S_Z),$

where $\mathrm{tr}(S_Z)$ denotes the trace of $S_Z$. Expanding the covariance matrix,

(15) $S_Z = E\big[(B - EB)(B - EB)^T\big]$

(16) $= E\big[(Z^T A - E(Z^T A))(Z^T A - E(Z^T A))^T\big]$

(17) $= E\big[\big(Z^T(A - EA)\big)\big(Z^T(A - EA)\big)^T\big];$

therefore,

(18) $\mathrm{tr}(S_Z) = Z^T E\big[(A - EA)(A - EA)^T\big] Z.$

Define the matrix $G'_t$ as

(19) $G'_t = E\big[(A - EA)(A - EA)^T\big].$

From its definition, we know that $G'_t$ is an $m \times m$ nonnegative definite matrix. It can be evaluated directly using the training image matrices. Let the average image of all training images be denoted by $\bar{A}$, as defined in (8). We can then approximate $G'_t$ by $\hat{G}'_t$, where

(20) $\hat{G}'_t = \frac{1}{M} \sum_{i=1}^{M} (A_i - \bar{A})(A_i - \bar{A})^T.$

Equation (14) can instead be expressed as

(21) $J(Z) = Z^T G'_t Z.$

It has been shown that the vector $Z$ which maximizes (21) is the eigenvector of $G'_t$ corresponding to the largest eigenvalue. In general it is not enough to select just one vector for projection. Normally an orthonormal set of vectors $Z_1, \dots, Z_q$ is chosen. These are the eigenvectors of $G'_t$ corresponding to the $q$ largest eigenvalues.
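One compact way to see the row/column duality between the two variants: the left image covariance $G'_t$ equals the right image covariance computed on the transposed images. The following NumPy check illustrates this on toy data (helper name and shapes are our own).

```python
import numpy as np

def image_cov_right(images):
    """n x n image covariance of R2DPCA: average of (A - A_bar)^T (A - A_bar)."""
    A_bar = np.mean(images, axis=0)
    return sum((A - A_bar).T @ (A - A_bar) for A in images) / len(images)

rng = np.random.default_rng(1)
imgs = rng.standard_normal((6, 5, 4))       # six toy 5 x 4 "images"

# Left image covariance G_t' = average of (A - A_bar)(A - A_bar)^T  (m x m) ...
A_bar = np.mean(imgs, axis=0)
G_left = sum((A - A_bar) @ (A - A_bar).T for A in imgs) / len(imgs)

# ... is the right image covariance of the transposed images:
G_right_of_T = image_cov_right(imgs.transpose(0, 2, 1))
assert np.allclose(G_left, G_right_of_T)
```

In practice this means an R2DPCA implementation can be reused for L2DPCA by transposing the images.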
II-C2 Feature Extraction

The optimal projection vectors $Z_1, \dots, Z_q$ can be used for feature extraction. Let $Z = [Z_1, \dots, Z_q]$ be an $m \times q$ matrix. Then for a given image $A$, let

(22) $B_k = Z_k^T A, \quad k = 1, \dots, q.$

The set of projected feature vectors $B_1, \dots, B_q$ are called the principal component vectors of the image $A$. In 2DPCA the principal components are vectors, not scalars. The principal component vectors can be used to form a $q \times n$ feature matrix $B = Z^T A$.
II-C3 Classification
Classification here is equivalent to that in R2DPCA.
II-C4 Image Reconstruction

Analogous to (12), an image can be approximately reconstructed from its L2DPCA feature matrix:

(23) $\tilde{A} = Z B = Z Z^T A.$
II-D Bilateral 2DPCA

One major limitation of R2DPCA and L2DPCA is that they each only consider information from either the rows or the columns of an image, but not both. Another drawback is that they require many coefficients for image representation. In R2DPCA, an $m \times n$ image can only be reduced to an $m \times d$ feature matrix, whereas L2DPCA can only reduce the same image to $q \times n$. Bilateral 2DPCA (B2DPCA), as proposed in [4], addresses both of these problems. It incorporates both row and column information from the images, and is able to reduce an image to a $q \times d$ feature matrix, making it more computationally efficient.
II-D1 Algorithm

Given the projection matrix $Z$ from L2DPCA and the projection matrix $X$ from R2DPCA, project each image $A$ by the following transformation:

(24) $C = Z^T A X,$

where $C$ is of dimension $q \times d$. Similar to conventional 2DPCA, classification is done by KNN and a chosen distance metric. Experiments in [4] demonstrate the accuracy and efficiency of B2DPCA.
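The bilateral projection of equation (24) is a single line in code. The sketch below uses random orthonormal placeholder projections rather than trained ones, purely to show the dimension reduction (here a 5 x 4 image, 20 coefficients, is compressed to a 2 x 3 feature matrix, 6 coefficients).

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 4))                    # a toy 5 x 4 "image"

# Placeholder orthonormal projections (would come from L2DPCA and R2DPCA)
Z = np.linalg.qr(rng.standard_normal((5, 2)))[0]   # m x q left projection
X = np.linalg.qr(rng.standard_normal((4, 3)))[0]   # n x d right projection

C = Z.T @ A @ X                                    # bilateral feature matrix, q x d
```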
III Two-Dimensional Linear Discriminant Analysis

III-A Introduction

Conventional linear discriminant analysis (LDA) is a popular technique for feature extraction and dimension reduction. LDA seeks an optimal projection of the data such that the variance between classes is maximized while the variance within classes is minimized. Because it is a supervised method, it often has advantages over PCA, especially in face recognition problems. When using LDA for face recognition, the two-dimensional image matrices typically must first be converted to one-dimensional vectors. This causes the between-class and within-class scatter matrices to be of high dimension, making them difficult to calculate accurately. Furthermore, LDA requires the within-class scatter matrix to be invertible. However, both matrices are high-dimensional, and in practice the number of samples is relatively small. This all but guarantees that both matrices will be singular. This is known as the small sample size (SSS) problem.
Analogous to 2DPCA, two-dimensional LDA (2DLDA), as proposed in [9], avoids the SSS problem. The between-class and within-class scatter matrices are calculated directly from the original image matrices. Thus, their dimension is much smaller, making them easy to compute accurately. In practice one has enough data to guarantee that they are not singular, and it is more computationally efficient to find the desired eigenvectors.
III-B Right 2DLDA
As with R2DPCA, Right 2DLDA (R2DLDA) is equivalent to conventional 2DLDA. We refer to it as R2DLDA to distinguish it from other generalizations of 2DLDA.
III-B1 Algorithm

Let $\{A_i\}_{i=1}^{M}$ be a collection of image matrices of dimension $m \times n$, where $A_i$ represents the $i$th image matrix, $i = 1, \dots, M$. Each image belongs to one of $c$ classes, where the $i$th class $C_i$ has $M_i$ samples and $\sum_{i=1}^{c} M_i = M$. R2DLDA transforms all images by a set of discriminating vectors $X = [X_1, \dots, X_d]$, resulting in projected image matrices

(25) $Y_i = A_i X, \quad i = 1, \dots, M.$

$X$ is $n \times d$ and its columns are chosen to maximize the 2D Fisher criterion

(26) $J(X) = \dfrac{X^T G_b X}{X^T G_w X},$

where $G_b$ and $G_w$ represent the between-class and within-class scatter matrices of Right 2DLDA, respectively. Let $\bar{A}_i$ denote the average image of the $i$th class, and $\bar{A}$ the average image of all images. It follows that

(27) $G_b = \sum_{i=1}^{c} M_i (\bar{A}_i - \bar{A})^T (\bar{A}_i - \bar{A})$

and

(28) $G_w = \sum_{i=1}^{c} \sum_{A_j \in C_i} (A_j - \bar{A}_i)^T (A_j - \bar{A}_i).$

It has been shown that the vectors which maximize (26) are the eigenvectors of $G_w^{-1} G_b$ corresponding to the $d$ largest eigenvalues.
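A minimal NumPy sketch of this step computes $G_b$ and $G_w$ from labeled toy images and takes the leading eigenvectors of $G_w^{-1} G_b$. The function name and data shapes are our own illustrative choices, and the sketch assumes $G_w$ is invertible, as the paper argues it is in practice.

```python
import numpy as np

def r2dlda_eigenvectors(images, labels, d):
    """Between/within-class scatter of R2DLDA and the top-d eigenvectors
    of G_w^{-1} G_b (assumes G_w is invertible)."""
    n = images.shape[2]
    A_bar = np.mean(images, axis=0)
    G_b = np.zeros((n, n))
    G_w = np.zeros((n, n))
    for c in np.unique(labels):
        cls = images[labels == c]
        A_bar_c = np.mean(cls, axis=0)
        G_b += len(cls) * (A_bar_c - A_bar).T @ (A_bar_c - A_bar)
        for A in cls:
            G_w += (A - A_bar_c).T @ (A - A_bar_c)
    evals, evecs = np.linalg.eig(np.linalg.inv(G_w) @ G_b)
    order = np.argsort(evals.real)[::-1]           # largest eigenvalues first
    return evecs[:, order[:d]].real

rng = np.random.default_rng(3)
imgs = rng.standard_normal((12, 5, 4))
labels = np.repeat([0, 1, 2], 4)                   # 3 classes, 4 images each
X = r2dlda_eigenvectors(imgs, labels, d=2)         # n x d projection matrix
```

Note that, unlike the 2DPCA case, $G_w^{-1} G_b$ is generally not symmetric, so `eig` rather than `eigh` is used and the real parts are kept.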
III-B2 Feature Extraction

The optimal projection vectors $X_1, \dots, X_d$ can be used for feature extraction. Let $X = [X_1, \dots, X_d]$ be an $n \times d$ matrix. Then for a given image $A$, let

(29) $Y_k = A X_k, \quad k = 1, \dots, d.$

The set of projected feature vectors $Y_1, \dots, Y_d$ are called the Right Fisher feature vectors of the image $A$. The Right Fisher feature vectors can be used to form an $m \times d$ Fisher feature matrix $V = [Y_1, \dots, Y_d] = AX$. Classification is done with KNN and a chosen distance metric.
III-C Left 2DLDA

Similar to the analysis of 2DPCA in [4], it can be seen that R2DLDA operates on information contained in the rows of the image matrices. There may be different discriminatory information contained in the columns; thus, a natural extension of R2DLDA is Left 2DLDA (L2DLDA). The framework for L2DLDA is given in [6].
III-C1 Algorithm

Let $\{A_i\}_{i=1}^{M}$ be a collection of image matrices of dimension $m \times n$, where $A_i$ represents the $i$th image matrix, $i = 1, \dots, M$. Each image belongs to one of $c$ classes, where the $i$th class $C_i$ has $M_i$ samples and $\sum_{i=1}^{c} M_i = M$. L2DLDA transforms all images by a set of discriminating vectors $Z = [Z_1, \dots, Z_q]$, resulting in projected image matrices

(30) $B_i = Z^T A_i, \quad i = 1, \dots, M.$

$Z$ is $m \times q$ and its columns are chosen to maximize the 2D Fisher criterion

(31) $J(Z) = \dfrac{Z^T G'_b Z}{Z^T G'_w Z},$

where $G'_b$ and $G'_w$ represent the between-class and within-class scatter matrices of Left 2DLDA, respectively. Let $\bar{A}_i$ denote the average image of the $i$th class, and $\bar{A}$ the average image of all images. It follows that

(32) $G'_b = \sum_{i=1}^{c} M_i (\bar{A}_i - \bar{A})(\bar{A}_i - \bar{A})^T$

and

(33) $G'_w = \sum_{i=1}^{c} \sum_{A_j \in C_i} (A_j - \bar{A}_i)(A_j - \bar{A}_i)^T.$

It has been shown that the vectors which maximize (31) are the eigenvectors of $G_w'^{-1} G'_b$ corresponding to the $q$ largest eigenvalues.
III-C2 Feature Extraction

The optimal projection vectors $Z_1, \dots, Z_q$ can be used for feature extraction. Let $Z = [Z_1, \dots, Z_q]$ be an $m \times q$ matrix. Then for a given image $A$, let

(34) $B_k = Z_k^T A, \quad k = 1, \dots, q.$

The set of projected feature vectors $B_1, \dots, B_q$ are called the Left Fisher feature vectors of the image $A$. The Left Fisher feature vectors can be used to form a $q \times n$ Fisher feature matrix $B = Z^T A$. Classification is done with KNN and a chosen distance metric.
III-D Bilateral 2DLDA

R2DLDA and L2DLDA suffer from the same limitations as R2DPCA and L2DPCA. Bilateral 2DLDA (B2DLDA), as proposed in [6], addresses these shortcomings. It incorporates both row and column information from the images, and is able to reduce an $m \times n$ image to a $q \times d$ feature matrix, making it more computationally efficient.
III-D1 Algorithm

Given the projection matrix $Z$ from L2DLDA and the projection matrix $X$ from R2DLDA, project each image $A$ by the following transformation:

(35) $C = Z^T A X,$

where $C$ is of dimension $q \times d$. Similar to conventional 2DLDA, classification is done by KNN and a chosen distance metric. Experiments in [6] demonstrate the accuracy and efficiency of B2DLDA.
IV Random Subspace Method

IV-A Overview

In ensemble learning one attempts to train a set of diverse classifiers whose individual outputs are combined into one final decision. If the classifiers are diverse, that is, if they each make different mistakes, then the hope is that a sensible combination of the classifiers' decisions will correct those individual errors. Many techniques for training diverse classifiers exist. Bootstrap aggregating (bagging) is a popular method: each classifier is trained on a random subset of the training data in order to promote model variance. The random subspace method [2] is similar to bagging, but instead of training each model on a random subset of the training data, each model is trained on a random sample of features rather than the entire feature set. This prevents individual classifiers from over-focusing on features that appear highly predictive in the training set. It is generally used with decision trees, though it has been applied to other areas as well.
IV-B Application to 2DPCA

In [5], Nguyen et al. proposed random subspace two-dimensional PCA (RS2DPCA). To our knowledge, this is the only application of the random subspace method to any of the two-dimensional variants of PCA, LDA, etc. In their paper they note that the accuracy of 2DPCA depends heavily on $d$, the number of eigenvectors kept. Choosing $d$ too low results in poor accuracy, while choosing $d$ too large can easily cause overfitting to the training data. Generally, the eigenvectors corresponding to the largest eigenvalues are kept. However, the eigenvectors that are discarded still contain valuable information. To overcome these limitations, random samples of the eigenvectors are used to build many classifiers. As shown in [5], this results in improved and more stable accuracy, and makes it possible to utilize all of the eigenvectors without risk of overfitting. Unfortunately, the Nguyen et al. approach does not perform well on difficult datasets. In the next section, we propose our algorithm and demonstrate its improvements over RS2DPCA.
V Random Subspace Two-Dimensional Linear Discriminant Analysis

V-A Introduction

Motivated by RS2DPCA, we propose random subspace two-dimensional LDA (RS2DLDA). Important advantages of RS2DLDA over RS2DPCA include:

- The random subspace is a random sampling of eigenvectors from 2DLDA. These eigenvectors have been shown to have more discriminative power than those from 2DPCA.
- The entropy measure is used to select parameters that result in diverse classifiers.
- Each classifier's reliability is estimated from an adjusted Rand index (ARI) based on its performance on the training data.
- The ARI scores are used to develop a weighting scheme which is utilized in the final ensemble decision and further boosts accuracy.

Although we focus on applying the random subspace method to 2DLDA, it can easily be extended to 2DPCA, because the method is merely a random sampling of eigenvectors. We analyze both to illustrate the superiority of 2DLDA over 2DPCA in face recognition. Note that our application of the random subspace method to 2DPCA is not equivalent to the approach by Nguyen et al. in [5]: they do not consider ways to measure classifier diversity, nor does their model incorporate a weighting scheme.
V-B Increasing Diversity

The advantage of ensemble systems over single classifiers is that the combination of outputs from many classifiers can often correct the errors of individual classifiers. However, this only works when the classifiers are diverse, that is, when each classifier makes different mistakes. If all classifiers are essentially the same, we cannot hope to correct their individual errors. There exist many ways to measure classifier diversity. One such method is the entropy measure, which assumes diversity is highest when half of the classifiers are correct for a given test image. Define $l_i$ as the number of classifiers out of $L$ that misclassify the $i$th of $N$ images. Then entropy is defined as

(36) $E = \dfrac{1}{N} \sum_{i=1}^{N} \dfrac{1}{L - \lceil L/2 \rceil} \min\{l_i,\; L - l_i\},$

where $0 \le E \le 1$. Low values indicate similar classifiers, and high values indicate diverse classifiers. If we can choose parameters that yield a highly diverse set of classifiers, then it is more likely that we will be able to increase the final accuracy with an intelligent combination of the classifier outputs.
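Equation (36) is straightforward to compute from a 0/1 misclassification matrix. The sketch below is our own illustration of the measure; the helper name and the toy inputs are assumptions.

```python
import numpy as np

def entropy_measure(miss):
    """Entropy diversity measure of equation (36).
    `miss` is an N x L 0/1 matrix: miss[i, t] = 1 if classifier t
    misclassifies image i.  Returns a value in [0, 1]."""
    N, L = miss.shape
    l = miss.sum(axis=1)                       # misclassifying classifiers per image
    return float(np.mean(np.minimum(l, L - l) / (L - np.ceil(L / 2))))

# All classifiers agree on every image -> no diversity
e_same = entropy_measure(np.zeros((4, 6), dtype=int))

# Exactly half the classifiers wrong on every image -> maximal diversity
half = np.hstack([np.ones((4, 3), dtype=int), np.zeros((4, 3), dtype=int)])
e_half = entropy_measure(half)
```

With $L = 6$ the normalizing factor is $L - \lceil L/2 \rceil = 3$, so the half-wrong case attains the maximum value of 1.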
We can see in Fig. 4 that entropy varies with the number of random eigenvectors we select. If we do not choose enough eigenvectors, then each classifier is not predictive enough and performs poorly. On the other hand, if we choose too many, the classifiers are all very similar. Choosing a moderate value (10 in this case) works well to increase classifier diversity.

Entropy is also affected by the number of random classifiers we train. Training too few results in poor diversity. Training more classifiers will increase entropy, but only up to a point. Fig. 4 illustrates this trend.
V-C Estimating Classifier Credibility

In RS2DLDA, each classifier is built from a random sample of eigenvectors from 2DLDA. Together these eigenvectors form a matrix, and all images are multiplied by this matrix, which projects them to a new space. It is impossible to know how well a classifier will perform on the testing data, but we can get an idea based on its performance on the training data. Since each classifier defines a projection to a new space, we can expect that the classifiers which project images of the same person close to each other but images of different people far apart will perform well on the testing data. On the other hand, if the projected images of different people are mixed together, and there are no clear boundaries separating the images of one person from the next, then we expect the classifier to perform poorly.

For a given classifier, take a training image and find its predicted class using KNN on the remainder of the training set. Do this for all training images, obtaining a prediction for each training image. Let this set of predictions define a clustering $P$ of the training images, and let the ground truth identities define another clustering $Q$. We can expect the classifier to perform well if $P$ is similar to $Q$. That is, we can use a clustering similarity measure to evaluate whether the projection defined by the random sample of eigenvectors appears to preserve or disregard class differences. The adjusted Rand index [3] is one such clustering similarity measure that fits this purpose well.
Given a set $S$ of $N$ images and two clusterings of these images, namely $Q = \{q_1, \dots, q_r\}$, the ground truth identities of the images, and $P = \{p_1, \dots, p_s\}$, the predicted identities from a given classifier, the overlap between $P$ and $Q$ can be summarized in a contingency table $[n_{ij}]$, where each entry $n_{ij} = |q_i \cap p_j|$ is the number of images in common between $q_i$ and $p_j$. Let $a_i = \sum_j n_{ij}$ denote the row sums of the table and $b_j = \sum_i n_{ij}$ the column sums.
The adjusted Rand index (ARI) is then calculated as

(37) $\mathrm{ARI} = \dfrac{\sum_{ij} \binom{n_{ij}}{2} - \big[\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}\big] \big/ \binom{N}{2}}{\frac{1}{2}\big[\sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2}\big] - \big[\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}\big] \big/ \binom{N}{2}}.$

ARI ranges from $-1$ to $1$. High values indicate similar clusterings, while low values mean the clusterings are dissimilar. The adjusted Rand index is a corrected-for-chance version of the Rand index. It admits negative values, which indicate a Rand index lower than would be expected if the clusterings were drawn randomly.
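Equation (37) can be implemented directly from the contingency table. This is a minimal sketch of the formula on label vectors; the function name is our own.

```python
import numpy as np
from math import comb

def adjusted_rand_index(truth, pred):
    """ARI of equation (37), computed from the contingency table [n_ij]."""
    t_classes = np.unique(truth)
    p_classes = np.unique(pred)
    n = np.array([[np.sum((truth == t) & (pred == p)) for p in p_classes]
                  for t in t_classes])
    sum_ij = sum(comb(int(v), 2) for v in n.ravel())
    sum_a = sum(comb(int(v), 2) for v in n.sum(axis=1))   # row sums a_i
    sum_b = sum(comb(int(v), 2) for v in n.sum(axis=0))   # column sums b_j
    expected = sum_a * sum_b / comb(len(truth), 2)
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)

truth = np.array([0, 0, 1, 1])
# Relabeled but structurally identical clustering -> ARI of 1
ari_perfect = adjusted_rand_index(truth, np.array([1, 1, 0, 0]))
```

Note that ARI compares the structure of the two partitions, so a relabeling of the classes (as in the example) still scores 1.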
If a given classifier achieves a high ARI, we know in its projected space that images of the same person are clustered close together and images of different people are spread apart. Thus, we expect these classifiers to outperform those with a low ARI when applied to the testing data. To take advantage of this, we develop a weighting scheme to give classifiers with a high ARI more influence in the final decision.
In Fig. 5 one can see the distribution of ARI for a set of 50 random classifiers operating on a subset of MORPH-II. The ARI scores are higher on average for cosine distance than for Euclidean, suggesting cosine distance may be better suited to face recognition on MORPH-II. The ARI scores are all positive, indicating that there is some similarity between the classifiers' predictions on the training set and the ground truth values. Although the ARI scores are quite low (the highest is less than 0.3), it is important to remember that in ensemble learning we combine many weak classifiers to build one strong classifier. Indeed, we could have sampled twice as many eigenvectors and easily increased all of the ARI scores, but this would come at the expense of classifier diversity.
V-D Weighted Majority Voting

In majority voting, each classifier gets one vote. The final decision is the class with the most votes, regardless of whether the percentage of votes is above $50\%$. Define the decision of the $t$th classifier as $d_{t,j} \in \{0, 1\}$, where $t = 1, \dots, L$ and $j = 1, \dots, c$; here $L$ is the number of classifiers and $c$ the number of classes. If the $t$th classifier predicts a given image to belong to class $j$, then $d_{t,j} = 1$, and otherwise $d_{t,j} = 0$. We choose class $J$ if

(38) $\sum_{t=1}^{L} d_{t,J} = \max_{j = 1, \dots, c} \sum_{t=1}^{L} d_{t,j}.$

If we know that some classifiers are more accurate than others, then we can weight their decisions so that more credible classifiers have a higher influence on the final decision. This is known as weighted majority voting. Assign weight $w_t$ to the $t$th classifier. We choose class $J$ if

(39) $\sum_{t=1}^{L} w_t\, d_{t,J} = \max_{j = 1, \dots, c} \sum_{t=1}^{L} w_t\, d_{t,j}.$

We do not require the proportion of support for class $J$ to be over $50\%$.
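The voting rules (38) and (39) reduce to a weighted argmax over class support. The sketch below is our own illustration with toy votes and weights; passing equal weights recovers plain majority voting.

```python
import numpy as np

def weighted_vote(votes, weights):
    """votes[t] is the class chosen by classifier t; weights[t] its weight.
    Returns the class with the largest total weighted support (eq. 39)."""
    classes = np.unique(votes)
    support = np.array([weights[votes == c].sum() for c in classes])
    return classes[np.argmax(support)]

votes = np.array([0, 0, 0, 1, 1])

# Equal weights: plain majority voting (eq. 38), class 0 wins 3 votes to 2
plain = weighted_vote(votes, np.ones(5))

# Two highly credible classifiers can overturn the simple majority
weighted = weighted_vote(votes, np.array([0.1, 0.1, 0.1, 0.9, 0.9]))
```

In RS2DLDA the weights would come from the ARI-based scheme, e.g. raising each classifier's ARI to a common exponent.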
We expect the classifiers with a high ARI to be more accurate than those with a low ARI. But how much additional influence should we give to the strong classifiers? We need a monotone function which maps an ARI to a weight for each classifier. One simple solution is to raise each ARI to a common exponent $p$, i.e. $w_t = (\mathrm{ARI}_t)^p$. Choosing a low value for $p$ means the strong classifiers have only marginally more influence than the weak classifiers. A high value for $p$ means that only the strongest classifiers make any real impact on the final decision. A moderate value for $p$ should give a proper balance and help to increase overall accuracy.
In Fig. 6 and Fig. 7 we train 50 random classifiers on a subset of MORPH-II and vary $p$, the exponent to which each ARI is raised. We can see that a weighting scheme has the potential to substantially boost performance. However, the best value for $p$ clearly differs between the two figures, even though they are identical experiments up to a different initial random seed. More research needs to be done on how to select an optimal value for $p$. Other monotone functions, such as the logistic function, could also be considered.
V-E Algorithm
Given the following parameters:
Param.  Description

$A$  Set of all images, where $A_i$ denotes the $i$th image
$A_{\mathrm{train}}$  Set of training images
$A_{\mathrm{test}}$  Set of testing images
$N$  Total number of images
$c$  Number of people (classes)
$G'_w$, $G'_b$  Within-class and between-class scatter matrices from L2DLDA
$G_w$, $G_b$  Within-class and between-class scatter matrices from R2DLDA
$Z_1, \dots, Z_m$  Eigenvectors of $G_w'^{-1} G'_b$
$X_1, \dots, X_n$  Eigenvectors of $G_w^{-1} G_b$
$d$  Number of eigenvectors of $G_w'^{-1} G'_b$ and $G_w^{-1} G_b$ kept
$B_i$  $i$th projected image
$L$  Number of classifiers
$Q$  Ground truth identities for all images
$P$  Predicted training identities, where $P_i$ is the predicted identity of the $i$th image
$d_{t,i,j}$  Indicator: 1 if the $t$th classifier predicts the $i$th image to be from the $j$th person, 0 otherwise
$\mathrm{ARI}_t$  Adjusted Rand index of the $t$th classifier
$p$  Exponent to which each ARI is raised, resulting in each classifier's weight
$w_t$  Weight given to the $t$th classifier
$\hat{Q}$  Final testing ensemble prediction, where $\hat{Q}_i$ is the predicted identity of the $i$th image
and functions:
Function  Description

$\mathrm{KNN}(B_i, S, k)$  Returns the most common class among the $k$ nearest neighbors of $B_i$ in the set $S$.
$\mathrm{ARI}(P, Q)$  Returns the adjusted Rand index of $P$ and $Q$.
the algorithm for RS2DLDA can be summarized as follows:

1. Compute the scatter matrices $G_w$ and $G_b$ from the training images.
2. Compute the eigenvectors $X_1, \dots, X_n$ of $G_w^{-1} G_b$.
3. For each classifier $t = 1, \dots, L$:
4.   Randomly sample $d$ of the eigenvectors to form a projection matrix $X$.
5.   Project all images: $B_i = A_i X$.
6.   Predict each projected training image with KNN on the remaining training images, giving $P$.
7.   Compute $\mathrm{ARI}_t = \mathrm{ARI}(P, Q)$ and the weight $w_t = (\mathrm{ARI}_t)^p$.
8.   Classify each projected testing image with KNN on the training images, recording the votes $d_{t,i,j}$.
9. Combine the classifiers by weighted majority voting: $\hat{Q}_i$ is the class $j$ maximizing $\sum_{t=1}^{L} w_t\, d_{t,i,j}$.
To consider 2DPCA or a different projection scheme (bilateral, left, or right), simply replace the projection in step 5 of the algorithm. For example, to apply the random subspace method and weighting scheme to L2DPCA, step 5 would become $B_i = Z^T A_i$ (with $Z$ built from eigenvectors of $\hat{G}'_t$), whereas for R2DLDA it would be $B_i = A_i X$.
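The whole pipeline can be sketched end to end. This is our own minimal NumPy rendition on toy data, not the paper's implementation: it uses the R2DLDA projection with 1-NN, and it clamps negative ARI values to zero before exponentiation, which is a simplifying choice of ours.

```python
import numpy as np
from math import comb

def adjusted_rand_index(truth, pred):
    # ARI from the contingency table of two labelings
    n = np.array([[np.sum((truth == t) & (pred == p))
                   for p in np.unique(pred)] for t in np.unique(truth)])
    s_ij = sum(comb(int(v), 2) for v in n.ravel())
    s_a = sum(comb(int(v), 2) for v in n.sum(axis=1))
    s_b = sum(comb(int(v), 2) for v in n.sum(axis=0))
    exp = s_a * s_b / comb(len(truth), 2)
    return (s_ij - exp) / ((s_a + s_b) / 2 - exp)

def rs2dlda_predict(train_imgs, train_labels, test_imgs, L=10, d=3, p=2, seed=0):
    """Sketch of the RS2DLDA ensemble (R2DLDA projection, 1-NN, ARI weights)."""
    rng = np.random.default_rng(seed)
    n = train_imgs.shape[2]
    # Scatter matrices of R2DLDA
    A_bar = train_imgs.mean(axis=0)
    G_b = np.zeros((n, n))
    G_w = np.zeros((n, n))
    for c in np.unique(train_labels):
        cls = train_imgs[train_labels == c]
        mc = cls.mean(axis=0)
        G_b += len(cls) * (mc - A_bar).T @ (mc - A_bar)
        G_w += sum((A - mc).T @ (A - mc) for A in cls)
    evals, evecs = np.linalg.eig(np.linalg.inv(G_w) @ G_b)
    evecs = evecs[:, np.argsort(evals.real)[::-1]].real

    def nn(feat, feats, labels):            # 1-nearest neighbor for brevity
        return labels[int(np.argmin([np.linalg.norm(feat - f) for f in feats]))]

    weights, votes = [], []
    for _ in range(L):
        X = evecs[:, rng.choice(n, size=d, replace=False)]  # random subspace
        tr = [A @ X for A in train_imgs]
        te = [A @ X for A in test_imgs]
        # Leave-one-out training predictions -> ARI -> weight
        # (negative ARI clamped to zero: our simplifying choice)
        pred = np.array([nn(tr[i], tr[:i] + tr[i + 1:], np.delete(train_labels, i))
                         for i in range(len(tr))])
        weights.append(max(adjusted_rand_index(train_labels, pred), 0.0) ** p)
        votes.append([nn(f, tr, train_labels) for f in te])

    votes = np.array(votes)                 # L x n_test class votes
    weights = np.array(weights)
    classes = np.unique(train_labels)
    support = np.array([[(weights * (votes[:, i] == c)).sum() for c in classes]
                        for i in range(votes.shape[1])])
    return classes[support.argmax(axis=1)]  # weighted majority vote

# Toy, well-separated two-class problem (12 training, 4 testing "images")
rng = np.random.default_rng(42)
tr_imgs = np.concatenate([rng.normal(0, 0.1, (6, 5, 4)),
                          rng.normal(10, 0.1, (6, 5, 4))])
tr_lab = np.repeat([0, 1], 6)
te_imgs = np.concatenate([rng.normal(0, 0.1, (2, 5, 4)),
                          rng.normal(10, 0.1, (2, 5, 4))])
preds = rs2dlda_predict(tr_imgs, tr_lab, te_imgs)
```

Classifiers whose random subspace misses the discriminative eigenvector earn a near-zero training ARI and are effectively silenced by the weighting, which is the mechanism the weighting scheme is designed to exploit.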
VI Experiments

VI-A Introduction to the Data

VI-A1 MORPH-II

The MORPH-II dataset [8] is a longitudinal dataset collected over five years. It contains 55,134 images from 13,617 individuals. Subjects' ages range from 16 to 77 years, and there are an average of four images per person. MORPH-II is a difficult dataset for face recognition because it suffers from high variability in pose, facial expression, and illumination. To account for this, all images were preprocessed. OpenCV was used to automatically detect the face and the eyes in each image. The images were rotated so that the eyes were horizontal, and then cropped to reduce noise from the background or the subject's hair. Finally, all images were histogram equalized with a built-in Python function to help account for the differences in illumination.
VI-A2 ORL

The ORL dataset [1] contains 40 people, each with 10 images of size 92 × 112. There are minor variations in lighting and facial expression, but it is an easy dataset for face recognition. No preprocessing was done on the images.
VI-B Experiment Design

A subset of MORPH-II was used for the experiments. Among those with 10 or more images, 50 arbitrary people were selected. Five images per person (250 images total) were randomly selected for training, and five for testing (250 images total). 50 different 5-nearest neighbor classifiers were created, each from a random sample of 10 eigenvectors. The classifiers' predictions on the testing data were combined into one final decision by weighted majority voting. For ORL, the entire dataset was used. Five images per person (200 images total) were randomly selected for training, and five for testing (200 images total). 50 different 1-nearest neighbor classifiers were created, each from a random sample of 5 eigenvectors. The classifiers' predictions on the testing data were combined into one final decision by weighted majority voting. For completeness, the bilateral, right, and left projection schemes of both 2DLDA and 2DPCA are considered. All experiments were repeated thirty times and the results averaged to obtain Tables I and II. Standard error is shown in parentheses.
Table I: Face Recognition on MORPH-II
Algorithm  Euclidean  Cosine  
Weighted  Unweighted  Original  Weighted  Unweighted  Original  
B2DLDA  .727 (.019)  .678 (.026)  .764  .781 (.018)  .786 (.015)  .768 
L2DLDA  .743 (.008)  .735 (.009)  .756  .788 (.010)  .780 (.012)  .776 
R2DLDA  .704 (.016)  .662 (.018)  .704  .723 (.018)  .733 (.016)  .704 
B2DPCA  .706 (.013)  .701 (.013)  .564  .702 (.018)  .692 (.013)  .556 
L2DPCA  .678 (.009)  .667 (.011)  .552  .670 (.018)  .660 (.016)  .544 
R2DPCA  .611 (.010)  .609* (.007)  .580  .609 (.009)  .612 (.009)  .584 
Table II: Face Recognition on ORL
Algorithm  Euclidean  Cosine  
Weighted  Unweighted  Original  Weighted  Unweighted  Original  
B2DLDA  .931 (.017)  .924 (.017)  .935  .939 (.015)  .936 (.016)  .940 
L2DLDA  .914 (.013)  .909 (.014)  .940  .937 (.016)  .935 (.017)  .940 
R2DLDA  .929 (.013)  .923 (.016)  .935  .948 (.015)  .943 (.013)  .945 
B2DPCA  .914 (.013)  .911 (.014)  .870  .908 (.015)  .908 (.013)  .865 
L2DPCA  .895 (.011)  .893 (.010)  .865  .884 (.012)  .884 (.013)  .860 
R2DPCA  .905 (.010)  .903* (.011)  .895  .916 (.016)  .914 (.016)  .895 
VI-C Analysis

From the results in Tables I and II, the difficulty of MORPH-II is apparent. The highest accuracy achieved on MORPH-II, 0.788, was 16 percentage points lower than the highest for ORL (0.948). In the MORPH-II experiments, performance increases substantially when cosine distance is used instead of Euclidean. This is likely due to the fact that MORPH-II suffers from high variability in illumination, whereas ORL does not. Cosine distance measures similarity in direction rather than magnitude, so we see boosted accuracy on MORPH-II, but only minor improvements for ORL.
In general, the 2DLDA algorithms outperform their 2DPCA counterparts. This is likely due to the fact that the eigenvectors from 2DLDA have more discriminative power for face recognition than those from 2DPCA.
In general the weighting scheme increases accuracy. However, in some cases the unweighted algorithm achieves better performance, and in other instances the original (non-random-subspace) algorithm is the best. We can be confident that the random subspace method is in general effective, and that the weighting scheme will in most cases increase accuracy. More research needs to be done on parameter selection. One obvious direction for future work is the selection of $p$, the exponent to which all ARI scores are raised to determine the weighting scheme. One choice of $p$ does not generalize well across algorithms (a value of $p$ that works well with L2DLDA may not generalize well to R2DPCA, for example). A more systematic and robust way of selecting $p$ (and other parameters) is needed to ensure increased performance regardless of algorithm or dataset.
The Nguyen et al. framework in [5] achieves satisfactory performance on ORL, but it performs quite poorly on MORPH-II. Although the high accuracies achieved on ORL are not replicated on MORPH-II, the contributions presented in this paper significantly increase accuracy. First, considering the cosine distance metric helped to account for the variable illumination of MORPH-II. Using the eigenvectors from 2DLDA significantly increased recognition accuracy, and considering multiple projection schemes (bilateral, left, and right) showed that no one scheme is always better than the others. Finally, the weighting scheme proposed here further boosts accuracy.
VII Conclusions

A novel algorithm for face recognition, RS2DLDA, is presented and evaluated on the MORPH-II and ORL datasets. It outperforms the previously proposed RS2DPCA [5] by utilizing more discriminative eigenvectors, multiple distance metrics, and multiple projection schemes. RS2DLDA further benefits from a weighting scheme that increases accuracy. Future work will include investigation into the key differences among the bilateral, left, and right versions of 2DLDA and 2DPCA, and exploration of randomly sampling eigenvectors with replacement. More challenging face recognition problems will also be considered. Finally, an optimized weighting scheme will be sought that is effective regardless of dataset difficulty or algorithm used.
Acknowledgment
This work was conducted at an NSF-sponsored Research Experience for Undergraduates (REU) program at the University of North Carolina Wilmington. I would like to thank Dr. Cuixian Chen, Dr. Yishi Wang, and Troy Kling for their dedication and support.
References
 [1] The database of faces. http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html. Accessed: 20170626.
 [2] Tin Kam Ho. The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8):832–844, Aug 1998.
 [3] Lawrence Hubert and Phipps Arabie. Comparing partitions. Journal of Classification, 2(1):193–218, Dec 1985.
 [4] Hui Kong, Lei Wang, Eam Khwang Teoh, Xuchun Li, JianGang Wang, and Ronda Venkateswarlu. Generalized 2d principal component analysis for face image representation and recognition. Neural Networks, 18(5):585 – 594, 2005. IJCNN 2005.
 [5] Nam Nguyen, Wanquan Liu, and Svetha Venkatesh. Random Subspace TwoDimensional PCA for Face Recognition, pages 655–664. Springer Berlin Heidelberg, Berlin, Heidelberg, 2007.
 [6] S. Noushath, G. Hemantha Kumar, and P. Shivakumara. (2d)2lda: An efficient approach for face recognition. Pattern Recognition, 39(7):1396 – 1400, 2006.
 [7] R. Polikar. Ensemble based systems in decision making. IEEE Circuits and Systems Magazine, 6(3):21–45, Third 2006.
 [8] K. Ricanek and T. Tesafaye. Morph: a longitudinal image database of normal adult ageprogression. In 7th International Conference on Automatic Face and Gesture Recognition (FGR06), pages 341–345, April 2006.
 [9] Anbang Xu, Xin Jin, Yugang Jiang, and Ping Guo. Complete twodimensional pca for face recognition. In 18th International Conference on Pattern Recognition (ICPR’06), volume 3, pages 481–484, 2006.
 [10] Jian Yang, D. Zhang, A. F. Frangi, and Jing yu Yang. Twodimensional pca: a new approach to appearancebased face representation and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(1):131–137, Jan 2004.
 [11] Daoqiang Zhang and ZhiHua Zhou. (2d)2pca: Twodirectional twodimensional pca for efficient face representation and recognition. Neurocomputing, 69(1):224 – 231, 2005. Neural Networks in Signal Processing.