Random Subspace Two-dimensional LDA for Face Recognition*

Garrett Bingham
G. Bingham is an undergraduate majoring in Computer Science & Mathematics at Yale University. garrett.bingham at yale.edu
*This work was supported by the National Science Foundation (NSF) under DMS Grant Number 1659288.
Abstract

In this paper, a novel technique named random subspace two-dimensional LDA (RS-2DLDA) is developed for face recognition. This approach offers a number of improvements over the random subspace two-dimensional PCA (RS-2DPCA) framework introduced by Nguyen et al. [5]. First, the eigenvectors from 2DLDA have more discriminative power than those from 2DPCA, so RS-2DLDA achieves higher accuracy than RS-2DPCA. In addition, various distance metrics are evaluated, and a weighting scheme is developed to further boost accuracy. A series of experiments on the MORPH-II and ORL datasets is conducted to demonstrate the effectiveness of this approach.

I Introduction

Face recognition has numerous applications in surveillance and authentication systems, yet it remains a difficult problem. Faces of different people can appear very similar, while images of one person are often quite different. Many approaches have been tested for face recognition. In recent years, two-dimensional variants of well-known feature extraction methods such as principal component analysis (PCA) and linear discriminant analysis (LDA) have received growing attention. They generally achieve higher accuracy and are computationally efficient because they require fewer coefficients for image representation. Encouraged by the work in [5], in which the random subspace method is applied to two-dimensional PCA, we propose a new algorithm and evaluate its performance on the MORPH-II and ORL datasets.

The remainder of this paper is organized as follows: Section 2 gives a summary of 2DPCA and its variants; Section 3 does the same for 2DLDA; in Section 4 we review the random subspace method and its previous application to 2DPCA; Section 5 is dedicated to our new approach, RS-2DLDA; experiments are conducted in Section 6; we conclude in Section 7.

II Two-Dimensional Principal Component Analysis

II-A Introduction

Principal component analysis (PCA) is a widely used feature extraction and dimension reduction technique. In PCA-based face recognition, two-dimensional image matrices must first be transformed into one-dimensional vectors. The vectorized images are usually of high dimension. This makes it difficult to calculate the covariance matrix accurately when there is a relatively small number of training samples. In [10], Yang et al. proposed two-dimensional PCA (2DPCA), which differs from PCA in that the image matrices are not transformed into vectors. Instead, an image covariance matrix is constructed from the original image matrices. The main advantage of 2DPCA over PCA is that the size of the image covariance matrix is much smaller. As a result, it is easier to evaluate it accurately, and it is less computationally expensive to determine the corresponding eigenvectors.

II-B Right 2DPCA

Right 2DPCA (R2DPCA) is simply 2DPCA as originally proposed by Yang et al. in [10]. We refer to it as Right 2DPCA to distinguish it from other generalizations of 2DPCA.

II-B1 Algorithm

Let $\{A_1, A_2, \dots, A_M\}$ be a collection of image matrices of dimension $m \times n$, where $A_i$ represents the $i$th image matrix, $i = 1, \dots, M$. We wish to project each image matrix $A$ onto an $n$-dimensional unit column vector $x$, resulting in an $m$-dimensional projected feature vector $y$:

$$y = Ax. \qquad (1)$$

We choose $x$ such that the scatter of all projected feature vectors is maximized. Equivalently, we seek $x$ that maximizes the trace of the covariance matrix $S_x$ of the projected feature vectors $y$. Thus, we wish to maximize

$$J(x) = \operatorname{tr}(S_x), \qquad (2)$$

where $\operatorname{tr}(S_x)$ denotes the trace of $S_x$. Expanding,

$$S_x = E\left[(y - Ey)(y - Ey)^T\right] \qquad (3)$$
$$= E\left[(Ax - E[Ax])(Ax - E[Ax])^T\right] \qquad (4)$$
$$= E\left[(A - EA)x \, x^T (A - EA)^T\right]; \qquad (5)$$

therefore,

$$\operatorname{tr}(S_x) = x^T E\left[(A - EA)^T (A - EA)\right] x. \qquad (6)$$

Define the image covariance matrix $G_t$ as

$$G_t = E\left[(A - EA)^T (A - EA)\right]. \qquad (7)$$

From its definition, we know that $G_t$ is an $n \times n$ positive semi-definite matrix. It can be evaluated directly using the training image matrices. Let the average image of all training images be denoted by $\bar{A}$, so that

$$\bar{A} = \frac{1}{M} \sum_{i=1}^{M} A_i. \qquad (8)$$

We can then approximate $G_t$ by $\hat{G}_t$, where

$$\hat{G}_t = \frac{1}{M} \sum_{i=1}^{M} (A_i - \bar{A})^T (A_i - \bar{A}). \qquad (9)$$

Equation (2) can instead be expressed as

$$J(x) = x^T G_t x. \qquad (10)$$

It has been shown that the vector $x$ which maximizes (10) is the eigenvector of $G_t$ corresponding to the largest eigenvalue. In general it is not enough to select just one vector for projection. Normally an orthonormal set of vectors $x_1, \dots, x_d$ is chosen. These are the eigenvectors of $G_t$ corresponding to the $d$ largest eigenvalues.
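For concreteness, the following NumPy sketch computes $\hat{G}_t$ from (9) and keeps its top $d$ eigenvectors. It is only an illustration under the assumption that the training images are stacked in an array of shape $(M, m, n)$; the function and variable names are ours, not part of the original method.

import numpy as np

def r2dpca_projection(images, d):
    """Illustrative R2DPCA sketch: top-d eigenvectors of the image
    covariance matrix G_t in (9). `images` has shape (M, m, n)."""
    A_bar = images.mean(axis=0)                        # average image, (8)
    centered = images - A_bar                          # A_i - A_bar
    # G_t = (1/M) * sum_i (A_i - A_bar)^T (A_i - A_bar), an n x n matrix, (9)
    G_t = np.einsum('imn,imk->nk', centered, centered) / len(images)
    # G_t is symmetric positive semi-definite, so eigh applies;
    # its eigenvalues come back in ascending order, so reverse and keep d.
    _, eigvecs = np.linalg.eigh(G_t)
    X = eigvecs[:, ::-1][:, :d]                        # n x d projection matrix
    return X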

II-B2 Feature Extraction

The optimal projection vectors $x_1, \dots, x_d$ can be used for feature extraction. Let $X = [x_1, \dots, x_d]$ be an $n \times d$ matrix. Then for a given image $A_i$, let

$$y_k^{(i)} = A_i x_k, \qquad k = 1, \dots, d. \qquad (11)$$

The set of projected feature vectors $y_1^{(i)}, \dots, y_d^{(i)}$ are called the principal component vectors of the image $A_i$. In 2DPCA the principal components are vectors, not scalars. The principal component vectors can be used to form an $m \times d$ feature matrix $B_i = \left[y_1^{(i)}, \dots, y_d^{(i)}\right] = A_i X$.

II-B3 Classification

Given two arbitrary feature matrices $B_i$ and $B_j$, the distance between them can be calculated using the Frobenius norm $\|B_i - B_j\|_F$. Other norms can also be considered. Once all pairwise distances between feature matrices have been calculated, a $k$-nearest neighbor (KNN) algorithm is used for classification.
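As a small illustration of this classification step (assuming the feature matrices are NumPy arrays; the helper name is ours), a probe image can be labeled by a Frobenius-distance KNN vote:

import numpy as np
from collections import Counter

def knn_predict(B_probe, train_features, train_labels, k=5):
    """Label one feature matrix by a k-nearest-neighbor vote under the
    Frobenius norm. `train_features` is a list of feature matrices and
    `train_labels` the corresponding class labels."""
    dists = [np.linalg.norm(B_probe - B_j) for B_j in train_features]  # ||B_probe - B_j||_F
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_labels[j] for j in nearest)
    return votes.most_common(1)[0][0]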

II-B4 Image Reconstruction

Since $x_1, \dots, x_d$ are orthonormal, from (11) we can obtain a reconstruction of image $A_i$:

$$\tilde{A}_i = B_i X^T = \sum_{k=1}^{d} y_k^{(i)} x_k^T. \qquad (12)$$

$\tilde{A}_i$ is of the same dimension as $A_i$. If $d = n$, then $\tilde{A}_i = A_i$. Otherwise, $\tilde{A}_i$ is an approximation of $A_i$. See Fig. 1.
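In code, the reconstruction in (12) is a single matrix product (a sketch under the same array assumptions as above):

def reconstruct(A_i, X):
    """Rank-d reconstruction of an image per (12)."""
    B_i = A_i @ X          # (m, d) feature matrix from (11)
    return B_i @ X.T       # (m, n) approximation; exact when d = n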

II-C Left 2DPCA

Hong et al. showed in [4] that 2DPCA is equivalent to PCA if each row of an image matrix is considered as a computational unit. A natural extension would then be to consider each column of an image matrix as a computational unit. This is called Left 2DPCA (L2DPCA) because the images are projected by a left matrix multiplication as opposed to a right matrix multiplication in conventional 2DPCA. The algorithm is formulated in [11]. It is important to consider both Right and Left 2DPCA because the rows of an image may contain vital discriminatory information that is lacking in the columns, and vice-versa. The algorithm for L2DPCA largely mimics that of R2DPCA, with a few small changes.

II-C1 Algorithm

Let $\{A_1, A_2, \dots, A_M\}$ be a collection of image matrices of dimension $m \times n$, where $A_i$ represents the $i$th image matrix, $i = 1, \dots, M$. We wish to project each image matrix $A$ onto an $m$-dimensional unit column vector $z$, resulting in an $n$-dimensional projected feature vector $y$:

$$y^T = z^T A. \qquad (13)$$

We choose $z$ such that the scatter of all projected feature vectors is maximized. Equivalently, we seek $z$ that maximizes the trace of the covariance matrix $S_z$ of the projected feature vectors $y$. Thus, we wish to maximize

$$J(z) = \operatorname{tr}(S_z), \qquad (14)$$

where $\operatorname{tr}(S_z)$ denotes the trace of $S_z$. Expanding,

$$S_z = E\left[(y - Ey)(y - Ey)^T\right] \qquad (15)$$
$$= E\left[(A^T z - E[A^T z])(A^T z - E[A^T z])^T\right] \qquad (16)$$
$$= E\left[(A - EA)^T z \, z^T (A - EA)\right]; \qquad (17)$$

therefore,

$$\operatorname{tr}(S_z) = z^T E\left[(A - EA)(A - EA)^T\right] z. \qquad (18)$$

Define the matrix $G_l$ as

$$G_l = E\left[(A - EA)(A - EA)^T\right]. \qquad (19)$$

From its definition, we know that $G_l$ is an $m \times m$ nonnegative definite matrix. It can be evaluated directly using the training image matrices. Let the average image of all training images be denoted by $\bar{A}$ as defined in (8). We can then approximate $G_l$ by $\hat{G}_l$, where

$$\hat{G}_l = \frac{1}{M} \sum_{i=1}^{M} (A_i - \bar{A})(A_i - \bar{A})^T. \qquad (20)$$

Equation (14) can instead be expressed as

$$J(z) = z^T G_l z. \qquad (21)$$

It has been shown that the vector $z$ which maximizes (21) is the eigenvector of $G_l$ corresponding to the largest eigenvalue. In general it is not enough to select just one vector for projection. Normally an orthonormal set of vectors $z_1, \dots, z_q$ is chosen. These are the eigenvectors of $G_l$ corresponding to the $q$ largest eigenvalues.
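The only change from the R2DPCA sketch above is the orientation of the covariance computation; a corresponding sketch follows (same array-layout assumption, illustrative names):

import numpy as np

def l2dpca_projection(images, q):
    """Illustrative L2DPCA sketch: top-q eigenvectors of G_l in (20)."""
    A_bar = images.mean(axis=0)
    centered = images - A_bar
    # G_l = (1/M) * sum_i (A_i - A_bar)(A_i - A_bar)^T, an m x m matrix, (20)
    G_l = np.einsum('imn,ikn->mk', centered, centered) / len(images)
    _, eigvecs = np.linalg.eigh(G_l)
    Z = eigvecs[:, ::-1][:, :q]                        # m x q projection matrix
    return Z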

II-C2 Feature Extraction

The optimal projection vectors $z_1, \dots, z_q$ can be used for feature extraction. Let $Z = [z_1, \dots, z_q]$ be an $m \times q$ matrix. Then for a given image $A_i$, let

$$\left(y_k^{(i)}\right)^T = z_k^T A_i, \qquad k = 1, \dots, q. \qquad (22)$$

The set of projected feature vectors are called the principal component vectors of the image $A_i$. In 2DPCA the principal components are vectors, not scalars. The principal component vectors can be used to form a $q \times n$ feature matrix $B_i = Z^T A_i$.

II-C3 Classification

Classification here is equivalent to that in R2DPCA.

II-C4 Image Reconstruction

Fig. 1: Example ORL image reconstruction by R2DPCA.
Fig. 2: Example ORL image reconstruction by L2DPCA.

Since $z_1, \dots, z_q$ are orthonormal, from (22) we can obtain a reconstruction of image $A_i$:

$$\tilde{A}_i = Z B_i = \sum_{k=1}^{q} z_k \left(y_k^{(i)}\right)^T. \qquad (23)$$

$\tilde{A}_i$ is of the same dimension as $A_i$. If $q = m$, then $\tilde{A}_i = A_i$. Otherwise, $\tilde{A}_i$ is an approximation of $A_i$. See Fig. 2.

II-D Bilateral 2DPCA

One major limitation of R2DPCA and L2DPCA is that they each only consider information from either the rows or the columns of an image, but not both. Another drawback is that they require many coefficients for image representation. In R2DPCA, an $m \times n$ image can only be reduced to an $m \times d$ feature matrix, whereas L2DPCA can only reduce the same image to $q \times n$. Bilateral 2DPCA (B2DPCA) as proposed in [4] addresses both of these problems. It incorporates both row and column information from the images, and is able to reduce an image to $q \times d$, making it more computationally efficient.

II-D1 Algorithm

Given the projection matrix $Z$ from L2DPCA and the projection matrix $X$ from R2DPCA, project image $A_i$ by the following transformation:

$$C_i = Z^T A_i X, \qquad (24)$$

where $C_i$ is of dimension $q \times d$. Similar to conventional 2DPCA, classification is done by KNN and a chosen distance metric. Experiments in [4] demonstrate the accuracy and efficiency of B2DPCA.
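Combining the two sketches above, the bilateral projection in (24) is a single line of code (illustrative names as before):

def bilateral_project(A_i, Z, X):
    """Reduce an m x n image to a q x d feature matrix per (24)."""
    return Z.T @ A_i @ X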

III Two-Dimensional Linear Discriminant Analysis

III-A Introduction

Conventional linear discriminant analysis (LDA) is a popular technique for feature extraction and dimension reduction. LDA seeks an optimal projection of the data so that variance between classes is maximized while the variance within classes is minimized. Because it is a supervised method, it often has advantages over PCA, especially in face recognition problems. When using LDA for face recognition, typically the two-dimensional image matrices must first be converted to one-dimensional vectors. This causes the between-class and within-class scatter matrices to be of high dimension, making them difficult to accurately calculate. Furthermore, LDA requires that at least one of the matrices be invertible. However, they are both high-dimensional, and in practice the number of samples is relatively small. This all but guarantees that both matrices will be singular. This is known as the small sample size (SSS) problem.

An extension of 2DPCA, two-dimensional LDA (2DLDA) as proposed in [9], avoids the SSS problem. The between-class and within-class scatter matrices are calculated directly from the original image matrices. Thus, their dimension is much smaller, making them easy to compute accurately. In practice one has enough data to guarantee that they are not singular, and it is more computationally efficient to find the desired eigenvectors.

III-B Right 2DLDA

As with R2DPCA, Right 2DLDA (R2DLDA) is equivalent to conventional 2DLDA. We refer to it as R2DLDA to distinguish it from other generalizations of 2DLDA.

III-B1 Algorithm

Let $\{A_1, A_2, \dots, A_M\}$ be a collection of image matrices of dimension $m \times n$, where $A_i$ represents the $i$th image matrix, $i = 1, \dots, M$. Each image belongs to one of $C$ classes, where the $c$th class has $M_c$ samples and $\sum_{c=1}^{C} M_c = M$. R2DLDA transforms all images by a set of discriminating vectors $X = [x_1, \dots, x_d]$, resulting in projected image matrices

$$B_i = A_i X, \qquad i = 1, \dots, M. \qquad (25)$$

$X$ is $n \times d$ and its columns are chosen to maximize the 2D Fisher criterion

$$J(x) = \frac{x^T S_b^R x}{x^T S_w^R x}, \qquad (26)$$

where $S_b^R$ and $S_w^R$ represent the between-class and within-class scatter matrices of Right 2DLDA, respectively. Let $\bar{A}_c$ denote the average image of the $c$th class, and $\bar{A}$ the average image of all images. It follows that

$$S_b^R = \sum_{c=1}^{C} M_c (\bar{A}_c - \bar{A})^T (\bar{A}_c - \bar{A}) \qquad (27)$$

and

$$S_w^R = \sum_{c=1}^{C} \sum_{A_i \in \text{class } c} (A_i - \bar{A}_c)^T (A_i - \bar{A}_c). \qquad (28)$$

It has been shown that the vectors $x_1, \dots, x_d$ which maximize (26) are the eigenvectors of $\left(S_w^R\right)^{-1} S_b^R$ corresponding to the $d$ largest eigenvalues.
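The scatter matrices in (27) and (28) and the resulting projection vectors can be computed directly. The sketch below assumes integer class labels and the same $(M, m, n)$ image array as in the earlier sketches; names are illustrative rather than the authors' code.

import numpy as np

def r2dlda_projection(images, labels, d):
    """Illustrative R2DLDA sketch per (25)-(28): top-d eigenvectors of
    inv(S_w) @ S_b. `images` has shape (M, m, n); `labels` are class ids."""
    labels = np.asarray(labels)
    A_bar = images.mean(axis=0)
    n = images.shape[2]
    S_b = np.zeros((n, n))
    S_w = np.zeros((n, n))
    for c in np.unique(labels):
        class_imgs = images[labels == c]
        A_bar_c = class_imgs.mean(axis=0)
        diff = A_bar_c - A_bar
        S_b += len(class_imgs) * diff.T @ diff                    # (27)
        centered = class_imgs - A_bar_c
        S_w += np.einsum('imn,imk->nk', centered, centered)       # (28)
    # eigenvectors of S_w^{-1} S_b for the d largest eigenvalues
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(eigvals.real)[::-1][:d]
    return eigvecs[:, order].real                                 # n x d matrix X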

III-B2 Feature Extraction

The optimal projection vectors $x_1, \dots, x_d$ can be used for feature extraction. Let $X = [x_1, \dots, x_d]$ be an $n \times d$ matrix. Then for a given image $A_i$, let

$$y_k^{(i)} = A_i x_k, \qquad k = 1, \dots, d. \qquad (29)$$

The set of projected feature vectors $y_1^{(i)}, \dots, y_d^{(i)}$ are called the Right Fisher feature vectors of the image $A_i$. The Right Fisher feature vectors can be used to form an $m \times d$ Fisher feature matrix $B_i = A_i X$. Classification is done with KNN and a chosen distance metric.

III-C Left 2DLDA

Similar to the analysis of 2DPCA in [4], it can be seen that R2DLDA operates on information contained in the rows of image matrices. There may be different discriminatory information contained in the columns; thus, a natural extension of R2DLDA is Left 2DLDA (L2DLDA). The framework for L2DLDA is given in [6].

III-C1 Algorithm

Let $\{A_1, A_2, \dots, A_M\}$ be a collection of image matrices of dimension $m \times n$, where $A_i$ represents the $i$th image matrix, $i = 1, \dots, M$. Each image belongs to one of $C$ classes, where the $c$th class has $M_c$ samples. L2DLDA transforms all images by a set of discriminating vectors $Z = [z_1, \dots, z_q]$, resulting in projected image matrices

$$B_i = Z^T A_i, \qquad i = 1, \dots, M. \qquad (30)$$

$Z$ is $m \times q$ and its columns are chosen to maximize the 2D Fisher criterion

$$J(z) = \frac{z^T S_b^L z}{z^T S_w^L z}, \qquad (31)$$

where $S_b^L$ and $S_w^L$ represent the between-class and within-class scatter matrices of Left 2DLDA, respectively. Let $\bar{A}_c$ denote the average image of the $c$th class, and $\bar{A}$ the average image of all images. It follows that

$$S_b^L = \sum_{c=1}^{C} M_c (\bar{A}_c - \bar{A})(\bar{A}_c - \bar{A})^T \qquad (32)$$

and

$$S_w^L = \sum_{c=1}^{C} \sum_{A_i \in \text{class } c} (A_i - \bar{A}_c)(A_i - \bar{A}_c)^T. \qquad (33)$$

It has been shown that the vectors $z_1, \dots, z_q$ which maximize (31) are the eigenvectors of $\left(S_w^L\right)^{-1} S_b^L$ corresponding to the $q$ largest eigenvalues.

III-C2 Feature Extraction

The optimal projection vectors $z_1, \dots, z_q$ can be used for feature extraction. Let $Z = [z_1, \dots, z_q]$ be an $m \times q$ matrix. Then for a given image $A_i$, let

$$\left(y_k^{(i)}\right)^T = z_k^T A_i, \qquad k = 1, \dots, q. \qquad (34)$$

The set of projected feature vectors are called the Left Fisher feature vectors of the image $A_i$. The Left Fisher feature vectors can be used to form a $q \times n$ Fisher feature matrix $B_i = Z^T A_i$. Classification is done with KNN and a chosen distance metric.

III-D Bilateral 2DLDA

R2DLDA and L2DLDA suffer from the same limitations as R2DPCA and L2DPCA. Bilateral 2DLDA (B2DLDA) as proposed in [6] addresses these shortcomings. It incorporates both row and column information from the images, and is able to reduce an image to $q \times d$, making it more computationally efficient.

III-D1 Algorithm

Given the projection matrix $Z$ from L2DLDA and the projection matrix $X$ from R2DLDA, project image $A_i$ by the following transformation:

$$C_i = Z^T A_i X, \qquad (35)$$

where $C_i$ is of dimension $q \times d$. Similar to conventional 2DLDA, classification is done by KNN and a chosen distance metric. Experiments in [6] demonstrate the accuracy and efficiency of B2DLDA.

IV Random Subspace Method

IV-A Overview

In ensemble learning one attempts to train a set of diverse classifiers whose individual outputs are combined into one final decision. If the classifiers are diverse, that is, if they each make different mistakes, then the hope is that a sensible combination of the classifiers' decisions will correct those individual errors. Many techniques for training diverse classifiers exist. Bootstrap aggregating (bagging) is a popular method. In bagging, each classifier is trained on a random subset of the training data in order to promote model variance. The random subspace method [2] is similar to bagging, but instead of training each model on a random subset of the training data, each model is trained on a random sample of features rather than the entire feature set. This prevents individual classifiers from over-focusing on features that appear highly predictive in the training set. It is generally used with decision trees, though it has been applied to other areas as well.

Fig. 3: Entropy experiment on a subset of MORPH-II with RS-2DLDA. Selecting 10 random eigenvectors increases entropy for both euclidean and cosine distances.
Fig. 4: Entropy experiment carried out on a subset of MORPH-II with RS-2DLDA. Train at least 50 random classifiers to increase diversity.

IV-B Application to 2DPCA

In [5], Nguyen et al. proposed random subspace two-dimensional PCA (RS-2DPCA). To our knowledge, this is the only application of the random subspace method to any of the two-dimensional variants of PCA or LDA. In their paper they note that the accuracy of 2DPCA depends heavily on $d$, the number of eigenvectors kept. Choosing $d$ too low results in poor accuracy, while choosing $d$ too large can easily cause overfitting to the training data. Generally, the eigenvectors corresponding to the largest eigenvalues are kept. However, the eigenvectors that are discarded still contain valuable information. To overcome these limitations, random samples of the eigenvectors are used to build many classifiers. As shown in [5], this results in improved and more stable accuracy, and makes it possible to utilize all of the eigenvectors without risk of overfitting. Unfortunately, the Nguyen et al. approach does not perform well on difficult datasets. In the next section, we propose our algorithm and demonstrate its improvements over RS-2DPCA.

V Random Subspace Two-Dimensional Linear Discriminant Analysis

V-A Introduction

Motivated by RS-2DPCA, we propose random subspace two-dimensional LDA (RS-2DLDA). Important advantages of RS-2DLDA over RS-2DPCA include:

  • The random subspace is a random sampling of eigenvectors from 2DLDA. These eigenvectors have been shown to have more discriminative power than those from 2DPCA.

  • An entropy measure is used to select parameters that result in diverse classifiers.

  • Each classifier’s reliability is estimated from an adjusted Rand index (ARI) based on its performance on the training data.

  • The ARI scores are used to develop a weighting scheme which is utilized in the final ensemble decision and further boosts accuracy.

Although we focus on applying the random subspace method to 2DLDA, it can be easily extended to 2DPCA, because it is merely a random sample of eigenvectors. We will give analysis of both to illustrate the superiority of 2DLDA over 2DPCA in face recognition. Note that our application of the random subspace method to 2DPCA is not equivalent to the approach by Nguyen et al. in [5]. They do not consider ways to measure classifier diversity nor does their model incorporate a weighting scheme.

V-B Increasing Diversity

The advantage of ensemble systems over single classifiers is that the combination of outputs from many classifiers can often correct for the errors of individual classifiers. However, this only works when the classifiers are diverse, that is, when each classifier makes different mistakes. If all classifiers are essentially the same, we cannot hope to correct their individual errors. There exist many ways to measure classifier diversity. One such method is the entropy measure, which assumes diversity is highest when half of the classifiers are correct for a given test image. Define $\xi_i$ as the number of classifiers out of $L$ that misclassify the $i$th image. Then entropy is defined as

$$E = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{L - \lceil L/2 \rceil} \min\{\xi_i, \, L - \xi_i\}, \qquad (36)$$

where $N$ is the number of test images and $E \in [0, 1]$. Low values indicate similar classifiers, and high values indicate diverse classifiers. If we can choose parameters that yield a highly diverse set of classifiers, then it is more likely that we will be able to increase the final accuracy with an intelligent combination of the classifier outputs.
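A small sketch of (36), assuming the per-image misclassification counts $\xi_i$ have already been tallied (function and variable names are ours):

import numpy as np

def entropy_diversity(xi, L):
    """Entropy diversity measure from (36).
    xi : array of misclassification counts, xi[i] = number of the L
         classifiers that misclassify the i-th test image."""
    xi = np.asarray(xi, dtype=float)
    N = len(xi)
    return np.sum(np.minimum(xi, L - xi)) / (N * (L - np.ceil(L / 2)))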

We can see in Fig. 3 that entropy varies with the number of random eigenvectors we select. If we do not choose enough eigenvectors, then each classifier is not predictive enough and performs poorly. On the other hand, if we choose too many, the classifiers are all very similar. Choosing a moderate value (10 in this case) works well to increase classifier diversity.

Entropy is also affected by the number of random classifiers we train. Not training enough results in poor diversity. Training more classifiers will increase entropy, but only up to a point. Fig. 4 illustrates this trend.

V-C Estimating Classifier Credibility

In RS-2DLDA, each classifier is a random sampling of eigenvectors from 2DLDA. Together these eigenvectors form a projection matrix. All images are multiplied by this matrix, which projects them to a new space. It is impossible to know how well a classifier will perform on the testing data, but we can get an idea based on its performance on the training data. Since each classifier defines a projection to a new space, we can expect that the classifiers which project images of the same person close to each other but images of different people far apart will perform well on the testing data. On the other hand, if the projected images of different people are mixed together, and there are no clear boundaries separating the images of one person from the next, then we will expect the classifier to perform poorly.

For a given classifier, take a training image and find its predicted class using KNN on the remainder of the training set. Do this for all training images, obtaining a prediction for each training image. Let this set of predictions define a clustering $V$ of the training images. Let the ground truth identities define another clustering $U$ of the training images. We can expect the classifier to perform well if $V$ is similar to $U$. That is, we can use a clustering similarity measure to evaluate whether the projection defined by the random sample of eigenvectors appears to preserve or disregard class differences. The adjusted Rand index [3] is one such clustering similarity measure that fits this purpose well.

Fig. 5: Adjusted Rand index experiment performed on a subset of MORPH-II. On average the ARI are higher for cosine distance than for euclidean.

Given a set of $n$ images and two clusterings of these images, namely $U = \{U_1, \dots, U_r\}$, the ground truth identities of the images, and $V = \{V_1, \dots, V_s\}$, the predicted identities from a given classifier, the overlap between $U$ and $V$ can be summarized in a contingency table $[n_{ij}]$, where each entry $n_{ij}$ is the number of images in common between $U_i$ and $V_j$. Let $a_i = \sum_j n_{ij}$ denote the row sums and $b_j = \sum_i n_{ij}$ the column sums of this table.

The adjusted Rand index (ARI) is then calculated as

$$\mathrm{ARI} = \frac{\sum_{ij} \binom{n_{ij}}{2} - \left[\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}\right] \Big/ \binom{n}{2}}{\frac{1}{2}\left[\sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2}\right] - \left[\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}\right] \Big/ \binom{n}{2}}. \qquad (37)$$

ARI ranges from $-1$ to $1$. High values indicate similar clusterings, while low values mean the clusterings are dissimilar. The adjusted Rand index is a corrected-for-chance version of the Rand index. It can take negative values, which indicate a Rand index lower than would be expected if the clusterings were drawn randomly.
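In practice, (37) does not need to be coded by hand; for example, scikit-learn provides the same corrected-for-chance index. The toy labels below are purely illustrative.

from sklearn.metrics import adjusted_rand_score

ground_truth = [0, 0, 1, 1, 2, 2]     # toy ground-truth identities U
predictions  = [0, 0, 1, 2, 2, 2]     # toy classifier predictions V
ari = adjusted_rand_score(ground_truth, predictions)
print(ari)                            # a value in [-1, 1]; 1 means identical clusterings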

If a given classifier achieves a high ARI, we know in its projected space that images of the same person are clustered close together and images of different people are spread apart. Thus, we expect these classifiers to outperform those with a low ARI when applied to the testing data. To take advantage of this, we develop a weighting scheme to give classifiers with a high ARI more influence in the final decision.

In Fig. 5 one can see the distribution of ARI for a set of 50 random classifiers operating on a subset of MORPH-II. The ARI are higher on average for cosine distance than for euclidean, suggesting cosine distance may be more suited to face recognition on MORPH-II than euclidean. The ARI are all positive, indicating that there is some similarity between the classifiers’ predictions on the training set and the ground truth values. Although the ARI are quite low (the highest is less than 0.3) it is important to remember that in ensemble learning we combine many weak classifiers to build one strong classifier. Indeed, we could have chosen to sample twice as many eigenvectors and easily increased all ARI, but this would come at the expense of classifier diversity.

V-D Weighted Majority Voting

Fig. 6: Changing the exponent $p$ to which each ARI is raised affects the performance of the weighting scheme.
Fig. 7: There is no value of $p$ that is optimal in every scenario.

In majority voting, each classifier gets one vote. The final decision is the class with the most votes, regardless of whether the percentage of votes is above $50\%$. Define the decision of the $t$th classifier as $d_{t,j} \in \{0, 1\}$, where $t = 1, \dots, L$ and $j = 1, \dots, C$. Here $L$ is the number of classifiers and $C$ the number of classes. If the $t$th classifier predicts the test image to belong to class $j$, then $d_{t,j} = 1$, and otherwise $d_{t,j} = 0$. We choose class $J$ if

$$\sum_{t=1}^{L} d_{t,J} = \max_{j=1,\dots,C} \sum_{t=1}^{L} d_{t,j}. \qquad (38)$$

If we know that some classifiers are more accurate than others, then we can weight their decisions so that more credible classifiers have a higher influence on the final decision. This is known as weighted majority voting. Assign weight $w_t$ to the $t$th classifier. We choose class $J$ if

$$\sum_{t=1}^{L} w_t d_{t,J} = \max_{j=1,\dots,C} \sum_{t=1}^{L} w_t d_{t,j}. \qquad (39)$$

We do not require the proportion of support for class $J$ to be over $50\%$.
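A minimal sketch of the weighted vote in (39), assuming the decisions of all classifiers for one test image are stored as a 0/1 matrix (names are illustrative):

import numpy as np

def weighted_vote(decisions, weights):
    """Weighted majority vote from (39) for a single test image.
    decisions : (L, C) 0/1 array, decisions[t, j] = 1 if classifier t
                votes for class j
    weights   : (L,) array of classifier weights w_t"""
    support = weights @ decisions      # weighted support for each class
    return int(np.argmax(support))     # index of the winning class J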

We expect the classifiers with a high ARI to be more accurate than those with a low ARI. But how much additional influence should we give to the strong classifiers? We need a monotone function which maps an ARI to a weight for each classifier. One simple solution to consider is raising each ARI to a common exponent $p$. Choosing a low value for $p$ means the strong classifiers have only marginally more influence than the weak classifiers. A high value for $p$ means that only the strongest classifiers make any real impact on the final decision. A moderate value for $p$ should give a proper balance and help to increase overall accuracy.

In Fig. 6 and Fig. 7 we train 50 random classifiers on a subset of MORPH-II and experiment by varying $p$, the exponent to which each ARI is raised. We can see that a weighting scheme has the potential to substantially boost performance. However, the best value for $p$ is clearly different in Fig. 6 than in Fig. 7, even though they are identical experiments up to a different initial random seed. More research needs to be done on how to select an optimal value for $p$. Other monotone functions, such as the logistic function, could also be considered.

V-E Algorithm

Given the following parameters:

Param. Description
$\{A_i\}$  Set of all images, where $A_i$ denotes the $i$th image
$\mathcal{R}$  Set of training images
$\mathcal{E}$  Set of testing images
$M$  Total number of images
$C$  Number of people (classes)
$S_w^L, S_b^L$  Within-class and between-class covariance matrices from L2DLDA
$S_w^R, S_b^R$  Within-class and between-class covariance matrices from R2DLDA
$Z$  Eigenvectors of $(S_w^L)^{-1} S_b^L$
$X$  Eigenvectors of $(S_w^R)^{-1} S_b^R$
$k$  Number of eigenvectors of $Z$ and $X$ kept
$B_i$  $i$th projected image
$L$  Number of classifiers
$U$  Ground truth identities for all images
$V$  Predicted training identities, where $V_i$ is the predicted identity of the $i$th image
$d_{t,i,j}$  Indicator: 1 if the $t$th classifier predicts the $i$th image to be from the $j$th person, 0 otherwise
$\mathrm{ARI}_t$  Adjusted Rand index of the $t$th classifier
$p$  Exponent to which each ARI is raised, resulting in each classifier's weight
$w_t$  Weight given to the $t$th classifier
$P$  Final testing ensemble prediction, where $P_i$ is the predicted identity of the $i$th image

and functions:

Function Description
$\mathrm{KNN}(B_i, \mathcal{S})$  Returns the most common class among the nearest neighbors of $B_i$ in $\mathcal{S}$.
$\mathrm{ARI}(U, V)$  Returns the adjusted Rand index of $U$ and $V$.

the algorithm for RS-2DLDA can be summarized as follows:

1:  for $t = 1$ to $L$ do {Create $L$ classifiers}
2:      $Z_t$ = random sample of $k$ columns of $Z$
3:      $X_t$ = random sample of $k$ columns of $X$
4:      for $i = 1$ to $M$ do {Project all images}
5:          $B_i = Z_t^T A_i X_t$
6:      end for
7:      for $B_i$ in $\mathcal{R}$ do {Evaluate classifier on training set}
8:          $V_i = \mathrm{KNN}(B_i, \mathcal{R} \setminus \{B_i\})$
9:      end for
10:     $\mathrm{ARI}_t = \mathrm{ARI}(U, V)$
11:     $w_t = (\mathrm{ARI}_t)^p$
12:     for $B_i$ in $\mathcal{E}$ do {Make predictions on testing set}
13:         $d_{t,i,j} = 0$ for all $j$
14:         $j^* = \mathrm{KNN}(B_i, \mathcal{R})$
15:         $d_{t,i,j^*} = 1$
16:     end for
17:  end for
18:  for $A_i$ in $\mathcal{E}$ do
19:     $P_i = \arg\max_{j} \sum_{t=1}^{L} w_t \, d_{t,i,j}$
20:  end for
21:  return $P$
Algorithm RS-2DLDA

To consider 2DPCA or a different projection scheme (bilateral, left, or right), simply replace the projection in step 5 of the algorithm. For example, to apply the random subspace method and weighting scheme to L2DPCA, step 5 would become $B_i = Z_t^T A_i$ (with $Z_t$ a random sample of columns of the L2DPCA eigenvectors), whereas for R2DLDA it would be $B_i = A_i X_t$.
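To make the steps above concrete, the following Python sketch strings together the pieces for the bilateral case. It is an illustration rather than the authors' reference implementation: it assumes integer class labels in $0, \dots, C-1$, reuses scikit-learn's ARI, and clips negative ARI values to zero before raising them to the power $p$ as a safeguard of this sketch (the paper itself reports only positive ARI values).

import numpy as np
from sklearn.metrics import adjusted_rand_score

def rs_2dlda(train_imgs, train_labels, test_imgs, Z, X,
             L=50, k_eig=10, k_nn=5, p=4, seed=0):
    """Illustrative RS-2DLDA pipeline with bilateral projection and
    ARI weighting. Z (m x q) and X (n x d) hold the L2DLDA and R2DLDA
    eigenvectors."""
    rng = np.random.default_rng(seed)
    train_labels = np.asarray(train_labels)
    C = train_labels.max() + 1
    votes = np.zeros((len(test_imgs), C))
    for t in range(L):                                            # create L classifiers
        Zt = Z[:, rng.choice(Z.shape[1], k_eig, replace=False)]   # random eigenvector sample
        Xt = X[:, rng.choice(X.shape[1], k_eig, replace=False)]
        B_train = [Zt.T @ A @ Xt for A in train_imgs]             # step 5: project images
        B_test = [Zt.T @ A @ Xt for A in test_imgs]
        # leave-one-out KNN on the training set gives the predicted clustering V
        V = []
        for i, B in enumerate(B_train):
            d = [np.inf if j == i else np.linalg.norm(B - Bj)
                 for j, Bj in enumerate(B_train)]
            nn = np.argsort(d)[:k_nn]
            V.append(np.bincount(train_labels[nn]).argmax())
        w = max(adjusted_rand_score(train_labels, V), 0.0) ** p   # classifier weight
        # weighted KNN votes on the testing set
        for i, B in enumerate(B_test):
            d = [np.linalg.norm(B - Bj) for Bj in B_train]
            nn = np.argsort(d)[:k_nn]
            votes[i, np.bincount(train_labels[nn]).argmax()] += w
    return votes.argmax(axis=1)                                   # ensemble predictions P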

VI Experiments

Fig. 8: Example pre-processed MORPH-II images

VI-A Introduction to the Data

VI-A1 MORPH-II

The MORPH-II dataset [8] is a longitudinal dataset collected over five years. It contains 55,134 images from 13,617 individuals. Subjects' ages range from 16 to 77 years, and there are an average of four images per person. MORPH-II is a difficult dataset for face recognition because it suffers from high variability in pose, facial expression, and illumination. To account for this, all images were pre-processed. OpenCV was used to automatically detect the face and the eyes in each image. The images were rotated so that the eyes were horizontal, and then cropped to reduce noise from the background and the subject's hair. Finally, all images were histogram equalized with a built-in Python function to help account for the differences in illumination.
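A rough sketch of such a pipeline using standard OpenCV Haar cascades is shown below; the specific cascades, crop size, and margins used for MORPH-II are not given in the text, so the choices here are placeholders.

import cv2
import numpy as np

# Illustrative Haar-cascade pipeline; cascade choice and output size are
# placeholders rather than the exact values used for MORPH-II.
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_eye.xml')

def preprocess(gray_img):
    x, y, w, h = face_cascade.detectMultiScale(gray_img)[0]   # first detected face
    face = gray_img[y:y + h, x:x + w]
    eyes = sorted(eye_cascade.detectMultiScale(face)[:2], key=lambda e: e[0])
    (ex1, ey1, ew1, eh1), (ex2, ey2, ew2, eh2) = eyes         # left eye, right eye
    dy = (ey2 + eh2 / 2) - (ey1 + eh1 / 2)
    dx = (ex2 + ew2 / 2) - (ex1 + ew1 / 2)
    angle = np.degrees(np.arctan2(dy, dx))                    # tilt of the eye line
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)   # rotate eyes horizontal
    aligned = cv2.warpAffine(face, M, (w, h))
    return cv2.equalizeHist(aligned)                          # normalize illumination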

VI-A2 ORL

The ORL dataset [1] contains 40 people, each with 10 images of size $92 \times 112$ pixels. There are minor variations in lighting and facial expression, but it is an easy dataset for face recognition. No pre-processing was done on the images.

VI-B Experiment Design

A subset of MORPH-II was used for the experiments. Among those with 10+ images, 50 arbitrary people were selected. Five images per person (250 images total) were randomly selected for training, and five for testing (250 images total). 50 different 5-nearest neighbor classifiers were created, each from a random sample of 10 eigenvectors. The classifiers' predictions on the testing data were combined into one final decision by weighted majority voting. For ORL, the entire dataset was used. Five images per person (200 images total) were randomly selected for training, and five for testing (200 images total). 50 different 1-nearest neighbor classifiers were created, each from a random sample of 5 eigenvectors. The classifiers' predictions on the testing data were combined into one final decision by weighted majority voting. For completeness, bilateral, right, and left projection schemes of 2DLDA and 2DPCA are considered. All experiments were repeated thirty times and the results averaged to obtain Tables I and II. Standard error is shown in parentheses.

Face Recognition on MORPH-II
Algorithm Euclidean Cosine
Weighted Unweighted Original Weighted Unweighted Original
B2DLDA .727 (.019) .678 (.026) .764 .781 (.018) .786 (.015) .768
L2DLDA .743 (.008) .735 (.009) .756 .788 (.010) .780 (.012) .776
R2DLDA .704 (.016) .662 (.018) .704 .723 (.018) .733 (.016) .704
B2DPCA .706 (.013) .701 (.013) .564 .702 (.018) .692 (.013) .556
L2DPCA .678 (.009) .667 (.011) .552 .670 (.018) .660 (.016) .544
R2DPCA .611 (.010) .609* (.007) .580 .609 (.009) .612 (.009) .584
TABLE I: Experiments conducted on MORPH-II. Standard error is shown in parentheses, and top accuracy in bold. The framework introduced by Nguyen et al. in [5] is denoted (*).
Face Recognition on ORL
Algorithm Euclidean Cosine
Weighted Unweighted Original Weighted Unweighted Original
B2DLDA .931 (.017) .924 (.017) .935 .939 (.015) .936 (.016) .940
L2DLDA .914 (.013) .909 (.014) .940 .937 (.016) .935 (.017) .940
R2DLDA .929 (.013) .923 (.016) .935 .948 (.015) .943 (.013) .945
B2DPCA .914 (.013) .911 (.014) .870 .908 (.015) .908 (.013) .865
L2DPCA .895 (.011) .893 (.010) .865 .884 (.012) .884 (.013) .860
R2DPCA .905 (.010) .903* (.011) .895 .916 (.016) .914 (.016) .895
TABLE II: Experiments conducted on ORL. Standard error is shown in parentheses, and top accuracy in bold. The framework introduced by Nguyen et al. in [5] is denoted (*).

VI-C Analysis

From the results in Tables I and II, the difficulty of MORPH-II is apparent. The highest accuracy achieved, 0.788, was 16 percentage points lower than the highest for ORL (0.948). In the MORPH-II experiments, performance increases substantially when cosine distance is used instead of euclidean. This is likely due to the fact that MORPH-II suffers from high variability in illumination, whereas ORL does not. Cosine distance measures similarity in direction rather than magnitude, so we see boosted accuracy on MORPH-II, but only minor improvements for ORL.

In general, the 2DLDA algorithms outperform their 2DPCA counterparts. This is likely due to the fact that the eigenvectors from 2DLDA have more discriminative power for face recognition than those from 2DPCA.

In general the weighting scheme increases accuracy. However, in some cases the unweighted algorithm achieves better performance, and in other instances the original (non-random-subspace) algorithm is the best. We can be confident that the random subspace method is in general effective, and that the weighting scheme will in most cases increase accuracy. More research needs to be done on parameter selection. One obvious direction for future work is the selection of $p$, the exponent to which all ARI are raised to determine the weighting scheme. It is clear that one choice of $p$ does not generalize well across algorithms (a value of $p$ that works well with L2DLDA may not generalize well to R2DPCA, for example). A more systematic and robust way of selecting $p$ (and other parameters) is needed to ensure increased performance regardless of algorithm or dataset.

The Nguyen et al. framework in [5] achieves satisfactory performance on ORL, but it performs quite poorly on MORPH-II. Although the high accuracies achieved on ORL are not replicated on MORPH-II, the contributions presented in this paper significantly increase accuracy. First, considering the cosine distance metric helped to account for the variable illumination of MORPH-II. Using the eigenvectors from 2DLDA significantly increased recognition accuracy, and considering multiple projection schemes (bilateral, left, and right) showed that one scheme is not always better than the others. Finally, the weighting scheme proposed here further boosts accuracy.

VII Conclusions

A novel algorithm for face recognition, RS-2DLDA, is presented and evaluated on the MORPH-II and ORL datasets. It outperforms the previously proposed RS-2DPCA [5] by utilizing multiple distance metrics and projection schemes. RS-2DLDA further benefits from a weighting scheme that increases accuracy. Future work will include investigation into the key differences of the bilateral, left, and right versions of 2DLDA and 2DPCA, and exploration of randomly sampling eigenvectors with replacement. More challenging face recognition problems will also be considered. Finally, an optimized weighting scheme will be sought that is effective regardless of dataset difficulty or algorithm used.

Acknowledgment

This work was conducted at an NSF-sponsored Research Experience for Undergraduates (REU) program at the University of North Carolina Wilmington. I would like to thank Dr. Cuixian Chen, Dr. Yishi Wang, and Troy Kling for their dedication and support.

References

  • [1] The database of faces. http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html. Accessed: 2017-06-26.
  • [2] Tin Kam Ho. The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8):832–844, Aug 1998.
  • [3] Lawrence Hubert and Phipps Arabie. Comparing partitions. Journal of Classification, 2(1):193–218, Dec 1985.
  • [4] Hui Kong, Lei Wang, Eam Khwang Teoh, Xuchun Li, Jian-Gang Wang, and Ronda Venkateswarlu. Generalized 2D principal component analysis for face image representation and recognition. Neural Networks, 18(5):585–594, 2005. IJCNN 2005.
  • [5] Nam Nguyen, Wanquan Liu, and Svetha Venkatesh. Random Subspace Two-Dimensional PCA for Face Recognition, pages 655–664. Springer Berlin Heidelberg, Berlin, Heidelberg, 2007.
  • [6] S. Noushath, G. Hemantha Kumar, and P. Shivakumara. (2D)²LDA: An efficient approach for face recognition. Pattern Recognition, 39(7):1396–1400, 2006.
  • [7] R. Polikar. Ensemble based systems in decision making. IEEE Circuits and Systems Magazine, 6(3):21–45, Third 2006.
  • [8] K. Ricanek and T. Tesafaye. Morph: a longitudinal image database of normal adult age-progression. In 7th International Conference on Automatic Face and Gesture Recognition (FGR06), pages 341–345, April 2006.
  • [9] Anbang Xu, Xin Jin, Yugang Jiang, and Ping Guo. Complete two-dimensional PCA for face recognition. In 18th International Conference on Pattern Recognition (ICPR'06), volume 3, pages 481–484, 2006.
  • [10] Jian Yang, D. Zhang, A. F. Frangi, and Jing-Yu Yang. Two-dimensional PCA: A new approach to appearance-based face representation and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(1):131–137, Jan 2004.
  • [11] Daoqiang Zhang and Zhi-Hua Zhou. (2D)²PCA: Two-directional two-dimensional PCA for efficient face representation and recognition. Neurocomputing, 69(1):224–231, 2005. Neural Networks in Signal Processing.