Face Verification Using Boosted Cross-Image Features
This paper proposes a new approach to face verification, where a pair of face images must be classified as belonging to the same person or not. The problem is relatively new and not well explored in the literature. Current methods mostly adopt techniques borrowed from face recognition and process each image of the pair independently, which is counterintuitive. In contrast, we propose to extract cross-image features, i.e. features computed across the pair of images, which, as we demonstrate, are more discriminative with respect to the similarity and dissimilarity of faces. Our features are derived from the popular Haar-like features, but are extended to handle the face verification problem instead of face detection. We collect a large bank of cross-image features using filters of different sizes, locations, and orientations, and then use AdaBoost to select and weight the most discriminative ones. We carried out extensive experiments on three standard face verification datasets and obtained promising results that outperform the state of the art.
1 Introduction

Facial image analysis is a widely investigated area in computer vision and multimedia, with several face-related problems explored in the literature. In face detection [16, 9, 18], an image is classified as containing a face or not by capturing the facial features and landmarks that distinguish a face from other types of objects. On the other hand, in face recognition [19, 21, 23, 2], given an image of a face, possibly localized by a face detector, the person's identity is recognized by capturing facial features that distinguish individuals.
Lately, a new face-related problem has been receiving increasing attention, namely face verification: given a pair of face images, the task is to verify whether they belong to the same person or not. Face verification is still poorly addressed, with only a few reported results. Guillaumin et al. employed a metric learning method to obtain the optimal Mahalanobis metric over a given representation space of faces. Schroff et al. proposed a new similarity measure that increases verification robustness against pose, illumination, and expression differences. An SVM-based method has also been proposed for video face verification, where an SVM is trained on a video of a person and then used to classify another video in order to determine whether it corresponds to the same identity. Moreover, Kumar et al. successfully employed attribute classifiers, which significantly improved face verification results, and Nguyen et al. used an adaptive learning method to obtain the optimal transformation for the cosine similarity metric in the face space. Wavelet LBP features have additionally been proposed and employed for face verification. It can be clearly observed that current face verification methods mostly employ techniques borrowed from face recognition and apply them separately to each image of the pair to be verified.
In this paper, we propose a novel method for face verification, which we believe will set a new direction for this problem. Our method is inspired by the following observation: since face verification is a two-image problem, using features extracted individually from each image is counterintuitive; rather, the most discriminative features should be extracted across the pair of images jointly. Existing methods employ LBP and SIFT features, which are designed to describe a region within a single image. In contrast, we propose to use cross-image features that describe the relation between two faces, and thus better fit the face verification problem. Note that our observation generalizes to verification of any type of object. For instance, given two images of animals, we can verify whether both images show cats by running each of them separately through a cat detector trained on images of cats vs. non-cats, and then combining the detector's confidences. This is inherently different from using a verification classifier that receives a pair of images and classifies them jointly as both being cats or not. The latter classifier would be trained on cross-image features extracted from cat-cat pairs vs. cat-non-cat pairs, which is more discriminative for object similarity and dissimilarity.
Our cross-image features are embarrassingly simple: we use a large bank of normalized correlation filters between patches across the pair of images at different sizes, locations, and orientations. Additionally, we use Haar-like features similar to those of Viola and Jones, but computed across the pair of images. We then use AdaBoost to weight and select the most discriminative filters. Our method is derived from the Viola-Jones face detector; however, instead of using filters within an individual image, we use across-image filters so that we capture the relation between the pair of images, rather than individual image features. It is important to note that cross-image features are extracted from two images in order to encode their similarity. This is inherently different from relating pairs or triplets of features extracted from the same image, which is typically done to encode the spatial relations between features within that image. The rest of the paper is organized as follows: in the next section, we present our proposed cross-image features, followed by AdaBoost classifier training; we then describe the experiments and results, and the final section concludes the paper.
2 Proposed Method
Our cross-image features for face verification are based on the simple rectangle filters presented by Viola and Jones. However, we extend the features to operate across pairs of images rather than within individual images. While these features seem simple, the experiments demonstrate their superior discriminative capabilities in face verification. Figure 1 illustrates the difference between the original within-image features and ours: we capture the differences between the two images, instead of the variation within a single image. In particular, given a pair of face images $I_1$ and $I_2$, which we aim to classify as belonging to the same identity or not, let $B$ denote a box positioned at location $(x, y)$, with width $w$, height $h$, and orientation $\theta$. We define two types of filters:
Haar-like cross-image filters: This type of filter compares the rectangular sum of the box between the image pair,
$$f_{\text{Haar}}(B) = \sum_{(x,y) \in B} I_1(x,y) \;-\; \sum_{(x,y) \in B} I_2(x,y). \qquad (1)$$
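As a concrete illustration, the Haar-like cross-image response can be sketched as a difference of box sums taken at the same coordinates in the two images. This is a minimal sketch, not the authors' implementation; the function name and box convention (top-left corner, width, height) are our assumptions, and orientation is omitted for brevity.

```python
import numpy as np

def haar_cross_image(img1, img2, x, y, w, h):
    """Haar-like cross-image response: difference of the rectangular
    sums of the same box taken in the two images (sketch; the box
    parameterization and sign convention are assumptions)."""
    s1 = img1[y:y + h, x:x + w].sum()
    s2 = img2[y:y + h, x:x + w].sum()
    return float(s1 - s2)

# identical patches give a zero response
a = np.ones((8, 8))
print(haar_cross_image(a, a, 1, 1, 4, 4))  # 0.0
```

A large response indicates that the two faces differ in brightness structure inside that box; a response near zero indicates agreement.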
NCC cross-image filters: This type of filter computes the normalized cross correlation (NCC) of the rectangular box between the image pair,
$$f_{\text{NCC}}(B) = \frac{\sum_{(x,y) \in B} \big(I_1(x,y) - \mu_1\big)\big(I_2(x,y) - \mu_2\big)}{\sqrt{\sum_{(x,y) \in B} \big(I_1(x,y) - \mu_1\big)^2 \;\sum_{(x,y) \in B} \big(I_2(x,y) - \mu_2\big)^2}}, \qquad (2)$$
where $\mu_1$ and $\mu_2$ are the means of $I_1$ and $I_2$ over the box $B$.
We use a single version of this feature where the NCC is obtained between corresponding patches in the pair at the same location (i.e. both black and white boxes are placed at the same spatial coordinates in the pair). Note that since the correlation is normalized, these features are robust against illumination changes.
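The illumination robustness follows from the mean subtraction and variance normalization in equation 2: an affine change of brightness in one image leaves the response unchanged. A direct (unoptimized) sketch, with names of our choosing:

```python
import numpy as np

def ncc_cross_image(img1, img2, x, y, w, h, eps=1e-12):
    """Normalized cross correlation of the same box in both images.
    Mean subtraction and variance normalization make the response
    invariant to affine illumination changes within the box."""
    p1 = img1[y:y + h, x:x + w].astype(float).ravel()
    p2 = img2[y:y + h, x:x + w].astype(float).ravel()
    p1 = p1 - p1.mean()
    p2 = p2 - p2.mean()
    denom = np.sqrt((p1 ** 2).sum() * (p2 ** 2).sum()) + eps
    return float((p1 * p2).sum() / denom)

# a patch correlates perfectly with a brightened, contrast-scaled copy
rng = np.random.default_rng(0)
img = rng.random((16, 16))
print(round(ncc_cross_image(img, 2.0 * img + 10.0, 2, 2, 8, 8), 6))  # 1.0
```

The small `eps` guards against division by zero on constant patches; it is our addition, not part of equation 2.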
We quantize all possible positions, sizes, and orientations of each filter and obtain a large bank of features for each pair of images. Calculating the cross-image features for thousands of image boxes is time consuming; therefore, in order to compute the features rapidly, we adopt the integral image method and apply it to the cross-image features. The integral image $ii$ of an image $i$ is defined as
$$ii(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y'). \qquad (3)$$
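Equation 3 can be computed with two cumulative sums, after which any rectangular sum costs four lookups ("anchors") regardless of box size. A minimal sketch (function names are ours):

```python
import numpy as np

def integral_image(img):
    """ii(x, y): sum of img over all pixels up to and including (x, y)."""
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, x, y, w, h):
    """Rectangular sum of the box [y, y+h) x [x, x+w) from four anchors
    of the integral image, in O(1) regardless of box size."""
    total = ii[y + h - 1, x + w - 1]
    if x > 0:
        total -= ii[y + h - 1, x - 1]
    if y > 0:
        total -= ii[y - 1, x + w - 1]
    if x > 0 and y > 0:
        total += ii[y - 1, x - 1]
    return float(total)

img = np.arange(25, dtype=float).reshape(5, 5)
ii = integral_image(img)
print(box_sum(ii, 1, 1, 3, 3) == img[1:4, 1:4].sum())  # True
```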
Each box summation in an image can be obtained using four anchors of its integral image, as in Viola-Jones. For the Haar-like cross-image filters, we first obtain the integral image of each image in the pair; every box summation is then obtained from its corresponding integral image. On the other hand, in order to use the integral image for efficient computation of the NCC cross-image features, we first expand equation 2 and apply a few simple operations to reformulate it as
$$f_{\text{NCC}}(B) = \frac{\sum I_1 I_2 - \frac{1}{n}\sum I_1 \sum I_2}{\sqrt{\Big(\sum I_1^2 - \frac{1}{n}\big(\sum I_1\big)^2\Big)\Big(\sum I_2^2 - \frac{1}{n}\big(\sum I_2\big)^2\Big)}}, \qquad (4)$$
where each summation is over all the pixels in the box filter, and $n$ is the number of pixels. We then obtain five integral images corresponding to the terms $I_1$, $I_2$, $I_1^2$, $I_2^2$, and $I_1 I_2$. Using these integral images, each summation term in equation 4 is efficiently computed using four anchors of the corresponding integral image.
In the training process, we use AdaBoost to select a subset of features and construct the classifier. In each round, the learning algorithm chooses from a heterogeneous set of filters, including the Haar-like filters and the NCC filters, and also picks the optimal threshold for each feature. The output of AdaBoost is a classifier consisting of a linear combination of the selected features. For details on AdaBoost, we refer the reader to the standard literature.
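The selection step described above can be sketched with discrete AdaBoost over decision stumps: each round picks the (feature, threshold, polarity) triple with the lowest weighted error, then reweights the training pairs. This is a generic sketch of the technique, not the authors' exact implementation, and the brute-force threshold search is kept deliberately simple.

```python
import numpy as np

def adaboost_stumps(F, y, rounds=10):
    """Discrete AdaBoost over precomputed filter responses.
    F: (n_samples, n_features) responses; y: labels in {-1, +1}."""
    n, m = F.shape
    w = np.full(n, 1.0 / n)          # sample weights
    stumps = []
    for _ in range(rounds):
        best = None
        for j in range(m):           # each column is one filter
            for t in np.unique(F[:, j]):
                for pol in (1, -1):  # threshold polarity
                    pred = np.where(pol * (F[:, j] - t) > 0, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, t, pol, pred)
        err, j, t, pol, pred = best
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * pred)   # upweight mistakes
        w /= w.sum()
        stumps.append((alpha, j, t, pol))
    return stumps

def predict(stumps, F):
    """Sign of the weighted vote of the selected stumps."""
    score = sum(a * np.where(p * (F[:, j] - t) > 0, 1, -1)
                for a, j, t, p in stumps)
    return np.where(score >= 0, 1, -1)

# toy demo: one perfectly separating feature column
F = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([-1, -1, 1, 1])
model = adaboost_stumps(F, y, rounds=3)
print((predict(model, F) == y).all())  # True
```

In the paper's setting, `F` would hold the Haar-like and NCC responses of every quantized box for each training pair, and `y` would mark same-identity vs. different-identity pairs.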
3 Experiments and Results

We extensively experimented with the proposed ideas using three standard datasets: Extended Yale B, CMU PIE, and Labeled Faces in the Wild (LFW). Figure 2 shows the highest-weighted rectangle features obtained after boosting the cross-image features for these three datasets. Figure 3 shows example verification results from the three datasets. In all experiments, we report performance using the accuracy at EER (1 - EER), where the EER is the equal error rate, computed as the average of the false accept rate (FAR) and the false reject rate (FRR) at the operating point where they are closest.
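For clarity, the reported metric can be sketched as a threshold sweep over verification scores. This is a generic sketch of accuracy at EER under the stated convention (average of FAR and FRR at the closest operating point); the function name and labels convention are ours.

```python
import numpy as np

def accuracy_at_eer(scores, labels):
    """Sweep a threshold over similarity scores (labels: 1 = same
    person, 0 = different) and return 1 - EER, where the EER is the
    average of FAR and FRR at the threshold where they are closest."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos, neg = scores[labels == 1], scores[labels == 0]
    best = None
    for t in np.unique(scores):
        far = (neg >= t).mean()   # impostor pairs accepted
        frr = (pos < t).mean()    # genuine pairs rejected
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return 1.0 - best[1]

# perfectly separated scores give accuracy 1.0
print(accuracy_at_eer([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))  # 1.0
```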
Extended Yale B: This is a standard face database consisting of frontal-face images of individuals captured under different laboratory-controlled lighting conditions, with the face images cropped and normalized to a fixed size. We follow a standard experimental setup and randomly select half of the images of each subject for training and the other half for testing. Figure 4 (top left) shows the ROC curve obtained for the Extended Yale B dataset. We compare our method with state-of-the-art methods in Table 1 and Figure 4; it is clear that our cross-image features outperform the other approaches.
Table 1: Accuracy at EER (%) on Extended Yale B and CMU PIE.

| Method | YaleB | PIE Illum | PIE Light |
| --- | --- | --- | --- |
| Heusch et al. | 73.64 | 84.85 | 89.63 |
| Zhang et al. | 85.09 | 79.40 | 82.77 |
CMU PIE: This dataset contains subjects with different poses and expressions under different lighting conditions. We follow a standard experimental setup and use two subsets of the database, namely "illumination" (without ambient light) and "lighting" (with ambient light). In the illumination subset, a fixed number of images per person is used for training, and further images with different illumination conditions are randomly selected for testing; in the lighting subset, we use 5 images for training and 10 for testing. Table 1 shows the performance of our method compared to several other approaches. Additionally, Figure 4 (bottom left and bottom right) shows the ROC curves obtained for CMU PIE lighting (left) and illumination (right).
Labeled Faces in the Wild: The LFW dataset contains face images collected from the web; a subset of the individuals have two or more distinct photos. This dataset is very challenging due to the variation in illumination and pose, and is therefore useful for comparing the effectiveness of different low-level feature descriptors. In our experiments, we use the aligned version of LFW and follow the standard cross-validation protocol suggested by the dataset's authors. Figure 4 (top right) shows the ROC curve obtained for LFW. Moreover, Table 2 compares our cross-image features with state-of-the-art feature descriptors. It is clear that our method is robust to the challenging factors in LFW, and therefore outperforms the other features.
4 Conclusion

We proposed new robust features for face verification based on cross-image similarity. Our approach extracts simple rectangle features from the pair of images jointly, thus capturing discriminative properties of the pair. Through experiments, we demonstrated the power of the proposed approach on challenging datasets. In the future, we will explore extending our cross-image features to the general object verification problem.
References

-  T. Sim, S. Baker, and M. Bsat. The CMU Pose, Illumination, and Expression (PIE) database of human faces. Technical report, The Robotics Institute, CMU.
-  B. Chen, Y. Kuo, Y. Chen, K. Chu, and W. Hsu. Semi-supervised face image retrieval using sparse coding with identity constraint. ACM Multimedia, 2011.
-  M. Everingham, J. Sivic, and A. Zisserman. “Hello! My name is… Buffy” – automatic naming of characters in TV video. In BMVC, 2006.
-  A. Georghiades, P. Belhumeur, and D. Kriegman. From few to many: Illumination cone models for face recognition under variable lighting and pose. TPAMI, 2001.
-  Y. Goh, A. Teoh, and M. Goh. Wavelet local binary patterns fusion as illuminated facial image preprocessing for face verification. Expert Systems with Applications, 2011.
-  S. Gu, Y. Zheng, and C. Tomasi. Critical nets and beta-stable features for image matching. ECCV, 2010.
-  M. Guillaumin, J. Verbeek, and C. Schmid. Is that you? metric learning approaches for face identification. In ICCV, 2010.
-  G. Heusch, Y. Rodriguez, and S. Marcel. Local binary patterns as an image preprocessing for face authentication. In International Conference on Automatic Face and Gesture Recognition (FGR), 2006.
-  R. Hsu, M. Abdel-Mottaleb, and A. Jain. Face detection in color images. TPAMI, 2002.
-  G. Huang, M. Ramesh, T. Berg, and E. Learned-Miller. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical report, University of Massachusetts, Amherst, 2007.
-  N. Kumar, A. Berg, P. Belhumeur, and S. Nayar. Attribute and simile classifiers for face verification. In ICCV, 2009.
-  K. Lee, J. Ho, and D. Kriegman. Acquiring linear subspaces for face recognition under variable lighting. TPAMI, 2005.
-  H. Nguyen and L. Bai. Cosine similarity metric learning for face verification. ACCV, 2011.
-  B. Ni, S. Yan, and A. Kassim. Contextualizing histogram. CVPR, 2009.
-  N. Pinto, J. DiCarlo, and D. D. Cox. How far can you get with a modern face recognition test set using only simple features? CVPR, 2009.
-  H. Rowley, S. Baluja, and T. Kanade. Neural network-based face detection. TPAMI, 1998.
-  F. Schroff, T. Treibitz, D. Kriegman, and S. Belongie. Pose, illumination and expression invariant pairwise face-similarity measure via doppelgänger list comparison. In ICCV, 2011.
-  W. Tsao, A. Lee, Y. Liu, T. Chang, and H. Lin. A data mining approach to face detection. Pattern Recognition, 2010.
-  M. Turk and A. Pentland. Face recognition using eigenfaces. In CVPR, 1991.
-  P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In CVPR, 2001.
-  L. Wiskott, J. Fellous, N. Krüger, and C. von der Malsburg. Face recognition by elastic bunch graph matching. TPAMI, 1997.
-  L. Wolf, T. Hassner, and I. Maoz. Face recognition in unconstrained videos with matched background similarity. In CVPR, 2011.
-  J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma. Robust face recognition via sparse representation. TPAMI, 2009.
-  T. Zhang, B. Fang, Y. Yuan, Y. Y. Tang, Z. Shang, D. Li, and F. Lang. Multiscale facial structure representation for face recognition under varying illumination. Pattern Recognition, 2009.