Matching Image Sets via Adaptive Multi Convex Hull
Abstract
Traditional nearest points methods use all the samples in an image set to construct a single convex or affine hull model for classification. However, strong artificial features and noisy data may be generated from combinations of training samples when significant intra-class variations and/or noise occur in the image set. Existing multi-model approaches extract local models by clustering each image set individually only once, with fixed clusters used for matching with various image sets. This may not be optimal for discrimination, as undesirable environmental conditions (e.g. illumination and pose variations) may result in the two closest clusters representing different characteristics of an object (e.g. a frontal face being compared to a non-frontal face). To address the above problem, we propose a novel approach to enhance nearest-points-based methods by integrating affine/convex hull classification with an adapted multi-model approach. We first extract multiple local convex hulls from a query image set via maximum margin clustering to diminish the artificial variations and constrain the noise in local convex hulls. We then propose adaptive reference clustering (ARC) to constrain the clustering of each gallery image set by forcing the clusters to have resemblance to the clusters in the query image set. By applying ARC, noisy clusters in the query set can be discarded. Experiments on the Honda, MoBo and ETH80 datasets show that the proposed method outperforms single model approaches and other recent techniques, such as Sparse Approximated Nearest Points, the Mutual Subspace Method and Manifold Discriminant Analysis.
1 Introduction
Compared to single-image matching techniques, image set matching approaches exploit set information to improve discrimination accuracy as well as robustness to image variations, such as pose, illumination and misalignment [1, 5, 11, 20]. Image set classification techniques can be categorised into two general classes: parametric and non-parametric methods. The former represent image sets with parametric distributions [1, 4, 12]; the distance between two sets can then be measured by the similarity between the estimated parameters of the distributions. However, the estimated parameters might be dissimilar if the training and test data sets of the same subject have weak statistical correlations [11, 21].
State-of-the-art non-parametric methods can be categorised into two groups: single-model and multi-model methods. Single-model methods attempt to represent sets as linear subspaces [11, 23] or affine/convex hulls [5, 10]. For single linear subspace methods, principal angles are generally used to measure the difference between two subspaces [16, 23]. As the similarity of data structures is used for comparing sets, the subspace approach can be robust to noise and to relatively small numbers of samples [21, 23]. However, single linear subspace methods consider the structure of all data samples, without selecting optimal subsets for classification.
Convex hull approaches use geometric distances (e.g. the Euclidean distance between closest points) to compare sets. Given two sets, the closest points between their two convex hulls are calculated by least squares optimisation. As such, these methods adaptively choose optimal samples to obtain the distance between sets, allowing for a degree of intra-class variation [10]. However, as the closest points between two convex hulls are artificially generated from linear combinations of certain samples, deterioration in discrimination performance can occur if the nearest points between two hulls are outliers or noisy. An example is shown in Fig. 1, where unrealistic artificial variations are generated from combinations of two distant samples.
Recent research suggests that creating multiple local linear models by clustering can considerably improve recognition accuracy [9, 20, 21]. In [8, 9], Locally Linear Embedding [15] and k-means clustering are used to extract several representative exemplars. Manifold Discriminant Analysis [20] and Manifold-Manifold Distance [21] use the notion of maximal linear patches to extract local linear models. For two sets with $k_1$ and $k_2$ local models, the minimum distance between their local models determines the set-to-set distance, which is acquired via $k_1 \times k_2$ local model comparisons (i.e. an exhaustive search).
A limitation of current multi-model approaches is that each set is clustered individually only once, resulting in fixed clusters of each set being used for classification. These clusters may not be optimal for discrimination and may result in the two closest clusters representing two separate characteristics of an object. For example, let us assume we have two face image sets of the same person, captured under two different conditions. The clusters in the first set represent various poses, while the clusters in the second set represent varying illumination (where the illumination is different to the illumination present in the first set). As the two sets of clusters capture two specific variations, matching image sets based on fixed cluster matching may result in a non-frontal face (e.g. rotated or tilted) being compared against a frontal face.
Contributions. To address the above problems, we propose an adaptive multi convex hull classification approach to find a balance between single hull and nearest neighbour methods. The proposed approach integrates affine/convex hull classification with an adapted multi-model approach. We show that Maximum Margin Clustering (MMC) can be applied to extract multiple local convex hulls that are distant from each other. The optimal number of clusters is determined by restricting the average minimal middle-point distance, in order to control the region of unrealistic artificial variations. The adaptive reference clustering approach is proposed to enforce the clusters of a gallery image set to have resemblance to the reference clusters of the query image set.
Consider a query set and a gallery set to be compared. The proposed approach first uses MMC to extract local convex hulls from the query set, to diminish unrealistic artificial variations and to constrain the noise in local convex hulls. We prove that after local convex hull extraction, the noisy set is reduced. The local convex hulls extracted from the query set are treated as reference clusters to constrain the clustering of the gallery set. Adaptive reference clustering is proposed to force the clusters of the gallery set to have resemblance to the reference clusters in the query set, by adaptively selecting the closest subset of images. The distance from the closest cluster of the gallery set to the corresponding reference cluster of the query set is taken to indicate the distance between the two sets. Fig. 2 shows a conceptual illustration of the proposed approach.
Comparisons on three benchmark datasets for face and object classification show that the proposed method consistently outperforms single hull approaches and several other recent techniques. To our knowledge, this is the first method that adaptively clusters an image set based on the reference clusters from another image set.
We continue the paper as follows. In Sections 2 and 3, we briefly overview affine and convex hull classification and maximum margin clustering techniques. We then describe the proposed approach in Section 4, followed by complexity analysis in Section 5, and empirical evaluations and comparisons with other methods in Section 6. The conclusions and future research directions are summarised in Section 7.
2 Affine and Convex Hull Classification
An image set can be represented with a convex model, either an affine hull or a convex hull, and the similarity between two sets can then be defined as the distance between the two hulls [5]. This can be considered as an enhancement of the nearest neighbour classifier, attempting to reduce sensitivity to within-class variations by artificially generating samples within the set. Given an image set $X = \{x_1, x_2, \ldots, x_n\}$, where each $x_i$ is a feature vector extracted from an image, the smallest affine subspace containing all the samples can be constructed as an affine hull model:

$H^{aff} = \left\{ x = \sum\nolimits_{i=1}^{n} \alpha_i x_i \;\middle|\; \sum\nolimits_{i=1}^{n} \alpha_i = 1 \right\}$   (1)

Any affine combination of the samples is included in this affine hull.
If the affine hull assumption is too loose to achieve good discrimination, a tighter approximation can be achieved by setting constraints on the coefficients $\alpha_i$ to construct a convex hull via [5]:

$H^{conv} = \left\{ x = \sum\nolimits_{i=1}^{n} \alpha_i x_i \;\middle|\; \sum\nolimits_{i=1}^{n} \alpha_i = 1, \; 0 \le \alpha_i \le 1 \right\}$   (2)
The distance between two affine or convex hulls $H_1$ and $H_2$ is the smallest distance between any point in $H_1$ and any point in $H_2$:

$D(H_1, H_2) = \min_{x \in H_1, \, y \in H_2} \| x - y \|$   (3)
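For the affine hull case, the nearest-point problem of Eqn. (3) reduces to an unconstrained least squares problem once each hull is parameterised by its mean and spanning directions. The following is a minimal numpy sketch (our own illustration, not a reference implementation):

```python
import numpy as np

def affine_hull_distance(X, Y):
    """Distance between the affine hulls of two sets (columns are samples)."""
    mu_x, mu_y = X.mean(axis=1), Y.mean(axis=1)
    Bx = X - mu_x[:, None]                  # directions spanning the hull of X
    By = Y - mu_y[:, None]                  # directions spanning the hull of Y
    # minimise || (mu_x + Bx a) - (mu_y + By b) || over a, b via least squares
    A = np.hstack([Bx, -By])
    v, *_ = np.linalg.lstsq(A, mu_y - mu_x, rcond=None)
    return float(np.linalg.norm(A @ v - (mu_y - mu_x)))
```

For two parallel line-shaped sets separated by a unit offset, this returns the offset, matching the geometric intuition behind Eqn. (3).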
In [10], sparsity constraints are embedded into the distance when matching two affine hulls to enforce that the nearest points can be sparsely approximated by the combination of a few sample images.
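With the convexity constraints of Eqn. (2), the nearest-point problem becomes a small quadratic program over the combination weights. Below is a hedged sketch using a general-purpose solver (SciPy's SLSQP); dedicated QP or SVM-based solvers are typically used in practice:

```python
import numpy as np
from scipy.optimize import minimize

def convex_hull_distance(X, Y):
    """Distance between the convex hulls of two sets (columns are samples),
    solved as a QP over the combination weights alpha (for X) and beta (for Y)."""
    n, m = X.shape[1], Y.shape[1]

    def objective(z):
        diff = X @ z[:n] - Y @ z[n:]        # nearest-point difference vector
        return float(diff @ diff)

    constraints = [
        {"type": "eq", "fun": lambda z: z[:n].sum() - 1.0},  # sum(alpha) = 1
        {"type": "eq", "fun": lambda z: z[n:].sum() - 1.0},  # sum(beta) = 1
    ]
    z0 = np.concatenate([np.full(n, 1.0 / n), np.full(m, 1.0 / m)])
    res = minimize(objective, z0, method="SLSQP",
                   bounds=[(0.0, 1.0)] * (n + m), constraints=constraints)
    return float(np.sqrt(res.fun))
```

Unlike the affine case, the weights are bounded to [0, 1], so the nearest points cannot stray outside the convex hulls of the samples.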
3 Maximum Margin Clustering
Maximum Margin Clustering (MMC) extends Support Vector Machines (SVMs) from supervised learning to the more challenging task of unsupervised learning. By formulating convex relaxations of the training criterion, MMC simultaneously learns the maximum margin hyperplane and cluster labels [22] by running an SVM implicitly. Experiments show that MMC generally outperforms conventional clustering methods.
Given a point set $X = \{x_1, \ldots, x_n\}$, MMC attempts to find the optimal hyperplane and the optimal labelling for two clusters simultaneously, as follows:

$\min_{y \in \{-1,+1\}^n} \; \min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + C \sum\nolimits_{i=1}^{n} \xi_i \quad \text{s.t.} \;\; y_i (w^\top x_i + b) \ge 1 - \xi_i, \; \xi_i \ge 0, \; \forall i$   (4)
where $y_i$ is the label learned for point $x_i$. Several approaches have been proposed to solve this challenging non-convex integer optimisation problem. In [19, 22], several relaxations and class balance constraints are used to convert the original MMC problem into a semi-definite programming problem. Alternating optimisation techniques and cutting plane algorithms are applied to MMC in [24] and [25] respectively, to speed up learning for large scale datasets. MMC can be further extended to multi-class clustering from the multi-class SVM [26]:

$\min_{y} \; \min_{\{w_p\}, \xi} \; \frac{1}{2} \sum\nolimits_{p=1}^{k} \|w_p\|^2 + C \sum\nolimits_{i=1}^{n} \xi_i \quad \text{s.t.} \;\; w_{y_i}^\top x_i - w_r^\top x_i \ge 1 - \delta_{y_i, r} - \xi_i, \; \forall i, \forall r$   (5)
4 Proposed Adaptive Multi Convex Hull
The proposed adaptive multi convex hull classification algorithm consists of two main steps: extraction of local convex hulls and comparison of local convex hulls, which are elucidated in the following text.
4.1 Local Convex Hulls Extraction (LCHE)
Existing single convex hull based methods assume that any convex combination of samples represents intra-class variation and thus should also be included in the hull. However, as the nearest points between two hulls are normally artificially generated from samples, they may be noisy or outliers, leading to poor discrimination performance. In contrast, the nearest neighbour method compares only the samples in image sets, disregarding their combinations, resulting in sensitivity to within-class variations. A balance between these two approaches is needed.
We observe that, under the assumption of an affine (or convex) hull model, when sample points are distant from each other, linear combinations of those samples will generate non-realistic artificial variations, as shown in Fig. 1. This results in deterioration of classification performance.
Given an image set $X = \{x_1, \ldots, x_n, e\}$, where $x_1, \ldots, x_n$ are normal sample images and $e$ is a noisy sample, a single convex hull $H(X)$ can be constructed using all the samples. The clear set of $H(X)$ is defined as the set of points whose synthesis does not require the noisy data $e$. That is,

$S_{clear} = \left\{ x \in H(X) \;\middle|\; x = \sum\nolimits_{i=1}^{n} \alpha_i x_i, \; \sum\nolimits_{i=1}^{n} \alpha_i = 1, \; 0 \le \alpha_i \le 1 \right\}$   (6)

Accordingly, the remaining set of points in $H(X)$ not in $S_{clear}$ is the noisy set $S_{noise}$; the synthesis of every point in the noisy set must involve the noisy data $e$. That is,

$S_{noise} = H(X) \setminus S_{clear}$   (7)
All of the normal samples involved in the synthesis of points in $S_{noise}$ are called the noisy sample neighbours, as they are generally in the neighbourhood of the noisy sample. The noisy set defines the set of points that is affected by the noisy data $e$. If the nearest point between $H(X)$ and another convex hull lies in the noisy set, the convex hull distance is inevitably affected by $e$.
By dividing the single hull into multiple local convex hulls $H_1, \ldots, H_k$, the noisy set is constrained to only one local convex hull. Assume the noisy sample $e$ is in local convex hull $H_j$; the new noisy set is then defined as:

$S'_{noise} = H_j \setminus S'_{clear}$   (8)

where $S'_{clear}$ is the set of points in $H_j$ whose synthesis does not require $e$. Comparing (8) and (7), we notice that

$S'_{noise} \subseteq S_{noise}$   (9)
Unless all the noisy sample neighbours are in the same local convex hull as the noisy sample, the noisy set will be reduced. The noise level can thus be regulated by choosing the number of clusters so as to divide the noisy sample neighbours. Fig. 3 illustrates the effect of local convex hull extraction on noisy set reduction.
It is therefore necessary to divide the samples of a set into multiple subsets to extract multiple local convex hulls, such that the samples in each subset are similar and minimal artificial features are generated. Moreover, the subsets should be far from each other. By dividing a single convex hull into multiple local convex hulls, unrealistic artificial variations can be diminished and noise can be constrained to local hulls.
A direct solution is to apply k-means clustering to extract local convex hulls [18]. However, the local convex hulls extracted by k-means clustering are generally very close to each other, as the distance between local convex hulls is not maximised. We propose to use MMC clustering to solve this problem. For simplicity, we first investigate the problem for two local convex hulls. Given an image set $X$, the images should be grouped into two clusters $X_1$ and $X_2$. Two local convex hulls $H_1$ and $H_2$ can then be constructed from the images in the two clusters individually. The two clusters should be maximally separated, such that any point inside one local convex hull is far from any point in the other convex hull. This is equivalent to maximising the distance between the convex hulls:

$\max_{X_1, X_2} D(H_1, H_2) = \max_{X_1, X_2} \; \min_{x \in H_1, \, y \in H_2} \|x - y\|$   (10)
The solution of Eqn. (10) is equivalent to Eqn. (4), because finding the nearest points between two convex hulls is a dual problem which has been proven to be equivalent to the SVM optimisation problem [2]. Thus,

$D(H_1, H_2) = \min_{\alpha} \left\| \sum\nolimits_{x_i \in X_1} \alpha_i x_i - \sum\nolimits_{x_j \in X_2} \alpha_j x_j \right\| \quad \text{s.t.} \;\; \sum\nolimits_{x_i \in X_1} \alpha_i = 1, \; \sum\nolimits_{x_j \in X_2} \alpha_j = 1, \; \alpha \ge 0$   (11)
If we combine all of the images into a single set $X = X_1 \cup X_2$, with $y_i = +1$ for $x_i \in X_1$ and $y_i = -1$ for $x_i \in X_2$, then (11) is equivalent to:

$D(H_1, H_2) = \min_{\alpha} \left\| \sum\nolimits_{x_i \in X} y_i \alpha_i x_i \right\| \quad \text{s.t.} \;\; \sum\nolimits_{y_i = +1} \alpha_i = 1, \; \sum\nolimits_{y_i = -1} \alpha_i = 1, \; \alpha \ge 0$   (12)

Maximisation of the distance between clusters is the same as maximising the discrimination margin in Eqn. (12), which is proven to be equivalent to Eqn. (4) in [22]. We thus employ the maximum margin clustering method to cluster the sample images in a set into two distant local convex hulls. Similarly, multi-class MMC can be used to extract multiple local convex hulls, as in Eqn. (5).
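The two-cluster MMC step can be approximated by the alternating strategy of [24]: fix the labels and train an SVM, then relabel the points by the learned decision function. The sketch below is a simplified stand-in (batch subgradient SVM, principal-direction initialisation, crude balance check), not the solver used in our experiments:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, iters=500):
    """Batch subgradient descent on the regularised hinge loss
    (a simple stand-in for a full SVM solver)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(iters):
        viol = y * (X @ w + b) < 1                          # margin violators
        grad_w = lam * w - (y[viol, None] * X[viol]).sum(axis=0) / n
        grad_b = -y[viol].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def mmc_two_clusters(X, outer_iters=10):
    """Alternating approximation of two-cluster MMC: fix labels, train an SVM;
    fix the SVM, relabel points by its decision function."""
    Xc = X - X.mean(axis=0)
    # heuristic initialisation: sign of projection onto leading principal direction
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    y = np.sign(Xc @ Vt[0])
    y[y == 0] = 1.0
    for _ in range(outer_iters):
        w, b = train_linear_svm(X, y)
        y_new = np.sign(X @ w + b)
        y_new[y_new == 0] = 1.0
        if abs(y_new.sum()) == len(y_new):   # degenerate: one cluster emptied
            break
        if np.array_equal(y_new, y):         # converged
            break
        y = y_new
    return y
```

On well-separated data, the relabelling step leaves the initial partition intact and the hyperplane settles midway between the two clusters, which is exactly the large-margin split sought by Eqn. (10).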
4.2 Local Convex Hulls Comparison (LCHC)
In this section, we describe two approaches to compare the local convex hulls: Complete Cluster Pairs Comparison and Adaptive Reference Clustering.
4.2.1 Complete Cluster Pairs (CCP) Comparison
Similar to other multi-model approaches, the local convex hulls extracted from two image sets can be used for matching via complete cluster pairs (CCP) comparisons. Assume $k_1$ local convex hulls $\{H^1_i\}_{i=1}^{k_1}$ are extracted from image set $X_1$, and $k_2$ local convex hulls $\{H^2_j\}_{j=1}^{k_2}$ are extracted from image set $X_2$. The distance between the two sets $X_1$ and $X_2$ is defined as the minimal distance over all possible local convex hull pairs:

$D_{CCP}(X_1, X_2) = \min_{i, j} D(H^1_i, H^2_j)$   (13)
Although LCHE can suppress noise in local convex hulls, CCP will still inevitably match noisy data between image sets. Another drawback of this approach is that fixed clusters are extracted from each image set individually for classification; there is no guarantee that the clusters from different sets capture similar variations. Moreover, this complete comparison requires $k_1 \times k_2$ convex hull comparisons, which is computationally expensive.
4.2.2 Adaptive Reference Clustering (ARC)
In order to address the problems mentioned above, we propose adaptive reference clustering (ARC) to adaptively cluster each gallery image set according to the reference clusters from the probe image set (shown in Fig. 4). Assume image set $X_1$ is clustered to extract local convex hulls $\{H^1_i\}_{i=1}^{k_1}$. All images from image set $X_2$ are then clustered according to their distance to the reference local convex hulls from $X_1$; that is, each image $x$ is assigned to the closest reference local convex hull:

$c(x) = \operatorname{argmin}_{i \in \{1, \ldots, k_1\}} D(x, H^1_i)$   (14)
After clustering all the images from set $X_2$, at most $k_1$ clusters are obtained, and local convex hulls $\{H^2_i\}$ can then be constructed from these clusters. Since each cluster $H^2_i$ is formed according to the corresponding reference cluster $H^1_i$, we only need to compare the $k_1$ corresponding cluster pairs, instead of the $k_1 \times k_2$ complete comparisons. That is,

$D_{ARC}(X_1, X_2) = \min_{i \in \{1, \ldots, k_1\}} D(H^1_i, H^2_i)$   (15)
ARC also helps to remove noisy data matching (as shown in Fig. 4). If there is a noisy cluster in $X_1$ and no such noise exists in $X_2$, then all the images in $X_2$ are likely to be far from the noisy cluster; therefore, no image in $X_2$ is assigned for matching with it.
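The ARC assignment of Eqn. (14) can be sketched as follows. For simplicity, the point-to-convex-hull distance is approximated here by the distance to the nearest sample of each reference cluster; this nearest-sample shortcut is our own simplification:

```python
import numpy as np

def arc_cluster(gallery, ref_clusters):
    """Assign each gallery image (a row vector) to its closest reference
    cluster from the query set. Point-to-hull distance is approximated by
    the distance to the nearest sample in each reference cluster."""
    labels = np.empty(len(gallery), dtype=int)
    for idx, x in enumerate(gallery):
        dists = [np.linalg.norm(C - x, axis=1).min() for C in ref_clusters]
        labels[idx] = int(np.argmin(dists))
    return labels
```

Once every gallery image has been assigned, only the corresponding cluster pairs need to be compared, as in Eqn. (15); gallery images far from every reference cluster can additionally be pruned to suppress noise.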
4.3 Adjusting Number of Clusters
One important problem for local convex hull extraction is determining the number of clusters. This is because the convex combination region (i.e. the region of points generated from convex combinations of sample points) shrinks as the number of clusters is increased (as shown in Fig. 3). The reduction may improve system performance if the convex combination region contains many non-realistic artificial variations. However, the reduction will negatively impact system performance when the regions are so small that some reasonable combinations of sample points are discarded as well. An extreme case is where each sample point is treated as its own local convex hull, which reduces to the nearest neighbour method. We thus devise an approach, denoted as average minimal middle-point distance (AMMD), to indirectly “measure” the region of non-realistic artificial variations included in the convex hull.
Let $m = \alpha x_i + (1 - \alpha) x_j$, $0 \le \alpha \le 1$, be a point generated from a convex combination of two sample points $x_i, x_j$ in the set $X$. The minimum distance between $m$ and all the sample points in the set indicates the probability of $m$ not being an unrealistic artificial variation. One extreme condition is when $\alpha = 1$: then $m = x_i$, which means $m$ is equivalent to a real sample. The further the distance, the higher the “artificialness” of point $m$. In addition, the distances between $m$ and $x_i$, $x_j$ must also be maximised in order to avoid measurement bias. Thus, the setting $\alpha = 0.5$ (i.e. $m$ as the middle point) is applied. Having this fact at our disposal, we are now ready to describe the AMMD.
For each sample point $x_i$ in the set $X$, we find $x_{f(i)}$, its furthest sample point in $X$. The minimal middle-point distance of the sample point $x_i$ is defined via:

$d(x_i) = \min_{x_k \in X} \| m_i - x_k \|$   (16)

where $m_i$ is defined as $m_i = \frac{1}{2}\left(x_i + x_{f(i)}\right)$. Finally, the AMMD of the set $X$ is computed via:

$\mathrm{AMMD}(X) = \frac{1}{n} \sum\nolimits_{i=1}^{n} d(x_i)$   (17)

where $n$ is the number of sample points included in $X$.
By setting a threshold to constrain AMMD, the region of nonrealistic artificial variations can be controlled. An image set can be recursively divided into small clusters until the AMMD of all clusters is less than the threshold.
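The AMMD of Eqns. (16) and (17), together with the recursive splitting described above, can be sketched as follows; a 2-means bisection is used here as a simple stand-in for the MMC clustering actually employed:

```python
import numpy as np

def ammd(X):
    """Average minimal middle-point distance of a set (rows of X are samples)."""
    n = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise dists
    total = 0.0
    for i in range(n):
        j = int(np.argmax(D[i]))            # furthest sample from x_i
        mid = 0.5 * (X[i] + X[j])           # middle point m_i
        total += np.linalg.norm(X - mid, axis=1).min()
    return total / n

def split_until_ammd(X, tau, min_size=2):
    """Recursively bisect the set until every cluster's AMMD is below tau."""
    if len(X) <= min_size or ammd(X) <= tau:
        return [X]
    # 2-means initialised with the two mutually furthest samples
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    i, j = np.unravel_index(int(np.argmax(D)), D.shape)
    centres = X[[i, j]].astype(float)
    for _ in range(20):
        lab = np.argmin(np.linalg.norm(X[:, None, :] - centres[None, :, :],
                                       axis=2), axis=1)
        for k in (0, 1):
            if (lab == k).any():
                centres[k] = X[lab == k].mean(axis=0)
    parts = []
    for k in (0, 1):
        if (lab == k).any():
            parts += split_until_ammd(X[lab == k], tau, min_size)
    return parts
```

A low threshold produces many small hulls (approaching nearest neighbour behaviour), while a high threshold keeps the single hull; the threshold thus directly trades off the two extremes discussed above.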
5 Complexity Analysis
Given two convex hulls with $n_1$ and $n_2$ vertices, the basic Gilbert-Johnson-Keerthi (GJK) distance algorithm finds the nearest points between them with complexity $O(n_1 + n_2)$ [3]. The proposed approach needs a preprocessing step to cluster the two image sets by MMC, with complexities $O(s n_1)$ and $O(s n_2)$ [26], where $s$ is the sparsity of the data. Assuming each image set is clustered evenly into $k_1$ and $k_2$ clusters, comparing two local convex hulls has complexity $O(n_1/k_1 + n_2/k_2)$. For the complete cluster pairs comparison, the total run time is $O(s(n_1 + n_2) + k_1 k_2 (n_1/k_1 + n_2/k_2))$. The dominant term is the complete local convex hull comparison, i.e. $O(k_2 n_1 + k_1 n_2)$.

By applying the adaptive reference clustering technique, one of the image sets needs to be clustered according to the $k_1$ reference clusters from the other set, with a complexity of $O(k_1 n_2)$; only the $k_1$ corresponding cluster pairs are then compared, costing $O(n_1 + n_2)$. Thus the total run time is $O(s(n_1 + n_2) + k_1 n_2 + n_1 + n_2)$. The dominant term is the adaptive reference clustering, i.e. $O(k_1 n_2)$.
6 Experiments
We first compare the design choices offered by the proposed framework, and then compare the best variant of the framework with other state-of-the-art methods.
The framework has two components: Local Convex Hulls Extraction (LCHE) and Local Convex Hulls Comparison (LCHC). There are two sub-options for LCHE: (1) Maximum Margin Clustering (MMC) versus k-means clustering (Section 3); (2) fixed versus adjustable number of clusters (Section 4.3). There are also two sub-options for LCHC: (1) Adaptive Reference Clustering (ARC) versus Complete Cluster Pairs (CCP) (Sections 4.2.2 and 4.2.1); (2) Affine Hull Distance (AHD) versus Convex Hull Distance (CHD) (Section 2). We use two single hull methods as baseline comparisons: the method using Affine Hull Distance (AHD) [5] and the method using Convex Hull Distance (CHD) [5]. Honda/UCSD [12] is used to compare the different variants of the framework in order to choose the best one.
We use an implementation of the algorithm proposed in [14] to solve the MMC optimisation problem. To eliminate the bias caused by a large number of images falling into one cluster, for the ARC algorithm we only select the top $n_r$ closest images to each reference cluster, where $n_r$ is the number of images in the reference cluster. The clusters extracted from the query image set are used as reference clusters to adaptively cluster each gallery image set individually. In this way, the reference clusters stay the same for each query, and hence the distances between the query set and each gallery set are comparable.
The best performing variant of the framework is then contrasted against state-of-the-art approaches such as Sparse Approximated Nearest Points (SANP) [10] (a nearest point based method), the Mutual Subspace Method (MSM) [23] (a subspace based method), and Manifold Discriminant Analysis (MDA) [20] (a multi-model based method). We obtained the implementations of all methods from the original authors. The evaluation is done on three challenging image-set datasets: Honda/UCSD [12], CMU MoBo [7] and ETH80 [13].
6.1 Datasets
We use the Honda/UCSD and CMU-MoBo datasets for face recognition tests. The Honda dataset [12] consists of 59 video sequences of 20 subjects, with pose, illumination and expression variations across the sequences for each subject. The CMU-MoBo dataset [7] contains 96 motion sequences of 24 subjects with four walking patterns. Face images from each frame of both face datasets were cropped and resized as per [21]. We followed the protocol of [10, 20] to conduct 10-fold cross validation on both datasets, by randomly selecting one sequence per subject for training and using the rest for testing. On the Honda/UCSD dataset, we tested on two types of images (raw, and normalised via histogram equalisation), using three configurations on the number of images: randomly chosen 50, randomly chosen 100, and all images. Using a subset of images partly simulates real-world situations where a face detector or tracker may fail on some frames.
The ETH80 dataset [13] is used for object recognition tests. It contains images of 8 object categories. Each category includes 10 object subcategories (e.g. various dogs), with each subcategory having 41 orientations. We resized the images and treated each subcategory as an image set. For each category, we selected each subcategory in turn for training and the remaining 9 for testing.
6.2 Comparative Evaluation
We first evaluate the efficacy of the proposed framework variants using CCP and ARC. Here, the comparisons only use MMC for clustering, with a fixed number of clusters. Fig. 5 illustrates the comparison of CCP and ARC on the AHD and CHD methods for the Honda/UCSD dataset, with 100 images randomly selected per set.
The results show that under the optimal number of clusters, ARC and CCP outperform their baseline counterparts (single hull with AHD and CHD), indicating that LCHE improves the discrimination of the system. ARC is consistently better than CCP regardless of the number of clusters chosen. This supports our argument that ARC guarantees meaningful comparisons between local convex hulls capturing similar variations. As ARC performs best in all tests, we only use ARC for local convex hull comparison in the following evaluations.
The evaluation results of the different variants for local convex hull extraction are shown in Table 1. We only show the hyperparameters which give the best performance for each variant (e.g. the best number of clusters is shown in parentheses for the fixed cluster number, and the best threshold value is shown in square brackets for the adjustable cluster number). From this table, it is clear that all the proposed variants outperform the baseline single hull approaches, validating our argument that it is helpful to use multiple local convex hulls. The MMC variants outperform their k-means counterparts, indicating that maximising the local convex hull distance during clustering leads to better discrimination ability. The adjustable cluster number variant achieves a significant performance increase over the fixed number of clusters (between 1.5 and 5.2 percentage points). We also note that the performance for the adjustable number of clusters is not very sensitive to the threshold. For instance, the performance only drops by 2% when the threshold is set to 5000 for normalised images. In summary, MMC and an adjustable number of clusters, combined with ARC, achieve the best performance overall.


Table 3: Average time (in seconds) to compare two image sets, for varying numbers of images per set ("noc" = number of clusters).

num. of images   CHD [5]   CHD CCP           CHD ARC
                           noc=2    noc=10   noc=2    noc=10
50               0.23      0.67     2.79     0.73     2.32
100              1.52      2.37     5.59     2.16     4.96
all              89.8      26.6     28.2     25.4     25.2

Results in Table 2 indicate that when strong noise occurs in image sets, the proposed ARC approach considerably outperforms the other methods, supporting our argument that ARC helps to remove noisy data matching. It is worth noting that under strong noise, the nearest neighbour (NN) method performs better than the single hull methods.
Fig. 6 summarises the best variant found previously, contrasted with the state-of-the-art methods. Normalised images are used for the Honda dataset, and raw images are used for the ETH80 and CMU-MoBo datasets. A fixed threshold is set for all datasets for the adjustable number of clusters. It is clear that the proposed system consistently outperforms all other methods on all datasets, regardless of whether AHD or CHD is used.
In the last evaluation, we compare the time complexity of the variants. The average time cost to compare two image sets is shown in Table 3. For small numbers of images per set, extracting multiple local convex hulls is slower than using only a single convex hull, because of the extra time for MMC and adaptive reference clustering. However, for large numbers (greater than 100) of images per set, the proposed method is about three times faster than the CHD method. This is because the number of images in each cluster is significantly reduced, leading to a considerably lower time cost for local convex hull comparisons.
7 Conclusions and Future Directions
In this paper, we have proposed a novel approach to find a balance between single hull methods and the nearest neighbour method. Maximum margin clustering (MMC) is employed to extract multiple local convex hulls for each query image set. The adjustable number of clusters is controlled by restraining the average minimal middle-point distance, to constrain the region of unrealistic artificial variations. Adaptive reference clustering (ARC) is proposed to cluster the gallery image sets so that they resemble the clusters of the query image set. Experiments on three datasets show considerable improvements over single hull methods as well as other state-of-the-art approaches. Moreover, the proposed approach is faster than the single convex hull based method and is more suitable for large image set comparisons.
Currently, the proposed approach has only been investigated with MMC and k-means clustering. Other clustering methods for local convex hull extraction, such as spectral clustering [17] and subspace clustering [6], and their effects on various data distributions, need to be investigated as well.
Acknowledgements. This research was funded by Sullivan Nicolaides Pathology, Australia and the Australian Research Council Linkage Projects Grant LP130100230. NICTA is funded by the Australian Government through the Department of Communications and the Australian Research Council through the ICT Centre of Excellence program.
References
 [1] O. Arandjelovic, G. Shakhnarovich, J. Fisher, R. Cipolla, and T. Darrell. Face recognition with image sets using manifold density divergence. In IEEE Int. Conf. Computer Vision and Pattern Recognition (CVPR), pages 581–588, 2005.
 [2] K. P. Bennett and E. J. Bredensteiner. Duality and geometry in SVM classifiers. In Proceedings of The 17th International Conference on Machine Learning, 2000.
 [3] S. Cameron. Enhancing GJK: Computing minimum and penetration distances between convex polyhedra. In International Conference on Robotics and Automation, 1997.
 [4] F. Cardinaux, C. Sanderson, and S. Bengio. User authentication via adapted statistical models of face images. IEEE Transactions on Signal Processing, 54(1):361–373, 2006.
 [5] H. Cevikalp and B. Triggs. Face recognition based on image sets. In IEEE Int. Conf. Computer Vision and Pattern Recognition (CVPR), pages 2567–2573, 2010.
 [6] E. Elhamifar and R. Vidal. Sparse subspace clustering. In IEEE Int. Conf. Computer Vision and Pattern Recognition (CVPR), pages 2790–2797, 2009.
 [7] R. Gross and J. Shi. The CMU motion of body (MoBo) database. Technical Report CMU-RI-TR-01-18, Robotics Institute, Pittsburgh, PA, 2001.
 [8] A. Hadid and M. Pietikäinen. From still image to videobased face recognition: An experimental analysis. In IEEE Int. Conf. Automatic Face and Gesture Recognition (AFGR), pages 813–818, 2004.
 [9] A. Hadid and M. Pietikäinen. Manifold learning for videotovideo face recognition. In Biometric ID Management and Multimodal Communication, Lecture Notes in Computer Science, volume 5707, pages 9–16, 2009.
 [10] Y. Hu, A. S. Mian, and R. Owens. Sparse approximated nearest points for image set classification. In IEEE Int. Conf. Computer Vision and Pattern Recognition (CVPR), pages 121–128, 2011.
 [11] T.-K. Kim, J. Kittler, and R. Cipolla. Discriminative learning and recognition of image set classes using canonical correlations. IEEE Trans. Pattern Analysis and Machine Intelligence (PAMI), 29(6):1005–1018, 2007.
 [12] K.-C. Lee, J. Ho, M.-H. Yang, and D. Kriegman. Video-based face recognition using probabilistic appearance manifolds. In IEEE Int. Conf. Computer Vision and Pattern Recognition (CVPR), pages 313–320, 2003.
 [13] B. Leibe and B. Schiele. Analyzing appearance and contour based methods for object categorization. In Int. Conf. Computer Vision and Pattern Recognition (CVPR), pages 409–415, 2003.
 [14] Y.-F. Li, I. W. Tsang, J. T. Kwok, and Z.-H. Zhou. Tighter and convex maximum margin clustering. In Int. Conf. on Artificial Intelligence and Statistics, 2009.
 [15] S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.
 [16] C. Sanderson, M. Harandi, Y. Wong, and B. C. Lovell. Combined learning of salient local descriptors and distance metrics for image set face verification. In IEEE Int. Conf. Advanced Video and Signal-Based Surveillance (AVSS), pages 294–299, 2012.
 [17] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.
 [18] D. Steinley. K-means clustering: A half-century synthesis. British Journal of Mathematical and Statistical Psychology, 59:1–34, 2006.
 [19] H. Valizadegan and R. Jin. Generalized maximum margin clustering and unsupervised kernel learning. In Advances in Neural Information Processing Systems, pages 1417–1424. MIT Press, 2007.
 [20] R. Wang and X. Chen. Manifold discriminant analysis. In IEEE Int. Conf. Computer Vision and Pattern Recognition (CVPR), pages 429–436, 2009.
 [21] R. Wang, S. Shan, X. Chen, and W. Gao. Manifoldmanifold distance with application to face recognition based on image set. In IEEE Int. Conf. Computer Vision and Pattern Recognition (CVPR), pages 1–8, 2008.
 [22] L. Xu, J. Neufeld, B. Larson, and D. Schuurmans. Maximum margin clustering. In Advances in Neural Information Processing Systems, pages 1537–1544. MIT Press, 2004.
 [23] O. Yamaguchi, K. Fukui, and K.-i. Maeda. Face recognition using temporal image sequence. In Int. Conf. Automatic Face and Gesture Recognition (AFGR), pages 318–323, 1998.
 [24] K. Zhang, I. W. Tsang, and J. T. Kwok. Maximum margin clustering made practical. In Proceedings of the 24th International Conference on Machine Learning, 2007.
 [25] B. Zhao, F. Wang, and C. Zhang. Efficient maximum margin clustering via cutting plane algorithm. In The 8th SIAM International Conference on Data Mining, 2008.
 [26] B. Zhao, F. Wang, and C. Zhang. Efficient multiclass maximum margin clustering. In Proceedings of The 25th International Conference on Machine Learning, 2008.