Extrinsic Kernel Ridge Regression Classifier for Planar Kendall Shape Space
Kernel methods have had great success in the statistics and machine learning communities. Despite their growing popularity, however, less effort has been devoted to developing kernel-based classification methods on manifolds, owing to their non-Euclidean geometry. In this paper, motivated by the extrinsic framework of manifold-valued data analysis, we propose two new kernels on the planar Kendall shape space $\Sigma_2^k$, called the extrinsic Veronese Whitney Gaussian kernel and the extrinsic complex Gaussian kernel. We show that our approach can be extended to develop Gaussian-like kernels on any embedded manifold. Furthermore, the kernel ridge regression classifier (KRRC) is implemented to address the shape classification problem on $\Sigma_2^k$, and its promising performance is illustrated on a real dataset.
Classification has been one of the main subjects in the statistics and machine learning literature. A classification problem can be formulated generally as follows. Given a training data set $\{(x_j, y_j)\}_{j=1}^n$ with $x_j \in \mathcal{X}$ and $y_j \in \mathcal{Y}$, where $\mathcal{Y}$ is a finite discrete set of class labels, the goal is to construct a classifier $\hat f: \mathcal{X} \to \mathcal{Y}$ such that the class label $y$ of a new test point $x$ can be successfully predicted by $\hat f(x)$. Whereas many classification methods, including support vector machines, multiclass logistic regression, K-nearest neighbors (KNN) and their variants, have been intensively studied in Euclidean space, much less attention has been paid to non-Euclidean spaces. In new types of data analysis that have emerged in recent years, however, analyzing non-Euclidean data, mostly manifold-valued data, has attracted great interest. Examples include directions on the sphere, diffusion tensor magnetic resonance imaging (DT-MRI) data, planar or 3D shapes, and medical images, for which classical Euclidean approaches are unrealistic because of the geometric structure of the underlying spaces. Since the aforementioned methods require predictors in Euclidean form, they cannot be applied to such spaces directly. For instance, KNN is not directly applicable on a non-Euclidean space, because nearest neighbors cannot be defined without a suitable notion of distance there. A number of statistical methods have emerged to counter this difficulty in non-Euclidean data analysis. In this paper, we adopt kernel methods on the shape manifold as part of these efforts. To establish a link between kernelized methods and shape analysis, let us first briefly recall kernelized methods.
Apart from the surge of interest in manifold-valued data, kernel methods have been successfully incorporated, in other areas of statistics and machine learning, into a number of learning algorithms that depend on the data only through inner products, including ridge regression, principal component analysis (PCA), and the support vector machine (SVM) (Schölkopf and Smola (2002), Hofmann et al. (2008)). Kernel methods offer two main benefits: they extend linear algorithms to nonlinear ones, and they allow Euclidean algorithms to be applied on any non-Euclidean space over which a positive definite kernel can be defined. Without changing the learning algorithm, kernel methods extend linear methods in a nonlinear way by mapping the data from the original space to a high-dimensional feature space. This is done simply by replacing the inner product in the original space with that of the feature space, and the inner products in the high-dimensional feature space can be computed efficiently in the original space via the kernel. Moreover, informally speaking, a kernel function $k(x, x')$ is a similarity measure between two objects $x$ and $x'$ in a nonempty set $\mathcal{X}$, and since no assumption is made about $\mathcal{X}$, kernels can be defined on arbitrary spaces. With this in mind, kernelized methods have been carried out in various domains such as sequence data, networks, graphs, text, images, and manifolds. Provided the kernel function is positive definite, these methods are theoretically justified by Reproducing Kernel Hilbert Space (RKHS) theory. In this respect, kernel methods provide an efficient way to deal with classification problems over non-Euclidean spaces.
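A minimal numerical illustration of the kernel trick, assuming NumPy and using the homogeneous quadratic kernel $k(x, y) = (x^\top y)^2$ on $\mathbb{R}^2$ (chosen only because its feature map is short enough to write out; it is not one of the kernels used in this paper). The kernel value computed in the original space matches the inner product of the explicit feature vectors:

```python
import numpy as np

def poly2_kernel(x, y):
    """Homogeneous quadratic kernel k(x, y) = (x . y)^2, evaluated in the original space R^2."""
    return float(np.dot(x, y)) ** 2

def poly2_features(x):
    """Explicit feature map phi(x) = (x1^2, sqrt(2) x1 x2, x2^2) into R^3,
    chosen so that <phi(x), phi(y)> = (x . y)^2."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
via_kernel = poly2_kernel(x, y)                                     # never forms phi explicitly
via_features = float(np.dot(poly2_features(x), poly2_features(y)))  # inner product in feature space
```

For the Gaussian kernels used later the feature space is infinite-dimensional, so only the left-hand route (evaluating the kernel) is available.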
In the field of analyzing complex structured data types, object data analysis has been growing in popularity as a new research area (Wang and Marron (2007), Marron and Alonso (2014), Patrangenaru and Ellingson (2015)). From an object data analysis point of view, data objects such as directions, shapes, medical images, and strings can be understood as elements of non-Euclidean spaces. From this perspective, landmark-based shapes of configurations extracted from images can be treated as points in Kendall’s shape spaces (Kendall (1984), Dryden and Mardia (1998)), which are among the most popular manifolds in the object data and shape analysis literature. While the rich literature in this field has focused mainly on statistical methods based on the Riemannian metric, there have been relatively few works on kernelized shape classification on this space (Jayasumana et al. (2013b), Lin et al. (2018)). In this paper, therefore, taking advantage of the flexibility and adaptability of kernelized methods, we propose a new method that we call the extrinsic kernel ridge regression classifier. A Gaussian-like kernel on this shape space has already been proposed by Jayasumana et al. (2013b); our work differs from it by using the extrinsic distance, that is, the Euclidean distance induced by an embedding. Detailed descriptions of our approach are provided in Section 4.
The rest of this paper is organized as follows. In Section 2, background information and preliminaries are presented. In Section 3, we describe KRRC with the full Procrustes Gaussian kernel. In Section 4, we develop our extrinsic kernels, which are the main contribution of this paper. In Section 5, we illustrate the proposed methods on a real dataset.
2 Regression Classifiers and Planar Kendall Shape Manifold
In this section, we present some necessary definitions and preliminary concepts that will be used throughout this paper. Let us first recall regression classifiers based on the subspace learning methods.
2.1 Regression Classifiers
Subspace learning methods for object classification have been studied extensively over the last few decades. The idea behind such methods is that random objects belonging to a specific class are assumed to lie on a linear subspace spanned by the observations of that class. Under this subspace assumption, a new observation $x \in \mathbb{R}^p$ belonging to the $i$th class can be expressed as a linear combination of the training samples of the $i$th class, $x = X_i\beta_i + \epsilon_i$, where $X_i \in \mathbb{R}^{p \times n_i}$, $\beta_i \in \mathbb{R}^{n_i}$ and $\epsilon_i$ denote the class-specific data matrix of the $i$th class, the coefficient vector, and the random error, respectively. Because the coefficient can be obtained directly by least squares, $\hat\beta_i = (X_i^\top X_i)^{-1}X_i^\top x$, a new sample can be projected onto the subspace of each class, and the final classification is made by minimizing the distance between the given data point and its projections, $\hat y = \arg\min_i \|x - \hat x_i\|$, where $\hat x_i = X_i\hat\beta_i$ denotes the projection onto the $i$th subspace. A schematic representation of the method is presented in Figure 1. One pioneering and influential subspace classification method is the linear regression classifier (LRC) of Naseem et al. (2010), which was proposed in the context of face recognition. Following the remarkable success of LRC, a variety of LRC-based methods have been developed with improved performance. Although such methods show how a linear regression technique can be applied to a classification problem, they rely on the linear subspace assumption and break down when the number of class-specific samples exceeds the data dimension, because $X_i^\top X_i$ is then singular. To resolve these problems, He et al. (2014) proposed a kernel ridge regression classifier (KRRC) on Euclidean space by connecting kernel ridge regression to the multiclass classification problem. Incorporating RKHS methods into LRC gives KRRC an advantage over LRC by capturing nonlinear structure in the data.
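The LRC decision rule can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the implementation of Naseem et al. (2010): each class is given as a matrix whose columns are training samples, and `lrc_predict` is a hypothetical helper name. `np.linalg.lstsq` returns $(X_i^\top X_i)^{-1}X_i^\top x$ when $X_i$ has full column rank (and the minimum-norm solution otherwise).

```python
import numpy as np

def lrc_predict(class_matrices, x):
    """Linear regression classifier (sketch).

    class_matrices: dict mapping class label -> X_i of shape (p, n_i), columns = training samples.
    x: new sample of shape (p,).
    Returns the label whose least-squares projection x_hat_i = X_i beta_i is closest to x.
    """
    best_label, best_dist = None, np.inf
    for label, X in class_matrices.items():
        beta, *_ = np.linalg.lstsq(X, x, rcond=None)  # least-squares coefficient beta_i
        x_hat = X @ beta                              # projection onto span(X_i)
        dist = np.linalg.norm(x - x_hat)              # ||x - x_hat_i||
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

X0 = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])  # class 0 spans the x-y plane of R^3
X1 = np.array([[0.0], [0.0], [1.0]])                 # class 1 spans the z-axis
label = lrc_predict({0: X0, 1: X1}, np.array([0.5, -2.0, 0.1]))
```

Here the query lies almost in the plane spanned by class 0, so its residual to that subspace (0.1) is far smaller than its residual to the z-axis.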
2.2 Planar Shape Space
We now briefly introduce the geometry of Kendall’s shape space of $k$-ads, which is the most popular landmark-based shape manifold in statistics. A shape can be defined as the geometric information of an object that is invariant under translation, scaling, and rotation. Consider the shape space $\Sigma_m^k$, where $k$ and $m$ denote the number of landmarks and the dimension of the Euclidean space in which the landmarks lie, respectively. Kendall’s planar shape space of $k$-ads, $\Sigma_2^k$ ($m = 2$), can be represented on the complex plane as follows.
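First, a $k$-ad on the plane can be identified with a vector of complex numbers $z = (z_1, \ldots, z_k)^\top \in \mathbb{C}^k$, where $z_j \in \mathbb{C}$ represents the $j$th landmark. These configurations are mapped to the pre-shape space by filtering out the effects of translation and scaling:

$$u = \frac{z - \bar z\mathbf{1}_k}{\|z - \bar z\mathbf{1}_k\|} \in \mathcal{S}_2^k,$$

where $\bar z = \frac{1}{k}\sum_{j=1}^k z_j$, and $\mathcal{S}_2^k = \{u \in \mathbb{C}^k : \mathbf{1}_k^\top u = 0,\ \|u\| = 1\}$ is the pre-shape space, which is equivalent to the complex hypersphere $\mathbb{C}S^{k-2}$. The shape of $z$ is then defined as the orbit of its pre-shape under rotations,

$$[z] = \{e^{i\theta}u : \theta \in (-\pi, \pi]\}.$$

Thus Kendall’s planar shape space can be represented as $\Sigma_2^k = \mathcal{S}_2^k / S^1$, which is identified with the complex projective space $\mathbb{C}P^{k-2}$; more detailed explanations are provided in Dryden and Mardia (1998) and Patrangenaru and Ellingson (2015). Before introducing our methods, we describe a naive way of applying an existing regression classifier to the shape space.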
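A minimal NumPy sketch of the pre-shape map and of the orbit identification above (function names are hypothetical). Two pre-shapes represent the same shape exactly when $v = e^{i\theta}u$, which for unit vectors is equivalent to $|u^*v| = 1$:

```python
import numpy as np

def preshape(z):
    """Map complex landmarks z in C^k to a pre-shape: subtract the centroid
    (removes translation), then divide by the norm (removes scale)."""
    z = np.asarray(z, dtype=complex)
    centered = z - z.mean()
    return centered / np.linalg.norm(centered)

def same_shape(u, v, tol=1e-10):
    """True if pre-shapes u and v lie on the same orbit {e^{i theta} u}, i.e. |u^* v| = 1."""
    return abs(abs(np.vdot(u, v)) - 1.0) < tol   # np.vdot conjugates its first argument

z = np.array([0 + 0j, 1 + 0j, 1 + 1j, 0 + 1j])   # a unit square as a planar 4-ad
w = 3.0 * np.exp(1j * 0.7) * z + (2 - 5j)        # scaled, rotated and translated copy
u, v = preshape(z), preshape(w)
```

The copy `w` has a different pre-shape (it differs by the phase $e^{0.7i}$) but the same shape $[z]$.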
Suppose the training data $\{([z_j], y_j)\}_{j=1}^n$ are given, where $[z_j] \in \Sigma_2^k$ and $y_j \in \{1, \ldots, C\}$. If shapes are naively treated as $k$-dimensional complex random vectors, then for the $i$th class a class-specific pre-shape data matrix $U_i = [u_{i1}, \ldots, u_{in_i}] \in \mathbb{C}^{k \times n_i}$ can be constructed by stacking the pre-shapes column-wise. With this data matrix, the naive ridge regression classifier (RRC) for the shape manifold can be formulated from the class-specific regression model

$$u = U_i\beta_i + \epsilon_i, \tag{1}$$

where $\beta_i \in \mathbb{C}^{n_i}$, $u \in \mathbb{C}^k$ and $\epsilon_i$ are the regression coefficient, the new pre-shape, and a $\mathbb{C}^k$-valued random error, respectively. Under the ridge regression setup

$$\hat\beta_i = \arg\min_{\beta \in \mathbb{C}^{n_i}} \|u - U_i\beta\|^2 + \lambda\|\beta\|^2, \tag{2}$$

solving the complex-valued least squares problem gives $\hat\beta_i = (U_i^*U_i + \lambda I_{n_i})^{-1}U_i^*u$, where $U_i^*$ denotes the conjugate transpose of $U_i$. Let $\hat u_i = U_i\hat\beta_i$ be the projection onto the $i$th subspace. We finally predict the class of the given shape by minimizing the distance between the given pre-shape and its projection onto the subspace of each class:

$$\hat y = \arg\min_i \|u - \hat u_i\|. \tag{3}$$
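A minimal NumPy sketch of the naive RRC in (1)-(3), assuming pre-shapes are supplied as columns of each class matrix; `naive_rrc_predict` and the toy configurations are hypothetical. It is included only to make the baseline concrete. It is the approach whose drawbacks are discussed next, not a recommended method.

```python
import numpy as np

def preshape(z):
    """Centre and scale complex landmarks z in C^k to obtain a pre-shape."""
    z = np.asarray(z, dtype=complex)
    c = z - z.mean()
    return c / np.linalg.norm(c)

def naive_rrc_predict(class_preshapes, u, lam=1e-2):
    """Naive RRC: pre-shapes treated as ordinary vectors in C^k.

    class_preshapes: dict label -> U_i of shape (k, n_i) whose columns are pre-shapes.
    u: pre-shape of the new configuration, shape (k,); lam: ridge parameter lambda.
    """
    best_label, best_dist = None, np.inf
    for label, U in class_preshapes.items():
        n_i = U.shape[1]
        Uh = U.conj().T                                            # conjugate transpose U_i^*
        beta = np.linalg.solve(Uh @ U + lam * np.eye(n_i), Uh @ u)  # complex ridge solution
        dist = np.linalg.norm(u - U @ beta)                        # Euclidean norm in C^k, as in (3)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

square = np.array([0, 1, 1 + 1j, 1j])
classes = {
    0: np.column_stack([preshape(square), preshape([0, 1.1, 1 + 1j, 0.1 + 1j])]),
    1: np.column_stack([preshape([0, 1, 2, 3 + 0.2j]), preshape([0, 1, 2.1, 3])]),
}
pred = naive_rrc_predict(classes, preshape(2.0 * np.exp(1j * 1.3) * square + 4))
```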
Though on the surface this seems a reasonable way forward, the approach described above has conceptual limitations. We conclude this section by remarking on the drawbacks of this naive shape RRC. First, the shape manifold is a nonlinear manifold, so the linear subspace regression model (1) is inappropriate. Second, using the Euclidean norm of complex vectors in the first term of (2) and in (3) ignores the geometry of the shape space, which, as illustrated in Section 5, may be a main cause of poor performance. Thus, owing to the nonlinear manifold structure of the planar Kendall shape space, RRC is not immediately applicable to shape classification. In the following two sections, kernelized methods are adopted to address these problems.
3 KRRC with the full Procrustes Gaussian kernel
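The aforementioned need for nonlinearity in the model can be addressed by exploiting RKHS methods. Suppose $\phi: \Sigma_2^k \to \mathcal{H}$ is a nonlinear map from the original space into a high-dimensional feature space $\mathcal{H}$, which is a reproducing kernel Hilbert space (RKHS) of functions; we write $\phi(u)$ for $\phi([z])$ when $u$ is a pre-shape of $[z]$. Assuming linearity in the feature space, and writing $\Phi_i = [\phi(u_{i1}), \ldots, \phi(u_{in_i})]$, the class-specific ridge regression (2) can be extended to

$$\hat\beta_i = \arg\min_{\beta} \|\phi(u) - \Phi_i\beta\|_{\mathcal{H}}^2 + \lambda\|\beta\|^2. \tag{4}$$

The solution of (4) is given by $\hat\beta_i = (\Phi_i^\top\Phi_i + \lambda I_{n_i})^{-1}\Phi_i^\top\phi(u)$. By RKHS theory, the feature space can be defined implicitly through a kernel function $k$, which plays the role of the inner product in the feature space: $k(u, v) = \langle\phi(u), \phi(v)\rangle_{\mathcal{H}}$. Thus the solution of (4) can be rewritten as $\hat\beta_i = (K_i + \lambda I_{n_i})^{-1}k_i(u)$, where $K_i$ is the $n_i \times n_i$ Gram matrix whose $(j, l)$th element is $k(u_{ij}, u_{il})$, and $k_i(u) = (k(u_{i1}, u), \ldots, k(u_{in_i}, u))^\top$ is the vector of feature-space inner products between the training data of the $i$th class and the new point $u$.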
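A minimal sketch of a KRRC decision rule for a generic real-valued kernel, assuming NumPy; `krrc_predict` and `gaussian_kernel` are hypothetical names and the two-dimensional toy data are illustrative. To compare classes without evaluating $\phi$, the sketch expands the feature-space residual as $\|\phi(x) - \Phi_i\hat\beta_i\|^2 = k(x, x) - 2\hat\beta_i^\top k_i(x) + \hat\beta_i^\top K_i\hat\beta_i$, which follows directly from $k(u, v) = \langle\phi(u), \phi(v)\rangle_{\mathcal{H}}$.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Euclidean Gaussian RBF kernel exp(-||a - b||^2 / (2 sigma^2))."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-diff @ diff / (2.0 * sigma ** 2)))

def krrc_predict(class_data, x, kernel, lam=1e-2):
    """Kernel ridge regression classifier (sketch).

    class_data: dict label -> list of training points of that class.
    Uses only kernel evaluations: K_i (Gram matrix), k_i(x), and k(x, x).
    """
    best_label, best_res = None, np.inf
    kxx = kernel(x, x)
    for label, pts in class_data.items():
        K = np.array([[kernel(a, b) for b in pts] for a in pts])  # n_i x n_i Gram matrix
        kx = np.array([kernel(a, x) for a in pts])                 # vector k_i(x)
        beta = np.linalg.solve(K + lam * np.eye(len(pts)), kx)      # (K_i + lam I)^{-1} k_i(x)
        res = kxx - 2.0 * beta @ kx + beta @ K @ beta               # ||phi(x) - Phi_i beta_i||^2
        if res < best_res:
            best_label, best_res = label, res
    return best_label

class_data = {
    0: [(0.0, 0.0), (0.5, 0.3), (-0.4, 0.2)],
    1: [(5.0, 5.0), (5.5, 4.6), (4.7, 5.2)],
}
pred_a = krrc_predict(class_data, (0.2, -0.1), gaussian_kernel)
pred_b = krrc_predict(class_data, (4.8, 5.3), gaussian_kernel)
```

Any positive definite kernel on shapes can be passed as `kernel`, which is what makes the choice of kernel the central design question in the rest of the paper.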
For the choice of kernel, the most popular Euclidean kernel in kernelized methods is the Gaussian radial basis function (RBF) kernel, which maps the data into an infinite-dimensional feature space. Recall the Euclidean Gaussian RBF kernel for two points $x, x' \in \mathbb{R}^p$:

$$k(x, x') = \exp\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right), \quad \sigma > 0. \tag{5}$$
One might construct Gaussian RBF kernels on $\Sigma_2^k$ by replacing the Euclidean distance in (5) with a shape distance of the researcher’s choice. The Gaussian-like kernel on the planar Kendall shape space then takes the form $k([z_1], [z_2]) = \exp\{-d^2([z_1], [z_2])/(2\sigma^2)\}$, where $d$ denotes a distance on $\Sigma_2^k$. Particular examples of $d$ include the arc length (Riemannian distance), the partial Procrustes distance, and the full Procrustes distance. Plugging in such distances is a tempting way to generalize Gaussian RBF kernels to the shape manifold, since no additional effort is then required to run KRRC on the shape space. Unfortunately, however, not every choice of distance leads to a positive definite kernel, which is an essential requirement of RKHS methods. To the best of our knowledge, a positive definite Gaussian-like kernel on $\Sigma_2^k$ was first proposed by Jayasumana et al. (2013b) using the full Procrustes distance $d_F([z_1], [z_2]) = (1 - |u_1^*u_2|^2)^{1/2}$. This kernel, referred to as the full Procrustes Gaussian (FPG) kernel, is given by

$$k_{FP}([z_1], [z_2]) = \exp\left(-\frac{1 - |u_1^*u_2|^2}{2\sigma^2}\right), \tag{6}$$

where $u_1$ and $u_2$ are pre-shapes of $[z_1]$ and $[z_2]$, respectively. Various kernelized methods for the planar Kendall shape space have been implemented successfully with the FPG kernel, so it can be regarded as the first kernel that can be directly exploited for KRRC.
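A minimal NumPy sketch of the FPG kernel, written from the formula with bandwidth $\sigma$ as above (the helper names are hypothetical). It builds a Gram matrix on random planar 6-ads, checks numerically that its eigenvalues are nonnegative up to rounding, and confirms that the kernel value does not change when a configuration is rotated, scaled, and translated:

```python
import numpy as np

def preshape(z):
    """Centre and scale complex landmarks z in C^k to obtain a pre-shape."""
    z = np.asarray(z, dtype=complex)
    c = z - z.mean()
    return c / np.linalg.norm(c)

def fpg_kernel(u1, u2, sigma=1.0):
    """Full Procrustes Gaussian kernel exp(-(1 - |u1^* u2|^2) / (2 sigma^2))."""
    d2 = 1.0 - abs(np.vdot(u1, u2)) ** 2        # squared full Procrustes distance
    return float(np.exp(-d2 / (2.0 * sigma ** 2)))

rng = np.random.default_rng(0)
U = [preshape(rng.normal(size=6) + 1j * rng.normal(size=6)) for _ in range(20)]
K = np.array([[fpg_kernel(a, b, sigma=0.5) for b in U] for a in U])
eigs = np.linalg.eigvalsh(K)                    # real eigenvalues of the symmetric Gram matrix

z = rng.normal(size=6) + 1j * rng.normal(size=6)
w = 2.5 * np.exp(1j * 0.9) * z - 1 + 3j         # same shape as z
```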
Besides the choice of kernel, the performance of the model depends significantly on a careful choice of tuning parameters. KRRC involves two tuning parameters, a regularization parameter $\lambda$ and a kernel-specific parameter $\sigma$, which need to be determined in a data-driven manner. Although there is no theoretical guideline for tuning, $(\lambda, \sigma)$ can be tuned jointly by a two-dimensional grid search.
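The paper does not specify the grid or the validation scheme. As a minimal sketch only, assuming a single hold-out validation split, validation accuracy as the selection criterion, and a Euclidean Gaussian kernel on toy two-dimensional data (a shape kernel could be substituted in `gaussian_gram`), a two-dimensional grid search over $(\lambda, \sigma)$ might look like this. All function names are hypothetical.

```python
import itertools
import numpy as np

def gaussian_gram(A, B, sigma):
    """Gaussian RBF kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A ** 2, 1)[:, None] + np.sum(B ** 2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * sigma ** 2))

def krrc_predict(X_train, y_train, X_test, lam, sigma):
    """Vectorised Gaussian-kernel KRRC: one predicted label per test row."""
    labels = np.unique(y_train)
    residuals = []
    for c in labels:
        Xc = X_train[y_train == c]
        K = gaussian_gram(Xc, Xc, sigma)                    # n_c x n_c Gram matrix
        Kx = gaussian_gram(Xc, X_test, sigma)               # n_c x n_test, columns are k_c(x)
        B = np.linalg.solve(K + lam * np.eye(len(Xc)), Kx)  # one beta per test column
        # feature-space residual; k(x, x) = 1 for the Gaussian kernel
        residuals.append(1.0 - 2 * np.sum(B * Kx, 0) + np.sum(B * (K @ B), 0))
    return labels[np.argmin(np.vstack(residuals), axis=0)]

def grid_search(X_tr, y_tr, X_val, y_val, lams, sigmas):
    """Return (validation accuracy, lam, sigma) maximising accuracy over the grid."""
    best = None
    for lam, sigma in itertools.product(lams, sigmas):
        acc = np.mean(krrc_predict(X_tr, y_tr, X_val, lam, sigma) == y_val)
        if best is None or acc > best[0]:
            best = (acc, lam, sigma)
    return best

rng = np.random.default_rng(5)

def blobs(n):
    """Two well-separated Gaussian blobs with labels 0 and 1 (toy data)."""
    X = np.vstack([rng.normal(0.0, 0.5, size=(n, 2)), rng.normal(3.0, 0.5, size=(n, 2))])
    return X, np.repeat([0, 1], n)

X_tr, y_tr = blobs(30)
X_val, y_val = blobs(20)
lams, sigmas = [1e-3, 1e-2, 1e-1], [0.5, 1.0, 2.0]
best = grid_search(X_tr, y_tr, X_val, y_val, lams, sigmas)
```

In practice, cross-validation over several splits would give a more stable choice than a single hold-out split.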
4 Extrinsic KRRC for shape classification
As mentioned in the previous section, not all shape distances can be adopted directly for kernelized methods on $\Sigma_2^k$, because the resulting kernels may fail to be positive definite. One possible way to tackle this problem is the extrinsic approach described in this section: instead of using distances defined on the shape manifold directly, our extrinsic KRRC methods use the Euclidean distance induced by an embedding. Before introducing the proposed methods, we briefly review extrinsic data analysis on manifolds. In the literature on manifold-valued data analysis, two types of distances have been considered. A natural choice is the geodesic distance induced by a Riemannian metric on the manifold; the other is the chord distance in a higher-dimensional Euclidean space induced by an embedding $J: M \to \mathbb{R}^N$. These two distances lead to intrinsic and extrinsic statistical approaches, respectively. One might therefore consider a Gaussian kernel with a Riemannian distance. For instance, the Gaussian RBF kernel based on the Riemannian distance between two shapes $[z_1]$ and $[z_2]$, $k([z_1], [z_2]) = \exp\{-\arccos^2(|u_1^*u_2|)/(2\sigma^2)\}$, where $u_1$ and $u_2$ are pre-shapes of $[z_1]$ and $[z_2]$, respectively, seems attractive; however, the Riemannian distance used here does not lead to a positive definite kernel. In kernel methods, positive definiteness of the kernel is required to construct an appropriate RKHS. Motivated by this problem, our proposed extrinsic kernels instead rely on a Euclidean distance, so they do not suffer from the loss of positive definiteness. Extrinsic analysis is based on the fact that a Riemannian manifold can be embedded in a higher-dimensional Euclidean space (Whitney (1944), Nash (1956)). In the extrinsic framework, manifolds are embedded into a Euclidean space, where the distance between two points can easily be calculated with the Euclidean norm.
This approach makes it straightforward to derive positive definite kernels for kernel-based methods on manifolds where such kernels are not directly available. Furthermore, the same derivation provides a way of constructing positive definite kernels on other manifolds. Among the many possible embeddings $J$, an equivariant embedding, defined below, is generally preferred, and we use one for our kernels.
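Let $H$ be a Lie group acting on $M$. An embedding $J: M \to \mathbb{R}^N$ is said to be $H$-equivariant if there exists a group homomorphism $\psi: H \to GL(N, \mathbb{R})$ such that $J(hp) = \psi(h)J(p)$ for all $p \in M$ and $h \in H$, where $GL(N, \mathbb{R})$ (respectively $GL(N, \mathbb{C})$) denotes the general linear group of $N \times N$ invertible real (respectively complex) matrices. For the planar shape space, the Veronese Whitney embedding (Kent (1992), Bhattacharya and Bhattacharya (2012))

$$J([z]) = uu^*, \tag{7}$$

where $u$ is a pre-shape of $[z]$, is typically used; it maps $\Sigma_2^k$ into $S(k, \mathbb{C})$, the space of $k \times k$ self-adjoint (Hermitian) matrices. One can easily show that the Veronese Whitney (VW) embedding is equivariant with respect to the special unitary group $SU(k) = \{A \in GL(k, \mathbb{C}) : AA^* = I_k,\ \det A = 1\}$ by taking $\psi(A): B \mapsto ABA^*$, so that

$$J(A[z]) = Auu^*A^* = \psi(A)J([z]). \tag{8}$$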
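A minimal NumPy check of the properties of the VW embedding used above (helper names are hypothetical). It verifies that $J$ does not depend on which pre-shape represents the shape, that $J([z])$ is Hermitian with unit trace, and the algebraic identity $J(Au) = AJ([z])A^*$ for a random $A \in SU(k)$. The random unitary matrix comes from a complex QR decomposition and is rescaled to determinant one. The identity itself holds for any vector $u$.

```python
import numpy as np

def preshape(z):
    """Centre and scale complex landmarks z in C^k to obtain a pre-shape."""
    z = np.asarray(z, dtype=complex)
    c = z - z.mean()
    return c / np.linalg.norm(c)

def vw_embed(u):
    """Veronese Whitney embedding J([z]) = u u^*, a k x k Hermitian matrix."""
    return np.outer(u, u.conj())

rng = np.random.default_rng(3)
k = 5
u = preshape(rng.normal(size=k) + 1j * rng.normal(size=k))
J = vw_embed(u)
J_rotated = vw_embed(np.exp(1j * 0.4) * u)      # another representative of the same shape

# Random element of SU(k): unitary Q from a complex QR decomposition, rescaled so det = 1.
Q, _ = np.linalg.qr(rng.normal(size=(k, k)) + 1j * rng.normal(size=(k, k)))
A = Q / np.linalg.det(Q) ** (1.0 / k)
lhs = vw_embed(A @ u)                           # J(A u)
rhs = A @ J @ A.conj().T                        # psi(A) J = A J A^*
```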
In the rest of this section, we provide two Gaussian RBF kernels within the extrinsic framework. They are designed to avoid the loss of positive definiteness that can occur for Gaussian kernels built on intrinsic Riemannian distances.
4.1 Extrinsic Veronese Whitney Gaussian kernel
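In this section, we propose a Gaussian RBF kernel on $\Sigma_2^k$ that uses the induced Euclidean distance between two shapes $[z_1]$ and $[z_2]$ under the VW embedding (7). The proposed kernel, which we call the extrinsic Veronese Whitney Gaussian (VWG) kernel, is given by

$$k_{VW}([z_1], [z_2]) = \exp\left(-\frac{d_E^2([z_1], [z_2])}{2\sigma^2}\right), \tag{9}$$

where $d_E^2([z_1], [z_2]) = \|J([z_1]) - J([z_2])\|^2$ is the squared Euclidean (Frobenius) distance between the Hermitian matrices $J([z_1])$ and $J([z_2])$, and $J$ is the VW embedding in (7). Note that a kernel of the form $\exp\{-\gamma f(x, x')\}$ is positive definite for all $\gamma > 0$ if and only if $f$ is a negative definite function (Jayasumana et al. (2013a) and references therein). Recall that for a nonempty set $\mathcal{X}$, a function $f: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is negative definite if and only if $f$ is symmetric and $\sum_{i,j=1}^m c_ic_jf(x_i, x_j) \le 0$ for all $m \in \mathbb{N}$, $x_1, \ldots, x_m \in \mathcal{X}$, and $c_1, \ldots, c_m \in \mathbb{R}$ with $\sum_{i=1}^m c_i = 0$. The following theorem confirms that the squared extrinsic Euclidean distance is a negative definite function.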
Theorem 1. The squared extrinsic Euclidean distance function $d_E^2([z_1], [z_2]) = \|J([z_1]) - J([z_2])\|^2$ induced by the Veronese Whitney embedding is negative definite.
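A short argument for Theorem 1 is sketched below. It is our sketch, not a proof reproduced from the cited works. It uses only that $J$ maps into the real inner-product space of Hermitian matrices with $\langle A, B\rangle = \mathrm{Re}\,\mathrm{tr}(AB^*)$ and $\|A\|^2 = \langle A, A\rangle$. Let $p_1, \ldots, p_m \in \Sigma_2^k$ and $c_1, \ldots, c_m \in \mathbb{R}$ with $\sum_i c_i = 0$, and write $J_i = J(p_i)$. Symmetry of $d_E^2$ is immediate, and

```latex
\begin{aligned}
\sum_{i,j=1}^m c_i c_j \|J_i - J_j\|^2
  &= \sum_{i,j=1}^m c_i c_j \bigl(\|J_i\|^2 + \|J_j\|^2 - 2\langle J_i, J_j\rangle\bigr) \\
  &= 2\Bigl(\sum_{j=1}^m c_j\Bigr)\sum_{i=1}^m c_i \|J_i\|^2 - 2\Bigl\|\sum_{i=1}^m c_i J_i\Bigr\|^2 \\
  &= -2\Bigl\|\sum_{i=1}^m c_i J_i\Bigr\|^2 \;\le\; 0 .
\end{aligned}
```

The same computation applies verbatim to any embedding into a Euclidean (or Hilbert) space, which is why the construction carries over to other embedded manifolds.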
Combining Theorem 1 with Theorem 4.3 in Jayasumana et al. (2013a) ensures that the proposed VWG kernel (9) is positive definite. A metric space $(\mathcal{X}, d)$ is said to be of negative type if $\sum_{i,j=1}^m c_ic_jd(x_i, x_j) \le 0$ holds for all $m \in \mathbb{N}$, $x_1, \ldots, x_m \in \mathcal{X}$, and $c_1, \ldots, c_m \in \mathbb{R}$ with $\sum_{i=1}^m c_i = 0$ (Lyons, 2013). Since no assumption on the manifold is needed beyond the existence of the embedding, any embedded manifold equipped with the extrinsic distance induced by the Euclidean inner product is a metric space of negative type. In particular, the planar Kendall shape space embedded via the VW embedding is a metric space of negative type.
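A minimal NumPy sketch of the VWG kernel (9), assuming the Frobenius norm for the squared extrinsic distance and the bandwidth $\sigma$ as written above (helper names are hypothetical). It builds a Gram matrix on random planar 7-ads and checks that its eigenvalues are nonnegative up to rounding, as the positive definiteness result predicts. It also checks numerically that, for unit pre-shapes, $\|u_1u_1^* - u_2u_2^*\|_F^2 = 2(1 - |u_1^*u_2|^2)$. This identity gives an $O(k)$ way to evaluate the kernel. It also means that, in this parametrization, the VWG kernel and the FPG kernel (6) differ only by a rescaling of $\sigma$ by $\sqrt{2}$.

```python
import numpy as np

def preshape(z):
    """Centre and scale complex landmarks z in C^k to obtain a pre-shape."""
    z = np.asarray(z, dtype=complex)
    c = z - z.mean()
    return c / np.linalg.norm(c)

def vwg_kernel(u1, u2, sigma=1.0):
    """Extrinsic VW Gaussian kernel exp(-||u1 u1^* - u2 u2^*||_F^2 / (2 sigma^2))."""
    diff = np.outer(u1, u1.conj()) - np.outer(u2, u2.conj())
    d2 = np.linalg.norm(diff, 'fro') ** 2       # squared extrinsic distance d_E^2
    return float(np.exp(-d2 / (2.0 * sigma ** 2)))

rng = np.random.default_rng(2)
U = [preshape(rng.normal(size=7) + 1j * rng.normal(size=7)) for _ in range(15)]
K = np.array([[vwg_kernel(a, b, sigma=0.8) for b in U] for a in U])
eigs = np.linalg.eigvalsh(K)

u1, u2 = U[0], U[1]
d2_direct = np.linalg.norm(np.outer(u1, u1.conj()) - np.outer(u2, u2.conj()), 'fro') ** 2
d2_closed = 2.0 * (1.0 - abs(np.vdot(u1, u2)) ** 2)
```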
4.2 Extrinsic complex Gaussian kernel
In this section, we propose KRRC with the extrinsic complex Gaussian (ECG) kernel, obtained by slightly modifying the VWG construction. First, as for the VWG kernel, we embed shapes into the space of Hermitian matrices via (7). We then half-vectorize the complex Hermitian matrices and form the complex-valued training matrix of the $i$th class by stacking these vectors as columns:

$$V_i = \bigl[\mathrm{vech}(J([z_{i1}])), \ldots, \mathrm{vech}(J([z_{in_i}]))\bigr] \in \mathbb{C}^{k(k+1)/2 \times n_i}, \tag{10}$$

where vech denotes the half-vectorization operator, which stacks the lower triangular part of a matrix, including the diagonal, into a single vector of length $k(k+1)/2$. Under the subspace assumption we write the complex linear regression model as $v = V_i\beta_i + \epsilon_i$, where $v = \mathrm{vech}(J([z]))$ is the embedded version of the new pre-shape $u$. The usual complex ridge regression solution is then $\hat\beta_i = (V_i^*V_i + \lambda I_{n_i})^{-1}V_i^*v$.
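Now consider a complex RKHS $\mathcal{H}$ with feature map $\Phi: \mathbb{C}^{k(k+1)/2} \to \mathcal{H}$. Through the nonlinear feature map $\Phi$, the model can be reformulated as $\Phi(v) = \Phi(V_i)\beta_i + \epsilon_i$, where $\Phi(V_i) = [\Phi(v_{i1}), \ldots, \Phi(v_{in_i})]$ and $v_{ij}$ is the $j$th column of $V_i$. The transformed ridge regression solution and the projected values are given, respectively, by

$$\hat\beta_i = (K_i + \lambda I_{n_i})^{-1}\Phi(V_i)^*\Phi(v), \qquad \hat\Phi_i = \Phi(V_i)(K_i + \lambda I_{n_i})^{-1}\Phi(V_i)^*\Phi(v), \tag{11}$$

where $K_i$ is the Gram matrix

$$K_i = \Phi(V_i)^*\Phi(V_i) = \bigl[\kappa(v_{ij}, v_{il})\bigr]_{j,l=1}^{n_i}, \tag{12}$$

and $\kappa$ denotes the kernel, with $\kappa(a, b) = \Phi(a)^*\Phi(b)$. Since the feature space is infinite-dimensional, $\Phi(V_i)$ in (11), whose row dimension is infinite, cannot be computed. The benefit of kernel methods is that, because the feature map $\Phi$ is defined implicitly by the kernel $\kappa$, the functional form of $\Phi$ need not be known: by the reproducing property, the kernel, which is an inner product of feature vectors, can be computed in the original space. We use the complex Gaussian kernel (see Steinwart et al. (2006) and Bouboulis and Theodoridis (2010)), defined as

$$\kappa_\sigma(w, w') = \exp\left(-\frac{\sum_{j=1}^d (w_j - \overline{w'_j})^2}{\sigma^2}\right),$$

where $w, w' \in \mathbb{C}^d$ with $d = k(k+1)/2$, and $\overline{w'_j}$ denotes the complex conjugate of $w'_j$. With the kernel trick, it can be shown that the classification step $\hat y = \arg\min_i \|\Phi(v) - \hat\Phi_i\|_{\mathcal{H}}$, which involves the infinite-dimensional feature map $\Phi$, is equivalent to

$$\hat y = \arg\min_i \Bigl\{\kappa_\sigma(v, v) - 2\,\mathrm{Re}\bigl(\hat\beta_i^*k_i(v)\bigr) + \hat\beta_i^*K_i\hat\beta_i\Bigr\}, \tag{13}$$

where $k_i(v) = \Phi(V_i)^*\Phi(v) = (\kappa_\sigma(v_{i1}, v), \ldots, \kappa_\sigma(v_{in_i}, v))^\top$ and $\hat\beta_i = (K_i + \lambda I_{n_i})^{-1}k_i(v)$. Thus, without evaluating the unknown feature map $\Phi$, the kernel trick enables KRRC to classify shapes using only kernel evaluations, which correspond to inner products in the high-dimensional feature space.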
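A minimal NumPy sketch of ECG-based KRRC, following the half-vectorized VW embedding, the complex Gaussian kernel, and the residual form of the classification rule (13). All function names and the toy configurations are hypothetical. It adopts the convention $K_i[j, l] = \kappa_\sigma(v_{ij}, v_{il})$ and $k_i(v)[j] = \kappa_\sigma(v_{ij}, v)$. The last lines build a Gram matrix on random 5-ads so that its Hermitian symmetry and nonnegative spectrum can be checked numerically.

```python
import numpy as np

def preshape(z):
    """Centre and scale complex landmarks z in C^k to obtain a pre-shape."""
    z = np.asarray(z, dtype=complex)
    c = z - z.mean()
    return c / np.linalg.norm(c)

def vech_vw(u):
    """vech of the VW embedding u u^*: lower triangle incl. diagonal, stacked column by column."""
    J = np.outer(u, u.conj())
    r, c = np.triu_indices(len(u))   # pairs with r <= c, ordered by r then c
    return J[c, r]                   # entries J[row >= col], i.e. the column-wise lower triangle

def complex_gaussian(a, b, sigma):
    """Complex Gaussian kernel kappa(a, b) = exp(-sum_j (a_j - conj(b_j))^2 / sigma^2)."""
    return np.exp(-np.sum((a - b.conj()) ** 2) / sigma ** 2)

def ecg_krrc_predict(class_shapes, z, lam=1e-3, sigma=1.0):
    """ECG-KRRC (sketch): class_shapes maps label -> list of complex landmark vectors."""
    v = vech_vw(preshape(z))
    kvv = complex_gaussian(v, v, sigma).real      # kappa(v, v) is real
    best_label, best_res = None, np.inf
    for label, shapes in class_shapes.items():
        V = [vech_vw(preshape(s)) for s in shapes]
        K = np.array([[complex_gaussian(a, b, sigma) for b in V] for a in V])  # Gram matrix K_i
        kv = np.array([complex_gaussian(a, v, sigma) for a in V])              # k_i(v)
        beta = np.linalg.solve(K + lam * np.eye(len(V)), kv)
        # residual kappa(v, v) - 2 Re(beta^* k_i(v)) + beta^* K_i beta, as in (13)
        res = kvv - 2.0 * np.real(np.vdot(beta, kv)) + np.real(np.vdot(beta, K @ beta))
        if res < best_res:
            best_label, best_res = label, res
    return best_label

square = np.array([0, 1, 1 + 1j, 1j])
classes = {0: [square, np.array([0, 1.1, 1 + 1j, 0.1 + 1j])],
           1: [np.array([0, 1, 2, 3 + 0.2j]), np.array([0, 1, 2.1, 3])]}
pred = ecg_krrc_predict(classes, 2.0 * np.exp(1j * 1.3) * square + 4, lam=1e-4)

rng = np.random.default_rng(1)
V_demo = [vech_vw(preshape(rng.normal(size=5) + 1j * rng.normal(size=5))) for _ in range(8)]
K_demo = np.array([[complex_gaussian(a, b, 1.0) for b in V_demo] for a in V_demo])
eig_demo = np.linalg.eigvalsh(K_demo)             # K_demo is Hermitian
```

The query is a rotated, scaled, and translated copy of a class-0 training shape, so its embedding coincides with that training point and its class-0 residual is of order $\lambda$.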
5 Real Data analysis
In this section, we illustrate the proposed methods on the PassifloraLeaves data set, which is available at https://github.com/DanChitwood/PassifloraLeaves. The leaves of Passiflora, a botanical genus of more than 550 species of flowering plants, vary remarkably in shape across species. Chitwood and Otoni (2016) analyzed the shapes of Passiflora leaves from 40 species, which were assigned to 7 classes according to their appearance. In the PassifloraLeaves data (Figure 2), fifteen landmarks ($k = 15$) were placed at homologous positions to capture the shapes of 3,319 leaves in 7 classes. Shape differences among the groups are shown in Figure 3, along with their extrinsic means, defined as minimizers of the Fréchet function $F(p) = \sum_{j=1}^n \|J(p) - J([z_j])\|^2$ over $p \in \Sigma_2^k$. In Kendall’s planar shape space, the extrinsic sample mean, also known as the VW mean, is the shape of the unit eigenvector corresponding to the largest eigenvalue of the complex matrix $\frac{1}{n}\sum_{j=1}^n u_ju_j^*$. A more detailed description of the data can be found in Chitwood and Otoni (2017).
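A minimal NumPy sketch of the VW (extrinsic) sample mean on simulated data. The data are not the Passiflora leaves: they are noisy, randomly rotated copies of a hypothetical 5-landmark template, and all helper names are hypothetical. It returns the top eigenvector of $\frac{1}{n}\sum_j u_ju_j^*$ and checks that it attains a Fréchet function value no larger than any sample shape does.

```python
import numpy as np

def preshape(z):
    """Centre and scale complex landmarks z in C^k to obtain a pre-shape."""
    z = np.asarray(z, dtype=complex)
    c = z - z.mean()
    return c / np.linalg.norm(c)

def vw_mean(U):
    """Extrinsic (VW) mean: unit eigenvector of (1/n) sum_j u_j u_j^* for the largest eigenvalue.
    U is k x n with pre-shapes as columns; the mean shape is the orbit of the returned vector."""
    S = U @ U.conj().T / U.shape[1]
    _, vecs = np.linalg.eigh(S)          # eigenvalues returned in ascending order
    return vecs[:, -1]

def frechet(p, U):
    """Extrinsic Frechet function F(p) = sum_j ||J(p) - J(u_j)||_F^2."""
    Jp = np.outer(p, p.conj())
    return sum(np.linalg.norm(Jp - np.outer(u, u.conj()), 'fro') ** 2 for u in U.T)

rng = np.random.default_rng(4)
base = np.array([0, 2, 2 + 1j, 1 + 2j, 1j])
samples = [np.exp(1j * rng.uniform(-np.pi, np.pi))
           * (base + 0.05 * (rng.normal(size=5) + 1j * rng.normal(size=5)))
           for _ in range(30)]
U = np.column_stack([preshape(s) for s in samples])
mu = vw_mean(U)
```

Using `eigh` rather than a general eigensolver exploits the fact that the matrix is Hermitian, so the eigenvalues are real and sorted.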
In addition to the two extrinsic KRRCs proposed in Section 4, we consider the following competing methods: KRRC with the full Procrustes Gaussian (FPG) kernel, KRRC with the Gaussian kernel based on the Riemannian distance, the naive RRC, and the multiclass GLM with a ridge penalty fitted with the glmnet package.
In each run of the simulations, we randomly split the data into a training set containing 60% of the observations in each class and a test set containing the remaining 40%. For each model, we consider several subspace sizes, obtained by drawing class-wise subsamples of equal size from the training set. Each model is built only on the training subsamples, and its performance is evaluated on test samples that were not used in the training phase. As evaluation metrics we consider macro-averaged precision, recall, and $F_1$ score, together with average accuracy:

$$\mathrm{Precision}_M = \frac{1}{C}\sum_{c=1}^C \frac{TP_c}{TP_c + FP_c}, \qquad \mathrm{Recall}_M = \frac{1}{C}\sum_{c=1}^C \frac{TP_c}{TP_c + FN_c},$$

$$F_{1,M} = \frac{2\,\mathrm{Precision}_M\,\mathrm{Recall}_M}{\mathrm{Precision}_M + \mathrm{Recall}_M}, \qquad \text{Average accuracy} = \frac{1}{C}\sum_{c=1}^C \frac{TP_c + TN_c}{TP_c + FP_c + FN_c + TN_c},$$

where $TP_c$, $FP_c$, $FN_c$ and $TN_c$ denote the true positive, false positive, false negative, and true negative counts for class $c$, respectively, and $C$ is the number of classes. For all of these metrics, larger values indicate better performance. Classification results over 20 replications are summarized in Table 1.
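A minimal NumPy sketch of the four evaluation metrics, computed from one-vs-rest counts per class. It assumes, as in the formulas above, that the macro $F_1$ score is formed from macro-averaged precision and recall (rather than by averaging per-class $F_1$ scores). `macro_metrics` is a hypothetical helper name.

```python
import numpy as np

def macro_metrics(y_true, y_pred, labels):
    """Macro precision, macro recall, F1 from macro P and R, and average accuracy."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    prec, rec, acc = [], [], []
    for c in labels:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        tn = np.sum((y_pred != c) & (y_true != c))
        prec.append(tp / (tp + fp) if tp + fp > 0 else 0.0)
        rec.append(tp / (tp + fn) if tp + fn > 0 else 0.0)
        acc.append((tp + tn) / (tp + fp + fn + tn))
    P, R = np.mean(prec), np.mean(rec)
    F1 = 2 * P * R / (P + R) if P + R > 0 else 0.0
    return {"precision": P, "recall": R, "f1": F1, "avg_accuracy": np.mean(acc)}

m = macro_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], labels=[0, 1, 2])
# precision = 13/18, recall = 2/3, f1 = 52/75, avg_accuracy = 7/9
```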
Box plots of the $F_1$ scores are displayed in Figure 4 to illustrate model performance graphically. The results are consistent with our expectations. Intuitively, the unsatisfactory performance of the Euclidean-distance-based methods, such as RRC and GLM, is mainly due to the fact that these methods do not take into account the geometric structure of the nonlinear manifold. Moreover, the inadequate result produced by the Riemannian kernel suggests that positive definiteness of the kernel is essential for good performance.
Because the positive definite kernel-based methods (VWG, ECG, FPG) substantially outperformed the other three methods, it is of interest to compare these kernels further under various settings. We therefore compare the KRRC methods based on the three positive definite kernels. Comparison results for different subspace sizes are displayed in Figure 5. To ease visualization, we started the comparison at subspace size 35, and the simulations were replicated 5 times for each subspace size. In terms of all the measures considered, the three kernels perform similarly. Nevertheless, although a theoretical investigation of these kernels is beyond the scope of this paper, it is worth pointing out that the extrinsic KRRC with the VWG kernel is empirically preferred, slightly outperforming the other two at every subspace size.
This paper addresses the classification problem on Kendall’s planar shape space $\Sigma_2^k$, with a major focus on the extrinsic KRRC framework. As demonstrated in this paper, our approach stems from an attempt to develop new kernels on $\Sigma_2^k$. The extrinsic approach is an appealing way to obtain valid kernels on manifolds where positive definite kernels cannot be constructed directly from intrinsic distances. By taking advantage of the Euclidean distance induced by the VW embedding while capturing nonlinear patterns in manifold-valued data, the combination of the extrinsic approach and the kernel method not only guarantees a positive definite kernel but also achieves promising performance.
We conclude by indicating potential directions for future work. While the proposed extrinsic kernels focus on the Kendall shape space, our approach extends naturally to other manifolds for which a well-defined embedding into Euclidean space is available. Examples include the 3D projective shape space of $k$-ads, the manifold of symmetric positive definite matrices, and the Grassmannian. We expect that our extrinsic kernel approach will contribute to new kernelized methods on manifolds, because it avoids the non-positive definiteness problem associated with Gaussian kernels built on intrinsic distances. Extrinsic methods have recently received significant attention in analyzing manifold-valued data because of their computational efficiency.
- Bhattacharya, A. and Bhattacharya, R. (2012). Nonparametric Inference on Manifolds: With Applications to Shape Spaces. IMS Monograph #2. Cambridge University Press.
- Bouboulis, P. and Theodoridis, S. (2010). The complex Gaussian kernel LMS algorithm. International Conference on Artificial Neural Networks (ICANN 2010), pages 11–20.
- Chitwood, D. H. and Otoni, W. C. (2016). Morphometric analysis of Passiflora leaves: the relationship between landmarks of the vasculature and elliptical Fourier descriptors of the blade. GigaScience, 6:1–13.
- Chitwood, D. H. and Otoni, W. C. (2017). Divergent leaf shapes among Passiflora species arise from a shared juvenile morphology. Plant Direct, 1:1–15.
- Dryden, I. L. and Mardia, K. V. (1998). Statistical Shape Analysis. Wiley.
- He, J., Ding, L., Jiang, L., and Ma, L. (2014). Kernel ridge regression classification. International Joint Conference on Neural Networks, pages 2263–2267.
- Hofmann, T., Schölkopf, B., and Smola, A. J. (2008). Kernel methods in machine learning. The Annals of Statistics, 36:1171–1220.
- Jayasumana, S., Hartley, R., Salzmann, M., Li, H., and Harandi, M. (2013a). Kernel methods on the Riemannian manifold of symmetric positive definite matrices. 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 73–80.
- Jayasumana, S., Salzmann, M., Li, H., and Harandi, M. (2013b). A framework for shape analysis via Hilbert space embedding. IEEE International Conference on Computer Vision (ICCV), pages 1249–1256.
- Kendall, D. G. (1984). Shape manifolds, procrustean metrics, and complex projective spaces. Bulletin of the London Mathematical Society, 16:81–121.
- Kent, J. T. (1992). New directions in shape analysis. In: The Art of Statistical Science. John Wiley & Sons, Ltd, Chichester.
- Lin, L., Mu, N., Cheung, P., and Dunson, D. (2018). Extrinsic Gaussian processes for regression and classification on manifolds. Bayesian Analysis. doi: https://doi.org/10.1214/18-BA1135.
- Lyons, R. (2013). Distance covariance in metric spaces. The Annals of Probability, 41:3284–3305.
- Marron, J. S. and Alonso, A. M. (2014). Overview of object oriented data analysis. Biometrical Journal, 56:732–753.
- Naseem, I., Togneri, R., and Bennamoun, M. (2010). Linear regression for face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32:2106–2112.
- Nash, J. (1956). The imbedding problem for Riemannian manifolds. The Annals of Mathematics, 63:20–63.
- Patrangenaru, V. and Ellingson, L. (2015). Nonparametric Statistics on Manifolds and Their Applications to Object Data Analysis. CRC Press.
- Schölkopf, B. and Smola, A. J. (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Adaptive Computation and Machine Learning Series. MIT Press, Cambridge, MA, USA.
- Steinwart, I., Hush, D., and Scovel, C. (2006). An explicit description of the reproducing kernel Hilbert spaces of Gaussian RBF kernels. IEEE Transactions on Information Theory, pages 4635–4643.
- Wang, H. and Marron, J. S. (2007). Object oriented data analysis: Sets of trees. The Annals of Statistics, 35:1849–1873.
- Whitney, H. (1944). The self-intersections of a smooth n-manifold in 2n-space. The Annals of Mathematics, 45:220–246.