Ordinal Distance Metric Learning with MDS for Image Ranking
Image ranking aims to rank images based on a set of images whose ranks are known. In this paper, we propose an improved linear ordinal distance metric learning approach based on the linear distance metric learning model in Li et al. (2015). By decomposing the distance metric $A$ as $A = L^\top L$, the problem can be cast as looking for a linear map between two sets of points in different spaces while maintaining some data structures. The ordinal relation of the labels can be maintained via classical multidimensional scaling, a popular tool for dimension reduction in statistics. A least squares fitting term is then introduced into the cost function, which can also maintain the local data structure. The resulting model is an unconstrained problem and can better fit the data structure. Extensive numerical results demonstrate the improvement of the new approach over the linear distance metric learning model in both speed and ranking performance.
Keywords: Image ranking; distance metric learning; classical multidimensional scaling; optimization model.
1 Introduction
Given a labeled image dataset (referred to as the training set), image ranking aims to find the most relevant images for a query image based on the training set. Unlike binary and multiclass classification, the labels of the training set in image ranking often have an order, for example, age. Image ranking has two important and challenging aims. The first is to find which class the query image belongs to, and the second is to find the most relevant images in that class. The first aim falls into ordinal regression in statistics, for which different approaches have been proposed; see Gutierrez et al. (2016) for a survey on ordinal regression and Qiao (2015), Wang et al. (2017) for recent developments. However, the second aim makes image ranking different from ordinal regression, since the training images having the same label as the query image need to be further ranked. Therefore, a direct extension of methods for ordinal regression is not appropriate for image ranking.
As for the second aim, a natural way to find the most relevant images is to use the Euclidean distance between images to measure their dissimilarity. However, as we will show later, in most cases the Euclidean distance is not an appropriate measure of dissimilarity. A practical alternative is to learn a distance metric (denoted as $A$) to measure the distances between images. This is referred to as distance metric learning (DML). Then, for a query image, the most relevant images are those with the smallest distances under metric $A$. Many DML methods have been developed for image classification and clustering tasks, for example, the SDP approach proposed by Xing et al. (2003), the online learning algorithm proposed by Shalev-Shwartz et al. (2004), neighborhood component analysis (NCA) by Goldberger et al. (2004), and so on (Bar-Hillel et al. (2003); Shen et al. (2010); Yang et al. (2007)). However, most of these methods do not assume that the labels are ordered. Therefore, they cannot be directly used for image ranking.
Recently, Li et al. (2015) first introduced ordinal DML for image ranking. With a carefully designed weighting factor based on the ordinal labels, the ordinal relationship of the images is expected to be maintained. An alternating iterative update, which is basically a projected gradient algorithm, was proposed to solve the resulting nonlinear convex semidefinite programming model.
On the other hand, multidimensional scaling (MDS) is an important method for dimension reduction, which has been widely used in signal processing, molecular conformation, psychometrics and social measurement. We refer to some monographs and surveys for more applications (Anjos and Lasserre (2012); Borg and Groenen (2005); Dattorro (2008); Dokmanic et al. (2015); Liberti et al. (2014)). The idea of classical MDS (cMDS) is to embed the given objects into a low dimensional space based on a Euclidean distance matrix. Recently, there has been great progress in MDS, such as the semismooth Newton method for nearest Euclidean distance matrix problem (Qi (2013); Qi and Yuan (2014)), the inexact smoothing Newton method for nonmetric MDS (Li and Qi (2017)), as well as the applications of MDS in nonlinear dimension reduction (Ding and Qi (2016, 2017)), binary code learning (Dai et al. (2016)), and sensor network localization (Qi et al. (2013)).
Our Contributions. Note that the distance metric $A$ in DML is positive semidefinite. We represent $A$ as $A = L^\top L$, where $L$ is a rectangular matrix. The first contribution of our work is that we look for $L$ instead of $A$, which gets around the positive semidefinite constraint on $A$. As a result, our method does not need a spectral decomposition in each iteration and thus has quite low computational complexity. Moreover, if $L$ has only a few rows, the obtained $A$ is low rank. This brings new insight into the distance metric: distances between images under $A$ are basically Euclidean distances between new points in a new space. The second contribution is that we employ cMDS to get the ideal points in the new space, whose Euclidean distances keep the same ordinal relations as the labels. In other words, cMDS is a key step in maintaining the ordinal relationship of the data. The third contribution is that we propose a new ordinal DML model, which takes into account the ordinal relations between images and maintains the local data structure. Extensive experiments are conducted on two datasets: the UMIST face dataset and the FGNET aging dataset. The results demonstrate the efficiency of the new approach and its improvement over the linear DML model in Li et al. (2015) in both speed and ranking performance.
The paper is organized as follows. In Section 2, we give some preliminaries on the DML model in Li et al. (2015) and on cMDS. In Section 3, we propose our new approach, referred to as the cMDSDML approach. In Section 4, we discuss the numerical algorithm for solving the resulting unconstrained problem. In Section 5, we report numerical results to demonstrate the efficiency of the proposed model. Final conclusions are given in Section 6.
Notations. We use $\mathcal{S}^n$ to denote the space of symmetric matrices of order $n$, and $\mathcal{S}^n_+$ to denote the space of positive semidefinite matrices of order $n$; $A \succeq 0$ means $A \in \mathcal{S}^n_+$. We use small bold letters to indicate vectors.
2 Preliminaries
In this section, we give a brief review on the linear DML model in Li et al. (2015) and then give some preliminaries on cMDS.
2.1 Problem Statement
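Suppose $\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$ is the training set, where $\mathbf{x}_i \in \mathbb{R}^d$, $i = 1, \dots, n$, are the observed data, and $y_i$, $i = 1, \dots, n$, are the corresponding labels, which have an order. Here $n$ is the number of samples in the training set. We need the following assumptions.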
Assumption 1.
Suppose there are total different ordinal labels. Assume that the data in the training set are grouped as follows
where , and are distinct ordinal labels.
Assumption 2.
Suppose $\mathbf{x}_1, \dots, \mathbf{x}_n$ are zero-centered, i.e., $\sum_{i=1}^{n} \mathbf{x}_i = \mathbf{0}$.
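To rank images, the distance metric learning approach uses the distance defined by
$$d_A(\mathbf{x}_i, \mathbf{x}_j) = (\mathbf{x}_i - \mathbf{x}_j)^\top A \, (\mathbf{x}_i - \mathbf{x}_j),$$
where $A$ is positive semidefinite. The goal is then to learn an appropriate $A$ such that the distances under metric $A$ between relevant images are small. Once $A$ is obtained, the most relevant images of a query image are those with the smallest distances under $A$. To this end, one expects $A$ to have two properties. First, ordinal information needs to be preserved under $A$, that is, for $\mathbf{x}_i$, $\mathbf{x}_j$ with $y_i \neq y_j$, $d_A(\mathbf{x}_i, \mathbf{x}_j)$ is small when $|y_i - y_j|$ is small. Second, the local geometric structure of the data needs to be maintained under $A$. That is, for $\mathbf{x}_i$, $\mathbf{x}_j$ with $y_i = y_j$, $d_A(\mathbf{x}_i, \mathbf{x}_j) \approx d_I(\mathbf{x}_i, \mathbf{x}_j)$, where $I$ is the identity matrix of size $d$. See also Li et al. (2015).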
2.2 Linear Distance Metric Learning for Ranking
As mentioned in the introduction, most DML approaches do not assume that the labels are ordered. Li et al. (2015) first proposed a method named Linear Distance Metric Learning for Ranking (LDMLR), which deals with ordinal labels. Below we briefly review the main idea of LDMLR.
To derive LDMLR, for each $\mathbf{x}_i$, we first specify its $k$ nearest data points (under Euclidean distance) with the same label as its target neighbors. The LDMLR method learns a metric $A$ by solving the following nonlinear convex semidefinite programming problem:
(1) 
where
Here is a tradeoff parameter. indicates whether is one of ’s target neighbors, i.e.,
(2) 
And is a weighting factor defined as
(3) 
The first term of the objective can be viewed as a penalty term on the distance between two data points with different labels. The weighting factor is used to adjust the importance of such distances: by its definition in (3), the larger the difference between the two labels, the bigger the weight. If two points have the same label, we do not want to maximize their distance, so the weight is zero in this case. The second term of the objective tries to maintain the local structure between images with the same label. Model (1) is a convex model and can be solved by state-of-the-art quadratic semidefinite programming packages, such as QSDP by Toh (2007). In Li et al. (2015), the projected gradient method is applied to solve (1), i.e., the following update is used
where the projection is taken onto the cone of positive semidefinite matrices.
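The projection in this update is the step that calls for a spectral decomposition at every iteration. A minimal NumPy sketch of one projected gradient step is given below. The function names `project_psd` and `projected_gradient_step`, the step size, and the way the gradient is passed in are illustrative assumptions; this is not the implementation of Li et al. (2015).

```python
import numpy as np

def project_psd(M):
    """Nearest positive semidefinite matrix to M in the Frobenius norm."""
    S = (M + M.T) / 2.0                     # work with the symmetric part
    w, V = np.linalg.eigh(S)                # spectral decomposition of S
    return (V * np.maximum(w, 0.0)) @ V.T   # set negative eigenvalues to zero

def projected_gradient_step(A, grad, step):
    """One update A <- Proj(A - step * grad); grad is supplied by the caller."""
    return project_psd(A - step * grad)

# Toy usage: the negative eigenvalue of a diagonal matrix is clipped.
A = np.array([[2.0, 0.0], [0.0, -1.0]])
P = project_psd(A)
print(P)
```

For a $d \times d$ metric, the call to `np.linalg.eigh` dominates the cost of each step, which is the overhead that an unconstrained reformulation avoids.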
In LDMLR, the ordinal relation of the images is maintained by introducing a weighting factor, which is calculated based on the ordinal labels. Furthermore, the local data structure can be kept by the second term in the objective.
2.3 Classical Multidimensional Scaling (cMDS)
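The aim of cMDS is to embed data in a lower dimensional space while preserving the distances between data points. Given the coordinates of a set of points, namely $\{\mathbf{z}_1, \dots, \mathbf{z}_n\}$ with $\mathbf{z}_i \in \mathbb{R}^r$, it is straightforward to compute the pairwise Euclidean distances: $d_{ij} = \|\mathbf{z}_i - \mathbf{z}_j\|$, $i, j = 1, \dots, n$. The matrix $D = (d_{ij}^2)$ is known as the (squared) Euclidean Distance Matrix (EDM) of those points. However, the inverse problem is more interesting and important. Suppose $D$ is given. The method of cMDS generates a set of coordinates that preserve the pairwise distances in $D$. We give a short description of cMDS below. Let
$$J := I - \frac{1}{n}\mathbf{1}\mathbf{1}^\top, \qquad B := -\frac{1}{2} J D J, \tag{4}$$
where $I$ is the $n \times n$ identity matrix and $\mathbf{1}$ is the (column) vector of all ones in $\mathbb{R}^n$. In the literature, $J$ is known as the centralization matrix and $B$ is the double-centralized matrix of $D$ (also the Gram matrix of $D$ because $B$ is positive semidefinite). Suppose $B$ admits the spectral decomposition
$$B = [\mathbf{p}_1, \dots, \mathbf{p}_r] \, \mathrm{diag}(\lambda_1, \dots, \lambda_r) \, [\mathbf{p}_1, \dots, \mathbf{p}_r]^\top, \tag{5}$$
where $\lambda_1, \dots, \lambda_r$ are the positive eigenvalues of $B$ (the rest are zero) and $\mathbf{p}_1, \dots, \mathbf{p}_r$ are the corresponding orthonormal eigenvectors. Then the coordinates obtained by
$$[\mathbf{z}_1, \dots, \mathbf{z}_n] := \mathrm{diag}(\sqrt{\lambda_1}, \dots, \sqrt{\lambda_r}) \, [\mathbf{p}_1, \dots, \mathbf{p}_r]^\top \tag{6}$$
preserve the known distances in the sense that $\|\mathbf{z}_i - \mathbf{z}_j\| = d_{ij}$ for all $i, j = 1, \dots, n$. This is the well-known cMDS. We refer to Gower (1985), Schoenberg (1935), Torgerson (1952), Young and Householder (1938), Borg and Groenen (2005), and Dattorro (2008) for detailed descriptions and generalizations of cMDS.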
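The three steps of cMDS (double centering, spectral decomposition, scaling of the eigenvectors) can be sketched in a few lines of NumPy. The function name `classical_mds` and the tolerance `tol` used to decide which eigenvalues count as positive are assumptions made for this illustration.

```python
import numpy as np

def classical_mds(D, tol=1e-10):
    """Embed n points from an n x n matrix D of squared Euclidean distances."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n     # centralization matrix
    B = -0.5 * J @ D @ J                    # double-centralized (Gram) matrix
    w, V = np.linalg.eigh(B)                # eigenvalues in ascending order
    keep = w > tol * max(w.max(), 1.0)      # keep the positive eigenvalues only
    w, V = w[keep][::-1], V[:, keep][:, ::-1]
    return V * np.sqrt(w)                   # row i holds the coordinates of point i

# Usage: recover a planar configuration from its squared distances.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 2))
D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
Y = classical_mds(D)
D_rec = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
print(np.allclose(D, D_rec))
```

The recovered points generally differ from the original ones by a rotation and a shift, but their pairwise distances agree, and the recovered configuration is centered at the origin.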
3 A New Approach for Ranking
In this section, we will motivate our new approach and discuss some related properties of EDM.
3.1 A New Approach
The idea of our approach is as follows. First, by decomposing $A = L^\top L$, the problem reduces to looking for a linear map $L$ from the original space to a new space. The images of the data points under $L$ are referred to as the embedding points corresponding to $\mathbf{x}_i$, $i = 1, \dots, n$. Then we apply cMDS to obtain estimates of those embedding points. Finally, $L$ is learned based on the two sets of points: the data points and the cMDS estimates. We detail our approach in the following three steps.
Step 1. Decompose $A = L^\top L$
A natural way of learning a distance metric $A$ is to decompose $A$ as $A = L^\top L$, where $L \in \mathbb{R}^{r \times d}$ is a rectangular matrix and $r$ is a prescribed dimension with $r \le d$. This decomposition has been used in several references; see, for example, Sugiyama (2007), Weinberger and Saul (2009), Xiang et al. (2008). Learning $L$ instead of $A$ brings some advantages. First, it allows us to get around the positive semidefinite constraint $A \succeq 0$, resulting in an unconstrained model. Second, a low rank structure of $A$ can be specified by choosing $r < d$. Note that given a query image, it is necessary to compute the distances between the query image and every training image, so the time complexity of computing distances should be kept as low as possible. With a low rank $A$, the cost of each distance evaluation can be reduced from $O(d^2)$ to $O(rd)$. Finally, it provides insight into the Mahalanobis distance metric $A$: $L$ is basically a linear map from $\mathbb{R}^d$ to $\mathbb{R}^r$. The distance between $\mathbf{x}_i$ and $\mathbf{x}_j$ under metric $A$ can be reformulated as
$$d_A(\mathbf{x}_i, \mathbf{x}_j) = (\mathbf{x}_i - \mathbf{x}_j)^\top L^\top L \, (\mathbf{x}_i - \mathbf{x}_j) = \|L\mathbf{x}_i - L\mathbf{x}_j\|^2. \tag{7}$$
In other words, the distance between $\mathbf{x}_i$ and $\mathbf{x}_j$ under metric $A$ is essentially the (squared) Euclidean distance between the new points $L\mathbf{x}_i$ and $L\mathbf{x}_j$ in the space $\mathbb{R}^r$.
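The equivalence between the distance under a factored metric and a Euclidean distance after a linear map is easy to check numerically. The sketch below assumes that the distance is the quadratic form $(\mathbf{x}_i - \mathbf{x}_j)^\top A (\mathbf{x}_i - \mathbf{x}_j)$ with $A = L^\top L$; the names `L`, `d`, `r` and the toy sizes are chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 3                          # original and embedding dimensions (toy values)
L = rng.standard_normal((r, d))      # rectangular factor with r rows
A = L.T @ L                          # induced metric: PSD with rank at most r

x_i = rng.standard_normal(d)
x_j = rng.standard_normal(d)
diff = x_i - x_j

dist_via_A = diff @ A @ diff                    # quadratic form, O(d^2) work per pair
dist_via_L = np.sum((L @ x_i - L @ x_j) ** 2)   # squared distance in R^r, O(r d) work

print(np.isclose(dist_via_A, dist_via_L))
print(np.linalg.matrix_rank(A))
```

In a ranking setting, the training images can be mapped once and stored, so each query only needs one map of the query image followed by distance computations in the low dimensional space.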
Recall that we refer to the space where $\mathbf{x}_i$ lies (i.e., $\mathbb{R}^d$) as the original space, the space where $L\mathbf{x}_i$ lies (i.e., $\mathbb{R}^r$) as the new space, and $L\mathbf{x}_i$ as the embedding point of $\mathbf{x}_i$. Now image ranking reduces to looking for a linear map that maps the data to a proper new space such that the following properties hold.

(i) The distances between embedding points should reflect the corresponding ordinal labels. In other words, the Euclidean distances between embedding points with different labels should follow the order of their label differences.

(ii) Local data structure must be maintained. That is, the Euclidean distances between a point and its target neighbors with the same label in the original space need to be maintained as much as possible in the new space.
In the following, we apply cMDS to obtain estimates of the embedding points in a new space, which enjoy property (i), and then learn a linear map $L$ based on the two sets of points: the data points and these estimates.
Step 2. Apply cMDS
In order to apply cMDS to get the estimations of embedding points, an EDM is needed. Note that the points with the same label can be basically viewed as one point, and further inspired by the weighting factor defined in (3), we can construct an EDM based on the ordinal labels. A trivial choice is to define by However, from numerical point of view, we can further add a parameter to to allow more flexibility. This leads to the following form of . Define as
(8) 
Under Assumption 1, let
(9) 
The following theorem shows that if is properly chosen, then is an EDM.
Theorem 1.
Let and is the smallest eigenvalue of . If , then defined by (8) is an EDM.
The proof is deferred to Section 3.2. If the constructed matrix is not an EDM, we refer to Qi (2013), Li and Qi (2017) for more details. By applying cMDS to this matrix, we obtain the estimates of the embedding points in the new space.
Remark 1.
For and with , their estimations of embedding points basically collapse to one point, since . For and with , the Euclidean distance between their estimations of embedding points is . Consequently, there is
In other words, the estimates of the embedding points enjoy property (i).
Step 3. Matching Two Sets of Points
The final step is to learn $L$ based on the two sets of points so that the embedding points have properties (i) and (ii). To deal with property (i), we need to match the embedding points with their cMDS estimates as closely as possible, since the estimates already satisfy property (i). A natural statistical way is to use a least squares fitting term. To tackle property (ii), we adopt the second term of the objective in (1), since it performs well numerically. We thus reach the following model
(10) 
where the target-neighbor indicator is defined as in (2). To allow more flexibility, we also use a scaling variable in the fitting term.
Although (10) is a nonconvex model in $L$, the proposed approach enjoys the following good properties.

By dealing with $L$ instead of $A$, the resulting model (10) is an unconstrained problem, which can be solved by various numerical algorithms. Further, we can emphasize the low rank structure of $A$ by restricting $L$ to be a short fat matrix, i.e., one with fewer rows than columns.

By applying cMDS, we take into account the ordinal information of the labels, which leads to a good estimate of the embedding points.

By matching the embedding points with their cMDS estimates through the least squares fitting term, the resulting embedding points are expected to also keep property (i). Our numerical results verify this observation.
3.2 Proof of Theorem 1
Define as
(11) 
Then we have the following lemma.
Lemma 1.
Proof.
Suppose is an EDM generated by points . By the definition of , there is
which implies that , . Let , . Obviously, is an EDM generated by points . Conversely, suppose that is an EDM generated by points . Let , . One can show that is an EDM generated by . The proof is finished. ∎
Next, we show that is an EDM if is properly chosen.
Lemma 2.
Let and is the smallest eigenvalue of . If , then defined by (11) is an EDM.
Proof.
It is well known (Schoenberg (1935); Young and Householder (1938)) that is an EDM if and only if
Also note that
(12) 
To prove that is an EDM, we only need to show the positive semidefiniteness of . Let . Note that
It suffices to show if , then for any , there is
Obviously, is an EDM. Consequently, for any , . Further, implies that
By substituting by and noting equalities in (12), we have
It gives that
where the last inequality follows by the assumption as well as the positive semidefiniteness of . The proof is finished. ∎
The proof of Lemma 2 is inspired by Theorem 1 in Cailliez (1983). The difference is that is an EDM and is allowed to be negative in Lemma 2.
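The characterization of Schoenberg (1935) and Young and Householder (1938) used in the proof, namely that a symmetric matrix with zero diagonal is a squared EDM exactly when its double-centered negative is positive semidefinite, can be checked numerically. The sketch below is an illustration only: the function name `is_edm`, the tolerance, and the two toy matrices are assumptions, and neither matrix is the one defined in (8) or (11).

```python
import numpy as np

def is_edm(D, tol=1e-9):
    """Test whether D is a squared Euclidean distance matrix."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    if not np.allclose(D, D.T) or not np.allclose(np.diag(D), 0.0):
        return False
    J = np.eye(n) - np.ones((n, n)) / n        # centralization matrix
    B = -0.5 * J @ D @ J                       # must be positive semidefinite
    scale = max(1.0, np.abs(B).max())
    return bool(np.linalg.eigvalsh(B).min() >= -tol * scale)

# Squared label differences: three points on a line, hence an EDM.
labels = np.array([0.0, 1.0, 2.0])
D_line = (labels[:, None] - labels[None, :]) ** 2

# Distances 1, 1, 3 violate the triangle inequality, so this is not an EDM.
D_bad = np.array([[0.0, 1.0, 9.0],
                  [1.0, 0.0, 1.0],
                  [9.0, 1.0, 0.0]])

print(is_edm(D_line), is_edm(D_bad))
```

A check of this kind can be used to confirm that a label-based distance matrix is a valid input for cMDS before the embedding step is run.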
Remark 2.
Note that in cMDS, obtained from is not unique due to the eigenvalue decomposition of . However, are centralized, i.e., . The computational cost for generating is . If is large, the computational cost can be further reduced to by the following process, which is based on Lemma 1 and Lemma 2. It is easy to verify that generated by Algorithm 1 satisfy , and the corresponding EDM is defined in (8).
4 Numerical Algorithm
Problem (10) is an unconstrained nonlinear problem and can be solved by various algorithms. Here, we choose the traditional steepest descent method with the Armijo line search. The convergence result of the steepest descent method can be found in classical optimization books, e.g., Nocedal and Wright (2006, p. 42). Algorithm 2 summarizes the details of our approach.

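Algorithm 2

Input: a training set and the corresponding labels.
S0 Initialize the starting point and set the parameters.
S1 Compute the Euclidean distance matrix according to (8).
S2 Apply cMDS to get the estimates of the embedding points.
S3 Search the target neighbors in the original space for each training sample.
S4 Compute the gradient of the objective in (10) at the current iterate. If its norm is within the stopping tolerance, stop; otherwise, take the negative gradient as the search direction and go to S5.
S5 Apply the Armijo line search to determine a steplength, using the smallest positive integer exponent such that the sufficient decrease inequality holds.
S6 Update the iterate with this steplength, increase the iteration counter, and go to S4.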
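The iterative part of the algorithm (gradient test, Armijo backtracking, update) follows the standard steepest descent pattern, which can be sketched as below. This is a generic illustration under stated assumptions: the parameter names `eps`, `sigma`, `rho`, `max_ls`, their default values, the initial trial step of 1, and the toy least squares objective are all hypothetical and do not reproduce model (10) or the settings used in the experiments.

```python
import numpy as np

def steepest_descent_armijo(f, grad, x0, eps=1e-6, sigma=1e-4, rho=0.5,
                            max_iter=5000, max_ls=50):
    """Minimize f by steepest descent with a backtracking Armijo line search."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:          # stopping test on the gradient norm
            break
        direction = -g                        # steepest descent direction
        fx = f(x)
        slope = float(np.vdot(g, direction))  # directional derivative, negative
        alpha = 1.0
        for _ in range(max_ls):               # shrink until sufficient decrease
            if f(x + alpha * direction) <= fx + sigma * alpha * slope:
                break
            alpha *= rho
        x = x + alpha * direction
    return x

# Toy usage: fit a 2 x 5 matrix so that it maps the columns of X onto targets Y.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 20))
L_true = rng.standard_normal((2, 5))
Y = L_true @ X
f = lambda L: 0.5 * np.sum((L @ X - Y) ** 2)
grad = lambda L: (L @ X - Y) @ X.T
L_hat = steepest_descent_armijo(f, grad, np.zeros((2, 5)))
print(np.abs(L_hat - L_true).max())
```

Because the variable is an unconstrained matrix, each iteration only needs function and gradient evaluations, with no projection step.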
Implementations Let , the gradient takes the following form
Computational Complexity We compare the computational complexity (mainly in multiplication and division) of Algorithm 2 with that of LDMLR, and the details are summarized in Table 1, where steps with underline indicate the iterative steps. Note that if is large, S2 can be replaced by Algorithm 1 and the computational complexity for S2 can be further reduced from to . For the iterative process S4S6, the complexity for each iteration is , where is the maximum number for the line search loop. In contrast, for LDMLR, the computational complexity in each iteration is , which is higher than that of S3S6 in Algorithm 2, no matter or .
5 Numerical Results
In this section, we present numerical results to verify the efficiency of the proposed model. We employ the following popular procedure to assess the image ranking model. A given dataset is divided into a training set and a testing set. We first learn a distance metric based on the training set and then apply it to rank each image in the testing set. The estimated label of each test image is obtained based on the distance in the new space. We employ the popular $k$-nearest neighbor regression to obtain the estimated labels, as in Li et al. (2015) and Weinberger and Saul (2009). The mean absolute error (MAE), i.e., the average absolute difference between the estimated labels and the true labels of the test data, is used to evaluate the performance.
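The evaluation procedure can be sketched as follows. The sketch assumes that the estimated label is the plain average of the labels of the nearest training points in the new space (no rounding or voting), which may differ from the exact rule used in the experiments; the names `knn_regression` and `mean_absolute_error` and the toy data are illustrative.

```python
import numpy as np

def knn_regression(L, X_train, y_train, X_test, k=3):
    """Estimate test labels as the mean label of the k nearest training
    points, with distances measured after the linear map x -> L x."""
    Z_train, Z_test = X_train @ L.T, X_test @ L.T      # rows are embedded points
    d2 = ((Z_test[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]            # indices of the k nearest
    return y_train[nearest].mean(axis=1)

def mean_absolute_error(y_hat, y_true):
    """Average absolute difference between estimated and true labels."""
    return float(np.mean(np.abs(np.asarray(y_hat) - np.asarray(y_true))))

# Toy usage with the identity map, i.e., plain Euclidean distance.
X_train = np.array([[0.0], [0.1], [1.0], [1.1], [2.0], [2.1]])
y_train = np.array([0.0, 0.0, 1.0, 1.0, 2.0, 2.0])
X_test = np.array([[0.05], [2.05]])
y_hat = knn_regression(np.eye(1), X_train, y_train, X_test, k=2)
mae = mean_absolute_error(y_hat, np.array([0.0, 2.0]))
print(y_hat, mae)
```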
We test the proposed method on the UMIST dataset (Graham and Allinson (1998)) and FGNET dataset (Lanitis (2008)). We also compare our method with the method LDMLR in Li et al. (2015). For each test problem, we repeat each experiment 50 times and report the average results. The algorithm is implemented in Matlab R2016a and is run on a computer with Intel Core 2 Duo CPU E7500 2.93GHz, RAM 2GB.
5.1 Experiments on the UMIST image dataset
The UMIST face dataset is a multiview dataset consisting of 575 images of 20 people, each covering a wide range of poses from profile to frontal views. Fig. 1 shows some examples from the UMIST dataset.
Based on the query man wearing glasses, we can label the dataset in the following way: man wearing glasses is regarded as completely relevant, which is labeled as 2 in our experiment; man not wearing glasses or woman wearing glasses is regarded as partially relevant, which is labeled as 1; woman not wearing glasses is regarded as irrelevant, which is labeled as 0. Thus, there are 225, 239 and 111 images in the three categories, respectively. The dimension of original data is 10304.
In this experiment, for LDMLR, we set iteration number and the tradeoff parameter according to Li et al. (2015). For our method, we set parameters , , , , the maximum number for line search loop is . To get an EDM in , we set parameter ( in this situation). To apply our algorithm, we first use PCA to reduce dimension as done in Li et al. (2015). When using PCA, we center the data but don’t scale the data. The final dimension is , i.e., .
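The PCA preprocessing described here (center the data, do not rescale it) can be sketched with a singular value decomposition. The function name `pca_reduce`, the random toy data, and the target dimension used in the example are assumptions for illustration and are unrelated to the actual image data.

```python
import numpy as np

def pca_reduce(X, d):
    """Project the rows of X onto the top-d principal directions.
    Features are centered but not rescaled."""
    mean = X.mean(axis=0)
    Xc = X - mean                                  # centering only
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:d].T                                   # principal directions as columns
    return Xc @ W, mean, W

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 50))                  # 30 samples with 50 features
Z, mean, W = pca_reduce(X, d=10)
# A new image x would be mapped with the training statistics: (x - mean) @ W
print(Z.shape)
```

Since the reduced data are centered, they also satisfy the zero-centering requirement of Assumption 2.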
Role of the Embedding Dimension and Distance Metric. To see the role of the embedding dimension and the distance metric, we do the following test. We randomly select 10 images from each label for training and use the rest for testing. The images in the training set are grouped by label, so there are 30 training data in total. We fix the number of target neighbors at 5.
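Table 2. Results of cMDSDML on UMIST for different embedding dimensions.

| Embedding dimension | 2 | 3 | 5 | 8 | 10 |
|---|---|---|---|---|---|
| MAE | 0.3539 | 0.3463 | 0.3498 | 0.3684 | 0.3798 |
| STD | 0.0812 | 0.0671 | 0.0640 | 0.0762 | 0.0830 |
| t(s) | 2.27 | 3.10 | 4.58 | 6.60 | 10.08 |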
To choose a proper embedding dimension, we tried several values, namely 2, 3, 5, 8, and 10. The preliminary results are reported in Table 2. Since the training set is not large, we directly apply cMDS in S2 of Algorithm 2. We observe that dimensions 3 and 5 give the best MAE. Taking visualization into account, we choose an embedding dimension of 3 in the following tests.
Then we compute the Euclidean distances between the training data. Fig. 2 shows the Euclidean distance between each training image and the first one. The observed distances do not follow the order of the labels, which implies that the Euclidean distances between the original images cannot be used for ranking. With embedding dimension 3, we apply our method to learn $L$. After learning $L$, the embedding points of the training data in the three-dimensional space can be found, i.e., $L\mathbf{x}_i$, $i = 1, \dots, 30$. Fig. 3 plots the embedding points. As we can see, points with the same label cluster tightly together. However, the distances between points with different labels cannot be clearly seen from Fig. 3. We therefore use the learned metric to measure the distances between the training data. Fig. 4 illustrates the distances between each training image and the first one under the learned metric. Comparing Fig. 4 with Fig. 2, we can see that the data are much better layered under the learned distance than under the Euclidean distance. Hence the proposed model does preserve the ordinal relationship.
Comparison with LDMLR. Now we compare with LDMLR in Li et al. (2015). First, we randomly select 10 images from each distinct label as the training data and use the rest for testing. Different numbers of target neighbors are chosen to investigate the performance. Table 3 gives the results, including MAE, STD (standard deviation), and CPU time in seconds. In all cases, cMDSDML uses much less time than LDMLR, which is not surprising since our method has lower computational complexity. In terms of MAE, cMDSDML also outperforms LDMLR.
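Table 3. Results on UMIST for different numbers of target neighbors.

| Target neighbors | Measure | cMDSDML | LDMLR |
|---|---|---|---|
| 4 | MAE | 0.3488 | 0.4291 |
| 4 | STD |  |  |
| 4 | t(s) | 2.88 | 10.26 |
| 5 | MAE | 0.3463 | 0.4676 |
| 5 | STD |  |  |
| 5 | t(s) | 3.10 | 10.39 |
| 6 | MAE | 0.3521 | 0.4782 |
| 6 | STD |  |  |
| 6 | t(s) | 3.32 | 11.92 |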
Next, to evaluate the influence of the dimension on the performance of our method, we increase the dimension of the data while fixing the size of the training set at 30 and the number of target neighbors at 5. Table 4 lists the ranking results. It can be seen that the resulting MAE of both algorithms is not sensitive to the dimension. As for computing time, as the dimension increases, LDMLR clearly costs more time, while the CPU time of our method is fairly stable. This is reasonable since the computational complexity of our method grows more slowly with the dimension than that of LDMLR.
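Table 4. Results on UMIST for different data dimensions.

| Dimension | Measure | cMDSDML | LDMLR |
|---|---|---|---|
| 150 | MAE | 0.3463 | 0.4676 |
| 150 | STD |  |  |
| 150 | t(s) | 3.10 | 10.39 |
| 200 | MAE | 0.3518 | 0.4695 |
| 200 | STD |  |  |
| 200 | t(s) | 3.18 | 17.21 |
| 250 | MAE | 0.3545 | 0.4708 |
| 250 | STD |  |  |
| 250 | t(s) | 4.15 | 23.90 |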
Finally, we increase the size of the training set with the dimension and the number of target neighbors fixed. We randomly select images from each distinct label for training and report the results in Table 5 for training sets of size 30, 60, and 90. As the training set grows, the performance of both methods improves, which is reasonable. cMDSDML achieves higher ranking performance than LDMLR. In particular, by the MAE values in Table 5, cMDSDML achieves about 25.9%, 36.9%, and 39.4% improvement in MAE, measured as (MAE(LDMLR) − MAE(cMDSDML))/MAE(LDMLR), over LDMLR, respectively. Moreover, cMDSDML is also faster than LDMLR.
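Table 5. Results on UMIST for different training set sizes.

| Training size | Measure | cMDSDML | LDMLR |
|---|---|---|---|
| 30 | MAE | 0.3463 | 0.4676 |
| 30 | STD |  |  |
| 30 | t(s) | 3.10 | 10.39 |
| 60 | MAE | 0.1958 | 0.3103 |
| 60 | STD |  |  |
| 60 | t(s) | 4.94 | 39.13 |
| 90 | MAE | 0.1373 | 0.2264 |
| 90 | STD |  |  |
| 90 | t(s) | 5.91 | 79.92 |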
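The relative improvement in MAE, (MAE(LDMLR) − MAE(cMDSDML))/MAE(LDMLR), can be recomputed directly from the MAE entries of Table 5. In the short sketch below, the function name `relative_improvement` is hypothetical, and reading the row keys 30, 60, 90 as training set sizes is an interpretation of the table.

```python
def relative_improvement(mae_baseline, mae_new):
    """Fraction by which mae_new is lower than mae_baseline."""
    return (mae_baseline - mae_new) / mae_baseline

# (training size, MAE of cMDSDML, MAE of LDMLR) as listed in Table 5
table5 = [(30, 0.3463, 0.4676), (60, 0.1958, 0.3103), (90, 0.1373, 0.2264)]

improvements = {n: relative_improvement(ldmlr, cmds) for n, cmds, ldmlr in table5}
for n, imp in improvements.items():
    print(f"training size {n}: MAE lower by {100 * imp:.1f}%")
```

The gap between the two methods widens as the training set grows, both in relative MAE and in CPU time.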
5.2 Experiments on the FGNET dataset
In this experiment, we test our algorithm on the FGNET dataset which is labeled by age. The FGNET dataset contains 1002 face images. There are 82 subjects in total with the age ranges from 1 to 69. Fig. 5 shows some examples from the FGNET dataset. To get better performance of LDMLR, we set the iteration number and the tradeoff parameter . For our method, , , , , the embedding dimension and the maximum number for line search loop is .
We pick up subjects with age 1, 5, 9, 15, 19 and relabel them as 1, 2, 3, 4, 5. There are 27, 40, 25, 30, 23 images in the five categories, respectively. The original dimension of images is 136. As in subsection , we preprocess the data by PCA to reduce dimension to . We randomly select 8 images from each distinct label for training and set . Fig. 6 plots the embedding data points of the training data points in the three dimensional space, i.e., , . As we can see, points almost cluster together with the same label.
Next, we randomly select 10 images from each distinct label for training and use the rest for testing. That is, the size of the training set is . We also set in . We set different values for target neighbors to investigate the performance. Table 6 lists the experimental results. In the three cases, cMDSDML achieves , , improvement over LDMLR, respectively.
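Table 6. Results on FGNET for different numbers of target neighbors.

| Target neighbors | Measure | cMDSDML | LDMLR |
|---|---|---|---|
| 4 | MAE | 0.7295 | 1.4179 |
| 4 | STD |  |  |
| 4 | t(s) | 2.70 | 21.59 |
| 5 | MAE | 0.7109 | 1.3482 |
| 5 | STD |  |  |
| 5 | t(s) | 3.11 | 18.37 |
| 6 | MAE | 0.7095 | 1.3278 |
| 6 | STD |  |  |
| 6 | t(s) | 3.86 | 25.43 |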
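Table 7. Results on FGNET for different training set sizes.

| Training size | Measure | cMDSDML | LDMLR |
|---|---|---|---|
| 40 | MAE | 0.7613 | 1.3634 |
| 40 | STD |  |  |
| 40 | t(s) | 2.34 | 11.13 |
| 50 | MAE | 0.7377 | 1.3482 |
| 50 | STD |  |  |
| 50 | t(s) | 4.56 | 18.37 |
| 75 | MAE | 0.7243 | 1.2551 |
| 75 | STD |  |  |
| 75 | t(s) | 29.97 | 40.04 |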
Finally, we fix the number of target neighbors and vary the size of the training set. We randomly select images from each distinct label for training, with training set sizes 40, 50, and 75. See Table 7 for the results, which again verify the efficiency of the proposed model.
Overall, our numerical results show that cMDSDML significantly outperforms LDMLR in both ranking performance and CPU time.
6 Conclusions
In this paper, we proposed a so-called cMDSDML approach for image ranking, which unifies the ideas of classical multidimensional scaling and distance metric learning. The algorithm enjoys low computational complexity compared with LDMLR in Li et al. (2015). Numerical results verified the efficiency of the new approach and its improvement over LDMLR.
References
 Anjos and Lasserre (2012) Anjos, M F and J B Lasserre (2012). Handbook on Semidefinite, Conic and Polynomial Optimization. Springer US.
 Bar-Hillel et al. (2003) Bar-Hillel, A, T Hertz, N Shental and D Weinshall (2003). Learning distance functions using equivalence relations. In Proceedings of the Twentieth International Conference on Machine Learning.
 Borg and Groenen (2005) Borg, I and P J F Groenen (2005). Modern Multidimensional Scaling. Springer.
 Dai et al. (2016) Dai, M, Z Lu, D Shen, H Wang, B Chen, X Lin, S Zhang, L Zhang and H Liu (2016). Design of (4, 8) binary code with MDS and zigzag-decodable property. Wireless Personal Communications, 89(1):1–13.
 Dattorro (2008) Dattorro, J (2008). Convex Optimization and Euclidean Distance Geometry. Meboo Publishing.
 Ding and Qi (2016) Ding, C and H D Qi (2016). Convex optimization learning of faithful Euclidean distance representations in nonlinear dimensionality reduction. Mathematical Programming, 164(1):341–381.
 Ding and Qi (2017) Ding, C and H D Qi (2017). Convex Euclidean distance embedding for collaborative position localization with NLOS mitigation. Computational Optimization and Applications, 66(1):187–218.
 Dokmanic et al. (2015) Dokmanic, I, R Parhizkar, J Ranieri and M Vetterli (2015). Euclidean distance matrices: Essential theory, algorithms, and applications. IEEE Signal Processing Magazine, 32(6):12–30.
 Cailliez (1983) Cailliez, F (1983). The analytical solution of the additive constant problem. Psychometrika, 48(2):305–308.
 Goldberger et al. (2004) Goldberger, J, S T Roweis, G Hinton and R Salakhutdinov (2004). Neighbourhood components analysis. In Proceeding of the 17th International Conference on Neural Information Processing Systems.
 Gower (1985) Gower, J C (1985). Properties of Euclidean and non-Euclidean distance matrices. Linear Algebra and its Applications, 67:81–97.
 Graham and Allinson (1998) Graham, D B and N M Allinson (1998). Characterising virtual eigensignatures for general purpose face recognition. Face recognition: From theory to applications. NATO ASI Series F, Computer and Systems Sciences, 163:446–456.
 Gutierrez et al. (2016) Gutierrez, P A, M Perez-Ortiz, J Sanchez-Monedero, F Fernandez-Navarro and C Hervas-Martinez (2016). Ordinal regression methods: survey and experimental study. IEEE Transactions on Knowledge and Data Engineering, 28(1):127–146.
 Lanitis (2008) Lanitis, A (2008). Comparative evaluation of automatic age progression methodologies. EURASIP Journal on Advances in Signal Processing, 2008:1–10.
 Li et al. (2015) Li, C, Q Liu, J Liu and H Lu (2015). Ordinal distance metric learning for image ranking. IEEE Transactions on Neural Networks and Learning Systems, 26(7):1551–1559.
 Li and Qi (2017) Li, Q and H D Qi (2017). An inexact smoothing Newton method for Euclidean distance matrix optimization under ordinal constraints. Journal of Computational Mathematics, 35(4):467–483.
 Liberti et al. (2014) Liberti, L, C Lavor, N Maculan and A Mucherino (2014). Euclidean distance geometry and applications. SIAM Review, 56(1):3–69.
 Nocedal and Wright (2006) Nocedal, J and S J Wright (2006). Numerical Optimization. Springer New York.
 Qi (2013) Qi, H D (2013). A semismooth Newton method for the nearest Euclidean distance matrix problem. SIAM Journal on Matrix Analysis and Applications, 34(1):67–93.
 Qi et al. (2013) Qi, H D, N Xiu and X Yuan (2013). A Lagrangian dual approach to the single-source localization problem. IEEE Transactions on Signal Processing, 61(15):3815–3826.
 Qi and Yuan (2014) Qi, H D and X Yuan (2014). Computing the nearest Euclidean distance matrix with low embedding dimensions. Mathematical Programming, 147:351–389.
 Qiao (2015) Qiao, X (2015). Noncrossing ordinal classification. Statistics.
 Schoenberg (1935) Schoenberg, I J (1935). Remarks to Maurice Fréchet’s article “Sur la définition axiomatique d’une classe d’espace distanciés vectoriellement applicable sur l’espace de Hilbert”. Annals of Mathematics, 36(3):724–732.
 Shalev-Shwartz et al. (2004) Shalev-Shwartz, S, Y Singer and A Y Ng (2004). Online and batch learning of pseudo-metrics. In Proceedings of the Twenty-first International Conference on Machine Learning.
 Shen et al. (2010) Shen, C, J Kim and L Wang (2010). Scalable large-margin Mahalanobis distance metric learning. IEEE Transactions on Neural Networks, 21(9):1524–1530.
 Sugiyama (2007) Sugiyama, M (2007). Dimensionality reduction of multimodal labeled data by local fisher discriminant analysis. Journal of Machine Learning Research, 8(1):1027–1061.
 Toh (2007) Toh, K C (2007). An inexact primal-dual path-following algorithm for convex quadratic SDP. Mathematical Programming, 112:221–254.
 Torgerson (1952) Torgerson, W S (1952). Multidimensional scaling: I. theory and method. Psychometrika, 17(4):401–419.
 Wang et al. (2017) Wang, H, Y Shi, L Niu and Y Tian (2017). Nonparallel support vector ordinal regression. IEEE Transactions on Cybernetics, 47(10):3306–3317.
 Weinberger and Saul (2009) Weinberger, K Q and L K Saul (2009). Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research, 10(1):207–244.
 Xiang et al. (2008) Xiang, S, F Nie and C Zhang (2008). Learning a Mahalanobis distance metric for data clustering and classification. Pattern Recognition, 41(12):3600–3612.
 Xing et al. (2003) Xing, E P, A Y Ng, M I Jordan and S Russell (2003). Distance metric learning, with application to clustering with side-information. In Proceedings of the Conference on Neural Information Processing Systems.
 Yang et al. (2007) Yang, L, R Jin and R Sukthankar (2007). Bayesian active distance metric learning. In Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence.
 Young and Householder (1938) Young, G and A S Householder (1938). Discussion of a set of points in terms of their mutual distances. Psychometrika, 3(1):19–22.