Feature Learning by Multidimensional Scaling and its Applications in Object Recognition
We present the MDS feature learning framework, in which multidimensional scaling (MDS) is applied on high-level pairwise image distances to learn fixed-length vector representations of images. The aspects of the images that are captured by the learned features, which we call MDS features, completely depend on what kind of image distance measurement is employed. With properly selected semantics-sensitive image distances, the MDS features provide rich semantic information about the images that is not captured by other feature extraction techniques. In our work, we introduce the iterated Levenberg-Marquardt algorithm for solving MDS, and study the MDS feature learning with IMage Euclidean Distance (IMED) and Spatial Pyramid Matching (SPM) distance. We present experiments on both synthetic data and real images — the publicly accessible UIUC car image dataset. The MDS features based on SPM distance achieve exceptional performance for the car recognition task.
To represent an image by a fixed-length feature vector, there are generally two groups of approaches, often referred to as hand-designed features and feature learning, respectively. In this section, we briefly review several commonly used methods from each group, and relate the proposed MDS feature learning to these existing methods.
I-A Hand-Designed Features
Most hand-designed features, sometimes called hand-crafted features, focus on capturing the color, texture and gradient information in the image. Generally, these features can be computed in closed form, without looking at other images. Some popular yet simple hand-designed image features include color histograms, wavelet transform coefficients [13], scale-invariant feature transform (SIFT) [26], color-SIFT [44], speeded up robust features (SURF) [2], histogram of oriented gradients (HOG) [11] and local binary patterns (LBP) [29]. To represent an image with one fixed-length feature vector, there are generally three ways: (1) The features can be computed for the entire image, but the resulting feature vector will fail to capture the spatial relationship between different objects or different locations in the image. (2) The image can first be uniformly divided into blocks; the features are then computed for each block and concatenated into one long feature vector. (3) The division of the image does not have to be uniform: we can place arbitrary rectangular or circular masks on the image, compute features for each mask (or “patch”), and concatenate them. In this case, the division must be consistent across all images.
The divide-and-concatenate methods will result in very large feature vectors. Given a large dataset, PCA can be used to reduce the dimensionality.
I-B Feature Learning
Feature learning has often been used as a synonym of deep learning, especially in recent years, and often refers to recent techniques such as sparse coding [30, 24], auto-encoders [46], convolutional neural networks [23], restricted Boltzmann machines [19], and deep Boltzmann machines [38]. However, we believe this interpretation of feature learning is imprecise. Feature learning should be defined more generally as the opposite of hand-designed features: it should refer to any technique that learns a fixed-length vector representation of each image in the dataset by utilizing the pattern distribution of the entire dataset, or by optimizing a target function that is defined on the entire dataset. Any technique that can generate a feature representation of each image without looking at the entire dataset does not fall into this category.
We further categorize existing feature learning methods into two subgroups: feature learning with raw intensities, and feature learning with hand-designed features. The proposed MDS feature learning falls into a third new subgroup: feature learning with image distance measurement.
I-B1 Feature Learning with Raw Intensities
This subgroup of methods treats the feature learning problem as a dimensionality reduction problem, where the original high-dimensional data are the image intensities, either gray-level or RGB values. Efforts on data dimensionality reduction have a long history [45], dating from the early work on PCA [31] and its nonlinear form, kernel PCA [28], to the recent work on sparse coding and deep learning [30, 19, 24, 23, 46, 38]. In all these methods, high dimensional data, such as an image, is represented by a low dimensional vector. Each entry of this vector describes one salient varying pattern of all images within the training set.
Assume we have a dataset $X = \{x_1, x_2, \dots, x_N\}$, where each $x_i \in \mathbb{R}^D$ ($1 \le i \le N$) is one data point. We briefly review several dimensionality reduction methods below.
PCA linearly projects a vector $x$ to $y = U^\top x$, where $U$ is obtained by performing eigenvector decomposition on the covariance matrix of the data.
Kernel PCA first constructs an $N \times N$ kernel matrix $K$, where each entry of this matrix is obtained by evaluating the kernel function $\kappa$ on two data points:
$$K_{ij} = \kappa(x_i, x_j). \tag{1}$$
Then the Gram matrix is constructed as
$$\tilde{K} = K - \mathbf{1}_N K - K \mathbf{1}_N + \mathbf{1}_N K \mathbf{1}_N, \tag{2}$$
where $\mathbf{1}_N$ is the $N \times N$ matrix with all elements equal to $1/N$. Next the eigenvector decomposition problem $\tilde{K} a_k = \lambda_k a_k$ is solved ($a_k$ is an eigenvector and $\lambda_k$ is the corresponding eigenvalue) and the $k$-th entry of the projected vector $y$ is computed by
$$y_k = \sum_{i=1}^{N} a_{ki}\, \kappa(x, x_i). \tag{3}$$
Auto-encoders first normalize all $x_i$’s to $[0, 1]^D$, and map them to $y_i = s(W x_i + b)$, where $s$ is a sigmoid function. A reconstruction is computed by $z_i = s(W' y_i + b')$. The weight matrices $W$ and $W'$, and the bias vectors $b$ and $b'$ are obtained by minimizing the average reconstruction error, which can be defined as either the traditional squared error or the cross-entropy.
In PCA and kernel PCA, different entries of $y$ correspond to eigenvectors of different importance, while in an auto-encoder they are equally important.
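The kernel PCA steps reviewed above (kernel matrix, centering, eigendecomposition, projection) can be sketched in a few lines of NumPy. This is only an illustrative sketch: the Gaussian kernel, the value of `sigma`, the toy data and the function names are assumptions made for the example, and the projection uses the centered kernel matrix for the training points.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel values between every row of A and every row of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def kernel_pca_fit(X, n_components, sigma):
    """Return the centered kernel matrix and the scaled leading eigenvectors."""
    N = X.shape[0]
    K = gaussian_kernel(X, X, sigma)                      # kernel matrix
    one_n = np.full((N, N), 1.0 / N)                      # every entry is 1/N
    K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n   # centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(K_c)                # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:n_components]
    # Scale so that each principal axis has unit norm in feature space.
    A = eigvecs[:, top] / np.sqrt(eigvals[top])
    return K_c, A

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))          # 30 toy data points in R^5 (assumed data)
K_c, A = kernel_pca_fit(X, n_components=2, sigma=2.0)
Y = K_c @ A                           # 2-d codes of the 30 training points
```

The columns of `Y` are ordered by decreasing eigenvalue, which reflects the remark that different entries of a PCA or kernel PCA code carry different importance.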
These techniques have been shown effective on problems such as face recognition [43, 3] and even concept recognition [22]. However, most of these methods require all input data to have exactly the same size. If the input is an image, it has to be cropped and resized to be consistent with other images in the dataset. Cropping the image loses information, and resizing the image changes the aspect ratio, which distorts object shapes.
I-B2 Feature Learning with Hand-Designed Features
One popular method that falls into this subgroup is the bag-of-visual-words (BOV) method [40, 10, 17]. This method first divides the image into local patches or segments the image into distinct regions, and then extracts hand-designed features for each patch/region. Rather than being concatenated, these feature vectors form an unordered set, also referred to as a “bag”. By performing clustering on the union of all those unordered sets for all training images, a visual vocabulary is established. The set of feature vectors previously extracted from each image can then be transformed into a “word-frequency” histogram by simply counting which cluster (visual word) is assigned to each patch/region. The “word-frequency” histogram can be optionally normalized to generate the final fixed-length vector.
One extension of BOV is the Fisher Vector (FV) method [32, 33]. Rather than simply counting the word frequency, which can be viewed as the 0-order statistics, FV encodes higher order statistics (up to the second order) about the distribution of local descriptors assigned to each visual word. Another extension is Spatial Pyramid Matching [21], which gives different weights to features at different image division levels, and defines an image similarity measurement using the pyramid matching kernel [18].
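As a rough illustration of the word-frequency step of BOV, the sketch below assigns each local descriptor to its nearest visual word and counts the assignments. The vocabulary is assumed to have been learned beforehand (for example by clustering), and the 2-d descriptors and function names are hypothetical, chosen only to keep the example small.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the closest visual word (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(descriptor, word))
             for word in vocabulary]
    return dists.index(min(dists))

def bov_histogram(descriptors, vocabulary, normalize=True):
    """Fixed-length word-frequency histogram for one image."""
    counts = [0.0] * len(vocabulary)
    for d in descriptors:
        counts[nearest_word(d, vocabulary)] += 1.0
    if normalize and descriptors:
        total = sum(counts)
        counts = [c / total for c in counts]
    return counts

# Toy vocabulary of three visual words and four local descriptors (assumed data).
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.0, 6.0)]
hist = bov_histogram(descriptors, vocabulary)   # [0.25, 0.5, 0.25]
```

The histogram length equals the vocabulary size regardless of how many patches an image contains, which is what makes the representation fixed-length.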
In this section, we first review the basics of MDS and its existing solutions, and then introduce our own solution — the iterated Levenberg-Marquardt algorithm (ILMA). Next, we discuss and compare some popular image distance measurement techniques in recent literature.
II-A Multidimensional Scaling: Problem Definition
As a statistical technique for the analysis of data similarity or dissimilarity, multidimensional scaling (MDS) has been widely applied to areas such as information visualization and surface flattening [39, 15]. Here we briefly review the basic concepts and definitions of MDS. For convenience, we will use the word “image” instead of “data” or “object” in what follows, keeping in mind that MDS is a general-purpose technique.
Suppose we have a set of $N$ images $\{I_1, I_2, \dots, I_N\}$, and there is a distance measurement $d_{ij}$ defined between each pair of images $I_i$ and $I_j$. Note that $d_{ij}$ is only a measurement of image dissimilarity, not necessarily a metric on the image set in the strict sense, since the triangle inequality (subadditivity) does not necessarily hold. Multidimensional scaling is the problem of representing each image $I_i$ by a point (vector) $x_i$ in a low dimensional space $\mathbb{R}^m$, such that the interpoint Euclidean distance $\|x_i - x_j\|$ in some sense approximates the distance $d_{ij}$ between the corresponding images. In Section II-C we will discuss how to define the image distance/dissimilarity measurement. Here we focus on the mathematical definitions related to MDS.
For a pair of images $I_i$ and $I_j$, let their low dimensional ($m$-d) representations be $x_i$ and $x_j$. The representation error is defined as $e_{ij} = d_{ij} - \|x_i - x_j\|$, where $\|\cdot\|$ denotes the $\ell_2$-norm. The raw stress is defined as the sum-of-squares of the representation errors:
$$\sigma_r = \sum_{i<j} \left(d_{ij} - \|x_i - x_j\|\right)^2, \tag{4}$$
while the normalized stress (also known as Stress-1) is defined as
$$\sigma_1 = \sqrt{\frac{\sigma_r}{\sum_{i<j} \|x_i - x_j\|^2}}. \tag{5}$$
MDS models require the interpoint Euclidean distances to be “as equal as possible” to the image distances. Thus we can minimize either the raw stress or the normalized stress. We compactly represent the image distances by an $N \times N$ symmetric matrix $D$ with all diagonal values equal to $0$, and represent the low dimensional vectors by an $N \times m$ matrix $X$ whose $i$-th row is $x_i$. Using the raw stress as the loss function, the MDS problem can be stated as:
$$\min_{X} \sum_{i<j} \left(D_{ij} - \|x_i - x_j\|\right)^2. \tag{6}$$
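To make the two loss functions concrete, the following sketch evaluates the raw stress and a normalized stress for a tiny configuration. It assumes the common Stress-1 convention in which the raw stress is divided by the sum of squared embedded distances; the function names and the three-point example are illustrative only.

```python
import math

def euclid(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def raw_stress(D, X):
    """Sum of squared representation errors over all pairs i < j."""
    n = len(X)
    return sum((D[i][j] - euclid(X[i], X[j])) ** 2
               for i in range(n) for j in range(i + 1, n))

def stress_1(D, X):
    """Raw stress normalized by the squared embedded distances (assumed convention)."""
    n = len(X)
    denom = sum(euclid(X[i], X[j]) ** 2
                for i in range(n) for j in range(i + 1, n))
    return math.sqrt(raw_stress(D, X) / denom)

# Three points on a line; the target distance between items 0 and 2 is off by 1.
X = [(0.0,), (1.0,), (3.0,)]
D = [[0.0, 1.0, 4.0],
     [1.0, 0.0, 2.0],
     [4.0, 2.0, 0.0]]
print(raw_stress(D, X))   # 1.0
print(stress_1(D, X))     # sqrt(1 / 14)
```

Only one pair is misrepresented here, so the raw stress equals the squared error of that pair, while the normalized value also depends on the overall scale of the configuration.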
II-B Solutions for Multidimensional Scaling
There are many existing methods for solving Eq. (6), such as Kruskal’s iterative steepest descent approach [20] and de Leeuw’s iterative majorization algorithm (SMACOF) [14]. In 2002, Williams demonstrated the connection between kernel PCA and metric MDS [50]; thus metric MDS problems can also be solved by solving kernel PCA.
In our work, we introduce an iterative least squares solution to the MDS optimization problem. We note that in Eq. (6), the raw stress is minimized with respect to $X$, which has $Nm$ entries in total. Thus, when $N$ is large, this nonlinear optimization problem becomes computationally intractable if we attempt to solve for all entries in one step. Inspired by the iterated conditional modes (ICM) method [4], which was developed to solve Markov random fields (MRF), we introduce the two-stage iterated Levenberg-Marquardt algorithm (ILMA). The basic idea of this algorithm is to repeatedly minimize the raw stress with respect to one $x_i$ while holding all other $x_j$’s fixed. For this purpose, we maintain a constraining set $C$ of the indices of the $x_j$’s to be fixed. In the initialization stage, indices of all images are selected into the constraining set in a random order. In the adjustment stage, we repeatedly adjust all $x_i$’s in a randomly permuted order. By doing so, each time we only need to minimize the raw stress with respect to $m$ variables, instead of $Nm$, which greatly reduces the complexity of the problem. The subproblem can be viewed as a least squares problem, and can be solved by the standard Levenberg-Marquardt algorithm [25, 27]. Since the total raw stress is monotonically non-increasing over the iterations and bounded below by zero, the convergence of the adjustment stage is guaranteed. The details of the two-stage algorithm are given in Algorithm 1. We refer to the low dimensional vectors $x_i$ as MDS features or MDS codes in what follows.
One advantage of our method is that it provides a unified framework for both MDS model training and new data encoding. In MDS model training, pairwise image distances $d_{ij}$ are measured within the training set $\{I_1, \dots, I_N\}$, and Algorithm 1 is applied to encode each training image $I_i$ to its MDS code $x_i$. Now given a new image $I'$, we measure the distances $d'_i$ from this image to all training images $I_i$, and find its MDS code $x'$ by:
$$x' = \arg\min_{x} \sum_{i=1}^{N} \left(d'_i - \|x - x_i\|\right)^2, \tag{7}$$
which can be directly solved as a least squares problem using the standard Levenberg-Marquardt algorithm. We follow this practice for the training and testing of MDS models in the experiment in Section III-B.
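The two-stage idea and the encoding of new data can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' Algorithm 1: the per-point solver is a small hand-written damped Gauss-Newton loop in the Levenberg-Marquardt style, and the damping schedule, the starting points, the number of sweeps and all function names (`lm_fit_point`, `ilma`, `encode_new`) are choices made for this example.

```python
import numpy as np

def lm_fit_point(x0, anchors, dists, n_iter=50, lam=1e-2):
    """Fit one point: minimize sum_j (dists[j] - ||x - anchors[j]||)^2 over x."""
    x = np.asarray(x0, dtype=float).copy()
    r = dists - np.linalg.norm(x - anchors, axis=1)
    cost = float(r @ r)
    for _ in range(n_iter):
        diff = x - anchors
        norms = np.maximum(np.linalg.norm(diff, axis=1), 1e-12)
        J = -diff / norms[:, None]                   # Jacobian of the residuals
        step = np.linalg.solve(J.T @ J + lam * np.eye(x.size), -J.T @ r)
        r_new = dists - np.linalg.norm(x + step - anchors, axis=1)
        cost_new = float(r_new @ r_new)
        if cost_new < cost:                          # accept and relax damping
            x, r, cost = x + step, r_new, cost_new
            lam = max(lam * 0.5, 1e-9)
        else:                                        # reject and damp harder
            lam *= 4.0
    return x

def raw_stress(D, X):
    """Raw stress of configuration X against the distance matrix D."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    upper = np.triu_indices(len(X), k=1)
    return float(((D[upper] - dist[upper]) ** 2).sum())

def ilma(D, m, n_sweeps=20, seed=0):
    """Two-stage iterated fitting: random-order initialization, then sweeps."""
    rng = np.random.default_rng(seed)
    N = D.shape[0]
    X = np.zeros((N, m))
    order = rng.permutation(N)
    fixed = [order[0]]                     # constraining set; first code stays at the origin
    for i in order[1:]:                    # initialization stage
        start = X[fixed].mean(axis=0) + rng.normal(scale=D[i, fixed].mean(), size=m)
        X[i] = lm_fit_point(start, X[fixed], D[i, fixed])
        fixed.append(i)
    for _ in range(n_sweeps):              # adjustment stage
        for i in rng.permutation(N):
            others = np.delete(np.arange(N), i)
            X[i] = lm_fit_point(X[i], X[others], D[i, others])
    return X

def encode_new(d_new, X_train):
    """Encode a new item from its distances to all training items."""
    start = X_train[np.argmin(d_new)] + 1e-3   # assumed start: near the closest training code
    return lm_fit_point(start, X_train, d_new)

rng = np.random.default_rng(1)
points = rng.uniform(size=(12, 2))             # toy data with known 2-d structure
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
X = ilma(D, m=2)
x_new = encode_new(D[0], X)                    # re-encode item 0 from its distances alone
```

Because a step is accepted only when it lowers the cost of the point being updated, and that cost is exactly the part of the raw stress involving that point, each sweep of the adjustment stage cannot increase the total raw stress.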
II-C Image Distance Measurement
The measurement of the similarity or dissimilarity between two images is of essential significance in content-based image retrieval [12, 41]. There are some very simple forms of image distances, such as the traditional Euclidean distance on raw image intensities, and the earth mover’s distance (EMD) on image color histograms [37]. Here, we briefly describe two popular image distance measurement methods: the IMage Euclidean Distance (IMED) [48] and the Spatial Pyramid Matching (SPM) distance [21]. These distances will be evaluated in our experiment on real images in Section III-B.
II-C1 IMED
The IMED is a generalized form of the traditional Euclidean distance on raw image intensities. Given two gray-level images $A$ and $B$ of the same size, the traditional Euclidean distance is defined as the square root of the sum-of-squares of intensity difference at each corresponding image location:
$$d_E(A, B) = \sqrt{\sum_{r,c} \left(A_{r,c} - B_{r,c}\right)^2}, \tag{8}$$
where $A_{r,c}$ denotes the intensity at row $r$ and column $c$ in image $A$. In contrast, IMED also accounts for the intensity difference at different locations, but assigns a weight to it, which is a function of the Euclidean distance between the two locations:
$$d_{IMED}(A, B) = \sqrt{\sum_{r,c} \sum_{r',c'} g_{r,c,r',c'} \left(A_{r,c} - B_{r,c}\right)\left(A_{r',c'} - B_{r',c'}\right)}, \tag{9}$$
where
$$g_{r,c,r',c'} = f\!\left(\sqrt{(r - r')^2 + (c - c')^2}\right), \tag{10}$$
and $f$ is a continuous monotonically decreasing function, usually the Gaussian function. An interesting observation by Wang et al. [48] is that the IMED (9) on two images is equivalent to the traditional Euclidean distance (8) on a blurred version of the two images. The blur operation is called standardizing transform (ST) by the authors.
Although IMED has shown promising performance on some recognition experiments in [48], it is still a low-level image distance measurement, based on the raw intensities, without embedding any semantic information. Another disadvantage of IMED is that it is only defined on images of the same size. We will apply MDS on IMED distances for the experiment in Section III-B, where we use a Gaussian function for $f$ in Eq. (10) with a fixed width parameter, and we call this method IMED-MDS.
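A direct, unoptimized sketch of IMED with a Gaussian weight is given below. The normalization constant $1/(2\pi\sigma^2)$, the default $\sigma = 1$ and the tiny test images are assumptions made for illustration; the quadruple loop is only practical for very small images.

```python
import math

def imed(A, B, sigma=1.0):
    """IMage Euclidean Distance between two equally sized gray-level images."""
    rows, cols = len(A), len(A[0])
    pixels = [(r, c) for r in range(rows) for c in range(cols)]
    diff = {(r, c): A[r][c] - B[r][c] for r, c in pixels}
    total = 0.0
    for r1, c1 in pixels:
        for r2, c2 in pixels:
            sq_dist = (r1 - r2) ** 2 + (c1 - c2) ** 2
            # Gaussian weight that decays with the distance between the two locations.
            weight = math.exp(-sq_dist / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)
            total += weight * diff[(r1, c1)] * diff[(r2, c2)]
    return math.sqrt(max(total, 0.0))

def single_dot(col, rows=3, cols=5):
    """Toy image: all zeros except one bright pixel in the middle row."""
    return [[1.0 if (r == 1 and c == col) else 0.0 for c in range(cols)]
            for r in range(rows)]

A, B, C = single_dot(1), single_dot(2), single_dot(4)
near = imed(A, B)   # dot shifted by one pixel
far = imed(A, C)    # dot shifted by three pixels
```

Under the traditional Euclidean distance both shifts give the same value ($\sqrt{2}$), whereas here `near` is smaller than `far`, which is the sensitivity to small spatial deformations that motivates IMED.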
II-C2 SPM Distance
The spatial pyramid matching (SPM) [21] is based on Grauman and Darrell’s work on the pyramid matching kernel [18], which measures the similarity of two sets of feature vectors by partitioning the feature space on different levels and taking the sum of weighted histogram intersection functions. Lazebnik et al.’s spatial pyramid matching is an “orthogonal” approach: it performs pyramid matching in the 2-d image space, and uses $k$-means for clustering in the feature space (edge points and SIFT features). With a visual vocabulary of size $M$ (number of clusters) and $L$ partition levels, spatial pyramid vectors of dimensionality $M \cdot \frac{1}{3}(4^{L+1} - 1)$ are generated, and spatial pyramid matching similarities $s_{ij}$ between images $I_i$ and $I_j$ are measured. The authors of [21] recommend the parameter setting $M = 200$ and $L = 2$.
The similarity value $s_{ij}$ lies in $[0, 1]$, where 1 indicates most similar and 0 least similar. There are many ways to define image distances from the similarities, such as:
Unlike IMED, SPM distance is based on hand-designed features such as SIFT and edge points, instead of raw intensities. It models the spatial co-occurrence of different feature clusters, and thus is more semantics-sensitive. Besides, SPM distance does not require the size of images to be the same. We will apply MDS on the two SPM distances defined by Eq. (11) and Eq. (12), and we call them SPM1-MDS and SPM2-MDS, respectively.
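The size of a spatial pyramid vector and the weighted histogram intersection behind the SPM similarity can be sketched as below. The level weights follow the usual pyramid match weighting (coarser levels weighted less); the data layout (one concatenated histogram per level, each level normalized to sum to 1) and the function names are assumptions of this sketch.

```python
def pyramid_dim(M, L):
    """Vector length: M visual words times the 4**l cells of each level l = 0..L."""
    return M * (4 ** (L + 1) - 1) // 3

def level_weight(level, L):
    """Weight of one pyramid level; finer levels count more."""
    return 1.0 / 2 ** L if level == 0 else 1.0 / 2 ** (L - level + 1)

def spm_similarity(pyr_a, pyr_b):
    """Weighted sum of histogram intersections over all pyramid levels."""
    L = len(pyr_a) - 1
    return sum(level_weight(l, L) * sum(min(a, b) for a, b in zip(pyr_a[l], pyr_b[l]))
               for l in range(L + 1))

print(pyramid_dim(200, 2))   # 4200

# Toy pyramids with M = 2 words and L = 1 (level 0 has 1 cell, level 1 has 4 cells).
pyr_a = [[0.5, 0.5], [0.25, 0.0, 0.0, 0.25, 0.25, 0.0, 0.0, 0.25]]
pyr_b = [[0.5, 0.5], [0.0, 0.25, 0.25, 0.0, 0.0, 0.25, 0.25, 0.0]]
print(spm_similarity(pyr_a, pyr_a))   # 1.0, identical layouts
print(spm_similarity(pyr_a, pyr_b))   # 0.5, same words but different spatial layout
```

With per-level normalized histograms the level weights sum to 1, so the similarity stays in $[0, 1]$, and the second example shows how two images with identical word counts are still separated by their spatial arrangement.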
We present two experiments. The first one, on synthetic data, evaluates the running time performance of different MDS algorithms and compares different initialization strategies of our iterated Levenberg-Marquardt algorithm. The second one is a real image object recognition task, in which we compare MDS features with PCA features and kernel PCA features. In the second experiment, we use the UIUC car dataset (http://cogcomp.cs.illinois.edu/Data/Car/), and follow five-fold cross validation to report the classification precision and recall under different feature dimensions.
III-A Synthetic Data Experiment
In this experiment, we use MDS for curved surface flattening [39] on the manually created Swiss roll data, which was introduced in [36], and is known to be complicated due to the highly non-linear and non-Euclidean structure [7]. The Swiss roll surface consists of points in $\mathbb{R}^3$, as shown in Fig. 1. We measure the pairwise interpoint geodesic distances to construct a distance matrix, and re-embed the Swiss roll surface into $\mathbb{R}^2$ by applying MDS on the geodesic distance matrix.
III-A1 Running Time
First, we evaluate the running time performance of the proposed iterated Levenberg-Marquardt algorithm and compare it with Bronstein’s implementation of the SMACOF algorithm and its variants, including SMACOF with reduced rank extrapolation (RRE) and SMACOF with multigrid [7, 35, 34, 6]. The results are given in Fig. 2, where each number in this plot is averaged over 20 independent repeated experiments, and the running time is reported on a Mac Pro with two 2.4 GHz Quad-Core Intel Xeon CPUs. From Fig. 2, we can see that our ILMA is an efficient solution, which runs faster and converges to a smaller raw stress value than the other methods. The unrolled surfaces by ILMA in different iterations are shown in Fig. 4.
III-A2 Initialization Strategies
Further, we study some modifications to Algorithm 1. The original algorithm uses a random order strategy in the initialization stage, but we can modify it to:
Largest-distance-first strategy: For Algorithm 1, in line 2 we choose the largest non-diagonal entry in $D$ instead of a random one; in line 7, we find the $i$ and $j$ that maximize $D_{ij}$ rather than a random $j$.
Smallest-distance-first strategy: For Algorithm 1, in line 2 we choose the smallest non-diagonal entry in $D$; in line 7, we find the $i$ and $j$ that minimize $D_{ij}$.
If we assume that the data to be encoded are comprised of clusters, then an intuitive interpretation of the largest-distance-first strategy is that representatives of each cluster are first encoded, and they are expected to be scattered in the multidimensional space; similarly, the smallest-distance-first strategy encodes all data in one cluster first, and then moves to the nearest cluster.
We applied the three initialization strategies to the MDS problem on the Swiss roll geodesic distance matrix, and the random order strategy converges faster than the other two, as shown in Fig. 3. Again, each number in this plot is averaged over 20 independent repeated experiments.
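One possible reading of the three strategies is as different orders in which items enter the constraining set during initialization. The sketch below is a guess at that intent rather than a transcription of Algorithm 1: it assumes that, after the first pair, the next item is the not-yet-encoded $j$ whose distance to some already-encoded $i$ is largest (or smallest). The function name and the toy distance matrix are hypothetical.

```python
import random

def initialization_order(D, strategy="random", seed=0):
    """Order in which items are encoded during the initialization stage."""
    n = len(D)
    if strategy == "random":
        order = list(range(n))
        random.Random(seed).shuffle(order)
        return order
    if strategy not in ("largest", "smallest"):
        raise ValueError("unknown strategy: " + strategy)
    pick = max if strategy == "largest" else min
    # Start from the most extreme non-diagonal entry of D.
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    i, j = pick(pairs, key=lambda p: D[p[0]][p[1]])
    order = [i, j]
    while len(order) < n:
        # Pair every encoded item with every item that is still waiting.
        candidates = [(a, b) for a in order for b in range(n) if b not in order]
        _, nxt = pick(candidates, key=lambda p: D[p[0]][p[1]])
        order.append(nxt)
    return order

# Three items located at 0, 1 and 10 on a line (assumed toy data).
D = [[0, 1, 10],
     [1, 0, 9],
     [10, 9, 0]]
print(initialization_order(D, "largest"))    # [0, 2, 1]
print(initialization_order(D, "smallest"))   # [0, 1, 2]
```

In the toy example the largest-distance-first order encodes the two far-apart items first, while the smallest-distance-first order finishes the tight pair before moving on, matching the cluster interpretation given above.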
III-B Car Recognition Experiment
Now we compare the performance of MDS features to the most standard and popular dimensionality reduction algorithms, PCA [31] and kernel PCA [28], on raw pixel intensities. We use the UIUC car image dataset [1], which contains 550 car and 500 non-car gray-level images of size $100 \times 40$ (Fig. 5). All car images are side views, facing either direction, and some are partly occluded. We divide the total of 1050 images into five subsets, each containing 110 car images and 100 non-car images, and each time we use four subsets as training set and one as testing set. We use the following methods to generate fixed-length feature vectors for the images:
PCA We represent each gray-level image by a 4000-d vector, and perform standard PCA on such vectors of the training set to get eigenvectors and low dimensional representations of the training images. Then we use the eigenvectors to get the low dimensional representations of the testing images.
kPCA Gaussian Similar to the above method, but we use Gaussian kernel PCA instead of standard PCA. We follow the automatic parameter selection strategy in [49] to determine the kernel parameter $\sigma$.
kPCA poly Similar to the above two methods, but we use third-order polynomial kernel PCA instead of standard PCA.
SPM1-MDS We compute MDS features as in the IMED-MDS method described in Section II-C, but use the SPM1 distance (11) instead of IMED, where the SPM parameters are $M = 200$ and $L = 2$.
SPM2-MDS Similar to the above method, but we use SPM2 distance (12).
pyramid PCA Instead of computing MDS features from SPM distances, we can also directly perform PCA on the obtained $M \cdot \frac{1}{3}(4^{L+1} - 1)$-dimensional spatial pyramid vectors without measuring similarities. In our experiment, we set $M = 200$ and $L = 2$, and the spatial pyramid vectors are 4200-d. Evaluating this method will allow us to observe whether the MDS on SPM distance measurement captures semantics beyond the spatial pyramids.
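The five-fold protocol described above (five subsets with 110 car and 100 non-car images each, four used for training and one for testing) can be sketched as a stratified split. The shuffling seed and the function name are assumptions; the labels simply stand in for the 1050 images.

```python
import random

def stratified_folds(labels, n_folds=5, seed=0):
    """Split indices into folds that each keep the same class counts."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for position, index in enumerate(members):
            folds[position % n_folds].append(index)
    return folds

labels = [1] * 550 + [0] * 500            # 1 = car, 0 = non-car
folds = stratified_folds(labels)

splits = []
for held_out in range(len(folds)):
    test_idx = folds[held_out]
    train_idx = [i for k, fold in enumerate(folds) if k != held_out for i in fold]
    splits.append((train_idx, test_idx))  # 840 training and 210 testing images
```

Each of the five `(train_idx, test_idx)` pairs would then be used once, so every image is tested exactly one time.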
After we have obtained the fixed-length features of all images, we use the features of training images to learn a binary RBF kernel SVM [9, 8], and use it to classify the features of testing images. Each dimension of the feature vector is normalized to zero mean and unit standard deviation. In the radial basis function $K(u, v) = \exp(-\gamma \|u - v\|^2)$, we set $1/\gamma$ as the feature vector length. The experiment is repeated for different feature vector lengths from 1 to 20. We show the precision, recall and accuracy in Fig. 6. We also provide the feature scatter plots of different methods for feature length 2 in Fig. 7.
In Fig. 6, we can observe that the IMED-MDS method performs slightly but not significantly better than directly applying PCA or kernel PCA on raw gray-level intensities, and the advantage of IMED-MDS is more obvious when the feature dimension is low. Spatial pyramid based methods perform much better than the other methods. In particular, the SPM1-MDS and SPM2-MDS methods outperform all other methods, including pyramid PCA, at all feature dimensions. The precision and recall of the PCA, kernel PCA and IMED-MDS methods saturate at lower levels than those of SPM1-MDS and SPM2-MDS. At low feature dimensions, the accuracy of PCA and kernel PCA is very low, but SPM1-MDS and SPM2-MDS perform almost as well as at high dimensions.
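The preprocessing and evaluation steps around the classifier can be sketched without any SVM library. The sketch assumes that the normalization statistics are estimated on the training features and reused for the testing features, which the text does not state explicitly; the helper names and the toy inputs are hypothetical, and the SVM training itself is left out.

```python
import math

def standardize(train, test):
    """Scale every dimension to zero mean and unit standard deviation (training statistics)."""
    dims = range(len(train[0]))
    means = [sum(row[k] for row in train) / len(train) for k in dims]
    stds = [math.sqrt(sum((row[k] - means[k]) ** 2 for row in train) / len(train)) or 1.0
            for k in dims]

    def scale(rows):
        return [[(row[k] - means[k]) / stds[k] for k in dims] for row in rows]

    return scale(train), scale(test)

def rbf_kernel(u, v):
    """RBF kernel with 1/gamma equal to the feature vector length."""
    gamma = 1.0 / len(u)
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def precision_recall_accuracy(y_true, y_pred, positive=1):
    """Precision and recall of the positive class, plus overall accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return precision, recall, accuracy

train_std, test_std = standardize([[0.0], [2.0]], [[3.0]])   # [[-1.0], [1.0]], [[2.0]]
scores = precision_recall_accuracy([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

Tying $1/\gamma$ to the feature length keeps the kernel width comparable as the feature dimension varies from 1 to 20, since standardized features have squared distances that grow roughly in proportion to the dimension.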
In Fig. 7, we can also see that SPM1-MDS and SPM2-MDS separate car and non-car images with very clear class boundary curves in 2-d feature space.
IV Conclusions and Future Work
In this paper, we have presented a feature learning framework by combining multidimensional scaling with image distance measurement, and compared it with a number of popular existing feature extraction techniques. To the best of our knowledge, we are the first to explore MDS on image distances such as IMage Euclidean Distance (IMED) and Spatial Pyramid Matching (SPM) distance.
We have introduced a unified framework for both MDS model training and new data encoding based on the standard Levenberg-Marquardt algorithm. Our two-stage iterated Levenberg-Marquardt algorithm for MDS model training is an efficient solution, and has shown good running time performance compared with other off-the-shelf implementations (Fig. 2).
In the car recognition experiment, we have demonstrated the power of MDS features. MDS features learned from SPM distances achieve the best classification performance at all feature dimensions. The good performance of MDS features is attributable to the semantics-sensitive image distance, since it captures very different information from the images than traditional feature extraction techniques. The MDS further embeds such information into a low-dimensional feature space, which also captures the inner structure of the entire dataset. The MDS embedding is a necessary step, since in Fig. 6 we can see that the performance of MDS features learned from SPM distances is significantly better than simply running PCA on spatial pyramid vectors.
Our ongoing work on this method explores these directions:
We study more image distance measurements, such as the Integrated Region Matching (IRM) distance, which was originally designed for semantics-sensitive image retrieval systems [47]. Performance of MDS codes learned from such distances can be evaluated and compared with the SPM-MDS method in this paper.
In Eq. (7), rather than using the entire training set, we can also use only a subset of the training images to encode new data. It would be interesting to see how the performance varies by applying different subset selection strategies and different sizes of the subset.
References
[1] (2004) Learning to detect objects in images via a sparse, part-based representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (11), pp. 1475–1490.
[2] (2008) Speeded-up robust features (SURF). Computer Vision and Image Understanding 110 (3), pp. 346–359.
[3] (1997) Eigenfaces vs. fisherfaces: recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (7), pp. 711–720.
[4] (1986) On the Statistical Analysis of Dirty Pictures. Journal of the Royal Statistical Society. Series B (Methodological) 48 (3), pp. 259–302.
[5] (2005) Modern Multidimensional Scaling: Theory and Applications (Springer Series in Statistics). 2nd edition, Springer.
[6] (2008) Numerical geometry of non-rigid shapes. Springer.
[7] (2006) Multigrid multidimensional scaling. Numerical Linear Algebra with Applications 13 (2-3), pp. 149–171.
[8] (2011) LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, pp. 27:1–27:27.
[9] (1995) Support-vector networks. Machine Learning 20, pp. 273–297.
[10] (2004) Visual categorization with bags of keypoints. In Workshop on Statistical Learning in Computer Vision, ECCV, Vol. 1, pp. 22.
[11] (2005) Histograms of oriented gradients for human detection. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 1, pp. 886–893.
[12] (2008) Image retrieval: ideas, influences, and trends of the new age. ACM Computing Surveys (CSUR) 40 (2), pp. 5.
[13] (1992) Ten lectures on wavelets. Vol. 61, SIAM.
[14] (1977) Applications of convex analysis to multidimensional scaling. In Recent Developments in Statistics, J.R. Barra, F. Brodeau, G. Romier and B. V. Cutsem (Eds.), pp. 133–146.
[15] (2003) On bending invariant signatures for surfaces. IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (10), pp. 1285–1295.
[16] (2004) Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. In Computer Vision and Pattern Recognition Workshop, 2004, pp. 178–178.
[17] (2005) A Bayesian hierarchical model for learning natural scene categories. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005, Vol. 2, pp. 524–531.
[18] (2005) The pyramid match kernel: discriminative classification with sets of image features. In Tenth IEEE International Conference on Computer Vision, 2005, Vol. 2, pp. 1458–1465.
[19] (2006) Reducing the dimensionality of data with neural networks. Science 313 (5786), pp. 504–507.
[20] (1964) Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika 29, pp. 1–27.
[21] (2006) Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2006, Vol. 2, pp. 2169–2178.
[22] (2012) Building high-level features using large scale unsupervised learning. In International Conference on Machine Learning.
[23] (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324.
[24] (2007) Efficient sparse coding algorithms. In Advances in Neural Information Processing Systems 19, pp. 801–808.
[25] (1944) A method for the solution of certain non-linear problems in least squares. Quarterly of Applied Mathematics 2, pp. 164–168.
[26] (2004) Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60 (2), pp. 91–110.
[27] (1963) An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial and Applied Mathematics 11 (2), pp. 431–441.
[28] (1999) Kernel PCA and de-noising in feature spaces. In Advances in Neural Information Processing Systems 11, pp. 536–542.
[29] (1996) A comparative study of texture measures with classification based on featured distributions. Pattern Recognition 29 (1), pp. 51–59.
[30] (1996) Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381 (12), pp. 607–609.
[31] (1901) On lines and planes of closest fit to systems of points in space. Philosophical Magazine 2, pp. 559–572.
[32] (2007) Fisher kernels on visual vocabularies for image categorization. In IEEE Conference on Computer Vision and Pattern Recognition, 2007, pp. 1–8.
[33] (2010) Improving the Fisher kernel for large-scale image classification. Computer Vision–ECCV 2010, pp. 143–156.
[34] (2008) Topologically constrained isometric embedding. In Human Motion, pp. 243–262.
[35] (2008) Fast multidimensional scaling using vector extrapolation. SIAM J. Sci. Comput 2.
[36] (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290 (5500), pp. 2323–2326.
[37] (2000) The earth mover’s distance as a metric for image retrieval. International Journal of Computer Vision 40 (2), pp. 99–121.
[38] (2009) Deep Boltzmann machines. In Proceedings of the International Conference on Artificial Intelligence and Statistics, Vol. 5, pp. 448–455.
[39] (1989) A numerical solution to the generalized mapmaker’s problem: flattening nonconvex polyhedral surfaces. IEEE Transactions on Pattern Analysis and Machine Intelligence 11 (9), pp. 1005–1008.
[40] (2003) Video Google: a text retrieval approach to object matching in videos. In Ninth IEEE International Conference on Computer Vision, 2003, pp. 1470–1477.
[41] (2000) Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (12), pp. 1349–1380.
[42] (2006) Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence 28 (7), pp. 1088–1099.
[43] (1991) Eigenfaces for recognition. Journal of Cognitive Neuroscience 3 (1), pp. 71–86.
[44] (2010) Evaluating color descriptors for object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (9), pp. 1582–1596.
[45] (2009) Dimensionality Reduction: A Comparative Review. Technical report, Tilburg University.
[46] (2008) Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, pp. 1096–1103.
[47] (2001) SIMPLIcity: semantics-sensitive integrated matching for picture libraries. IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (9), pp. 947–963.
[48] (2005) On the Euclidean distance of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (8), pp. 1334–1339.
[49] (2012) Kernel principal component analysis and its applications in face recognition and active shape models. arXiv preprint arXiv:1207.3538.
[50] (2002) On a connection between kernel PCA and metric multidimensional scaling. Machine Learning 46 (1), pp. 11–19.
Quan Wang Quan Wang is currently working towards his Ph.D. degree in Computer and Systems Engineering in the Department of Electrical, Computer, and Systems Engineering at Rensselaer Polytechnic Institute. He received a B.Eng. degree in Automation from Tsinghua University, Beijing, China in 2010. He worked as research intern at Siemens Corporate Research, Princeton, NJ and IBM Almaden Research Center, San Jose, CA in 2012 and 2013, respectively. His research interests include feature learning, medical image analysis, object tracking, content-based image retrieval and photographic composition.
Kim L. Boyer Dr. Kim L. Boyer is currently Head of the Department of Electrical, Computer, and Systems Engineering at Rensselaer Polytechnic Institute. He received the BSEE (with distinction), MSEE, and Ph.D. degrees, all in electrical engineering, from Purdue University in 1976, 1977, and 1986, respectively. From 1977 through 1981 he was with Bell Laboratories, Holmdel, NJ; from 1981 through 1983 he was with Comsat Laboratories, Clarksburg, MD. From 1986–2007 he was with the Department of Electrical and Computer Engineering, The Ohio State University. He is a Fellow of the IEEE, a Fellow of IAPR, a former IEEE Computer Society Distinguished Speaker, and currently the IAPR President. Dr. Boyer is also a National Academies Jefferson Science Fellow at the US Department of State, spending 2006–2007 as Senior Science Advisor to the Bureau of Western Hemisphere Affairs. He retains his Fellowship as a consultant on science and technology policy for the State Department.