Long-Term Ensemble Learning of Visual Place Classifiers
This paper addresses the problem of cross-season visual place classification (VPC) from a novel perspective of long-term map learning. Our goal is to enable efficient transfer learning from one season to the next at a small constant cost, without wasting the robot’s available long-term memory on memorizing very large amounts of training data. To realize a good tradeoff between generalization and specialization abilities, we employ an ensemble of deep convolutional neural network (DCN) classifiers and consider the task of scheduling when and which classifiers to retrain, given a previous season’s DCN classifiers as the sole prior knowledge. We present a unified framework for retraining scheduling and discuss practical implementation strategies. Furthermore, we address the task of partitioning a robot’s workspace into places to define place classes in an unsupervised manner, rather than using uniform partitioning, so as to maximize VPC performance. Experiments using the publicly available NCLT dataset revealed that retraining scheduling of a DCN classifier ensemble is crucial and that performance is significantly increased by planned scheduling.
This paper addresses the problem of visual robot self-localization from a novel perspective of long-term map learning. We follow the recent self-localization paradigm based on a deep convolutional neural network (DCN). In this paradigm, an environment map is learned as a DCN-based visual place classifier, which is used to classify a query image into one of the learned place classes. We address a difficult long-term scenario of visual place classification (VPC), termed cross-season VPC, in which the training and test images come from different seasons. One of the most basic schemes for handling this difficulty is to train a DCN classifier on all available training images. However, this requires the robot to explicitly memorize and learn from a number of training images proportional to the number of places and seasons, which severely limits the scalability of the algorithm in both time and memory.
Our goal is to develop a long-term map learning framework that enables efficient retraining of the VPC system at a small constant cost, without wasting the robot’s available long-term memory on memorizing very large amounts of training data. This study is inspired by recent progress in domain adaptation and transfer learning [3, 4, 5, 6, 7, 8, 9], where the aim is to learn a classifier model for a target domain by exploiting rich information present in a source domain. In our study, classifiers learned in previous seasons represent the source knowledge, and we aim to exploit this knowledge to improve the current season’s VPC performance. We follow the literature on domain adaptation and transfer learning; a key difference, however, is that in our application scenario of autonomous robotics, the definitions of place classes and domains are not given, and the robot must discover their optimal definitions in an unsupervised manner.
To obtain an acceptable tradeoff between generalization and specialization, it is crucial to adequately train and retrain DCN classifiers (Fig. 1). If a DCN classifier is retrained (i.e., fine-tuned) on a specific season’s training data, its specialization ability is expected to increase, but its generalization ability tends to decrease. Thus, for each pair of a DCN classifier and a training set, we have two possible choices: to retrain the classifier with that training set or not. After collecting $T$ different training sets, there are $2^T$ possible choices and hence $2^T$ possible DCN classifiers. Training and using this exponential number of DCN classifiers is often intractable. We propose a solution based on ensemble learning that requires only a fixed set of classifiers, which integrate information from multiple domains via fine-tuning and classifier fusion.
More formally, we address two questions. The first is which of the DCN classifiers trained in the previous season should be retrained with the current season’s training set. Recent advances in fine-tuning techniques for DCNs have simplified the retraining task itself. However, there is no straightforward method for retraining scheduling that achieves an optimal tradeoff between VPC accuracy and training efficiency. The second question is how to integrate the outputs of multiple classifiers from different seasons. Because individual classifiers are trained using different amounts of training data from different seasons, they often produce conflicting classification results with different levels of variance. The key to this question is how to fuse the probability estimates of individual DCN classifiers. In this study, we present and evaluate several strategies for retraining scheduling and for classifier fusion.
An open question is how to partition the robot’s workspace into places to define place classes. Intuitively, each place class should be defined as a contiguous region in the robot’s workspace with similar visual features. The main difficulty is a chicken-and-egg problem: given a well-trained classifier, it is relatively easy to partition the robot’s workspace into place regions, but training a classifier requires a set of predefined place classes. Optimal definition of places is a practical concern, as the definition of place classes strongly influences VPC performance. Non-uniform, planned partitioning of the space, as opposed to the typical uniform partitioning, simplifies inference over the space of robot poses and enables efficient self-localization. From a broader perspective, optimal place definition is interesting because it may lead to a unifying framework for compact map representation. In this study, we present several strategies for discovering place definitions and workspace partitionings.
The main contribution of this study is an extension of the VPC framework to long-term map learning. This study is inspired by our previous studies on cross-season self-localization and DCN-based localization; however, a key novelty is the formulation of cross-season VPC. The optimal definition of place classes is inspired by our previous study on unsupervised place discovery; however, the additional problem of domain adaptation between seasons arises in the long-term map learning scenario. We address this important issue and present a practical solution for it. Following literature that suggests the use of the Alexnet architecture to analyze transfer learning, we focus on the problem of when and which DCN classifiers to retrain so as to maximize the performance of the ensemble classifier. Our experimental results on the publicly available NCLT dataset revealed that retraining scheduling of a DCN classifier ensemble is crucial and that performance is significantly increased by planned scheduling.
II Related Works
End-to-end training of DCNs for visual self-localization has attracted interest in recent years. In , a DCN is introduced as a regressor to achieve end-to-end training of 6DOF camera relocalization from RGB and RGB-D images. Very recently , the two regression approaches of random forest (RF) and DCN were compared, and the novel task of mapping an RF to a neural network was considered to achieve a good efficiency-accuracy tradeoff. However, a major limitation of regression approaches is that they require fine-grained training sets, such as images annotated with 6DOF camera poses, which severely limits their applicability to large-scale long-term map learning. One of the formulations most similar to ours is topological localization, which has desirable properties including map compactness and robustness against map errors.
Alexnet has been a popular tool for analyzing transfer learning. In , the main focus is the analysis of transfer learning rather than achieving state-of-the-art performance; the reference Caffe implementation is used in its original form so that the analysis results are comparable, extensible, and useful to a larger number of researchers. In , the authors argue that a large image dataset such as ImageNet contains much more information than is officially announced, and that such existing knowledge resources are most often ignored. Based on this idea, they presented a novel method for zero-shot learning, a form of transfer learning. In , the problem of topological self-localization is addressed using fusion and binarization of DCN features. In that study, the DCN is pre-trained on the ImageNet dataset to confirm the generalization of the automatically learned features and to demonstrate that the descriptive power acquired by the DCN is transferable to specific datasets.
Our approach is informed by domain adaptation and transfer learning approaches, ranging from parameter adaptation, feature transformation, and metric learning to deep learning techniques, which have been applied to a wide variety of visual recognition tasks. In , a feature transformation termed the marginalized denoising autoencoder (MDA) is extended to denoise both the source and target data so that the features become domain invariant and adaptation becomes easier. In , scalable greedy algorithms for transfer learning are presented; the authors focus on how to select and combine sources from a large pool of data to yield good performance on a target task. In , the problem of learning a classifier from only positive and unlabeled data is addressed for binary classifiers (e.g., SVMs); the authors exploit the fact that, assuming positive examples are labeled at random, the conditional probability of a model trained on labeled and unlabeled examples is not very different from that of a model trained on fully labeled examples. In , transfer learning is addressed in an interesting setting where the target class has very few training examples. The authors aim to discover similar classes and transfer knowledge among them, assuming that the classes are organized into a fixed tree hierarchy that is available or learnable.
Our study is related to the paradigm of lifelong learning, or open-world recognition, in which knowledge is accumulated and maintained across domains. We also employ the mid-level image representation provided by a DCN. In , the authors present a novel region-based image representation in which the naive Bayes nearest neighbor (NBNN) model is seamlessly integrated into a DCN. Very recently , a new region-based feature encoding has been presented that uses multiple convolutional layers for feature extraction and saliency identification. Our approach is also related to ensemble learning of DCNs; however, the use of DCN ensembles for visual self-localization has not been explored in the context of long-term map learning. In this study, we present a novel DCN ensemble approach that is specifically customized for visual place classifiers.
The long-term map learning framework consists of two alternately repeated missions, exploration and adaptation (Fig. 2); one exploration followed by one adaptation constitutes one iteration. The framework is initialized with a classifier set of size one, which consists of a single DCN classifier obtained by pretraining a DCN on big data such as ImageNet. In each subsequent iteration, a new classifier set is obtained by using the additional training data collected in that iteration. In our experiments, we use the Alexnet architecture pretrained on the ImageNet LSVRC-2012 dataset as the initial DCN classifier, and we perform one iteration of the two missions per season.
The exploration mission aims to explore the entire environment while keeping track of the robot’s global position as accurately as possible (e.g., using pose tracking and relocalization), in order to collect mapped images with global viewpoint information. Optionally, the collected data may be post-processed to refine the viewpoint information by structure-from-motion or SLAM. All collected images with viewpoint information are used as training data for the subsequent adaptation mission of the same iteration (see Fig. 2). We denote the training data collected in the $t$-th exploration as $D_t = \{(I_i, v_i)\}_i$, where $I_i$ and $v_i$ are an image and its viewpoint, respectively.
The adaptation mission aims to obtain a new set of DCN classifiers by fine-tuning existing DCN classifiers based on transfer learning and domain adaptation, given the training data obtained in the latest exploration mission. As mentioned previously, we face a binary choice: whether or not a specific DCN classifier in the current set should be fine-tuned with a specific training set, so the number of possible DCN classifiers grows exponentially with the number of training sets. A new classifier obtained by fine-tuning an existing classifier with a new training set is identified by the existing classifier together with that training set. For example, if a DCN is fine-tuned using one training set and the resulting DCN is further fine-tuned using a second training set, the final DCN is identified by this sequence of two fine-tunings. We discuss retraining scheduling (i.e., when and which DCN classifiers should be fine-tuned) in III-A.
The adaptation mission also involves discovering a new set of place classes that is suitable for VPC. Since the area covered by the robot’s exploration and its appearance differ among explorations, the way of defining place classes should also differ among explorations. We discuss unsupervised place-definition and workspace-partitioning discovery in III-B.
The VPC task is part of the exploration mission and performs visual robot localization using the latest classifiers. The VPC task assumes no prior knowledge of the robot pose, a challenging self-localization scenario called global localization, although our VPC would also be useful for other scenarios, including pose tracking. Ideally, one would use only a single classifier that has been repeatedly fine-tuned with all available training data, as it is expected to be the most informative of all possible DCN classifiers. In practice, however, this simple strategy turns out to yield poor VPC performance due to overfitting and numerous false positives. Therefore, we fuse information from an ensemble of DCN classifiers to obtain more reliable classification results. Because the definition of place classes can differ among classifiers, a fusion function transforms the outputs of individual classifiers into a unified global map coordinate system. We discuss the information fusion function in III-C.
III-A Retraining Scheduling
Recall that the $t$-th adaptation mission selects a subset of the existing DCN classifiers, retrains (i.e., fine-tunes) each selected classifier using the newly obtained training data, and then replaces one of the existing classifiers with each newly trained one. Therefore, given the current classifier set, we need to schedule which classifiers to retrain and which to replace. Note that a DCN classifier at the $t$-th mission can be uniquely identified by its retraining history over the $\tau$-th missions ($\tau = 1, \ldots, t$). For simplicity, we denote this history by a bit string $b = b_1 b_2 \cdots b_t$, where bit $b_\tau$ indicates whether the classifier was retrained ($b_\tau = 1$) or not ($b_\tau = 0$) in the $\tau$-th adaptation mission with the $\tau$-th training data.
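As a minimal sketch of this bookkeeping, assuming the leftmost bit corresponds to the first adaptation mission, the Python snippet below enumerates every possible retraining history after four missions and counts how often a classifier was fine-tuned. The helper names `retraining_histories` and `num_finetunings` are hypothetical and are not taken from the original implementation.

```python
from itertools import product


def retraining_histories(num_missions: int) -> list[str]:
    """Enumerate every possible retraining history as a bit string.

    Character tau (left to right) is '1' if the classifier was fine-tuned in the
    tau-th adaptation mission with the tau-th training set, and '0' otherwise.
    """
    return ["".join(bits) for bits in product("01", repeat=num_missions)]


def num_finetunings(history: str) -> int:
    """Count the 1-bits, i.e., how many times the classifier was fine-tuned."""
    return history.count("1")


histories = retraining_histories(4)
print(len(histories))           # 16 = 2**4 distinct DCN classifiers
print(histories[:4])            # ['0000', '0001', '0010', '0011']
print(num_finetunings("1011"))  # 3
```

The all-zero string corresponds to the pretrained classifier that was never fine-tuned, and the 16 strings for only four missions illustrate why maintaining every possible classifier quickly becomes intractable.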
In this study, we developed three different strategies for scheduling.
The first strategy, termed ST1, is based on the idea that the newest training set (acquired in the current season) is expected to be best suited for future missions and hence should be preferentially selected for the current mission’s retraining. As a lower-priority objective, ST1 maximizes the number of 1-bits in a classifier’s history bit string, which is computed by a function that counts the 1-bits.
The second strategy, termed ST2, is based on the idea that the number of fine-tuning steps for each DCN should be adequately controlled so as to achieve a good tradeoff between generalization and specialization abilities. ST2 keeps the number of fine-tunings of each DCN, i.e., the number of 1-bits in its history bit string, as close as possible to a pre-set integer parameter that represents the appropriate number of fine-tunings. In our experiments, we test three values of this parameter: 1, 2, and 3.
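The ST2 criterion can be illustrated with a short Python sketch. It assumes, for illustration only, that candidates are scored by the absolute deviation of their fine-tuning count from the target parameter (called `target` here) and that the ensemble keeps the `ensemble_size` best-scoring candidates; the candidate pool, the tie-breaking rule, and the function names are all hypothetical.

```python
from itertools import product


def st2_cost(history: str, target: int) -> int:
    """Deviation of the number of fine-tunings (1-bits) from the target count."""
    return abs(history.count("1") - target)


def st2_select(candidates: list[str], target: int, ensemble_size: int) -> list[str]:
    """Keep the ensemble_size histories whose fine-tuning count is closest to target.

    Ties are broken lexicographically on the bit string purely for determinism
    (an assumption; no tie-breaking rule is specified for ST2).
    """
    ranked = sorted(candidates, key=lambda h: (st2_cost(h, target), h))
    return ranked[:ensemble_size]


# Illustrative pool: every history after four missions.
candidates = ["".join(bits) for bits in product("01", repeat=4)]
print(st2_select(candidates, target=1, ensemble_size=4))
# ['0001', '0010', '0100', '1000']
```

In the actual framework the pool would be restricted to histories reachable from the current classifier set (each existing classifier either retrained with the newest training set or not); the full set of 16 strings is used here only to show which histories the criterion favors.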
The third strategy, termed ST3, is based on the idea that individual training sets are not equally important and that there is a single most useful training set, which should be preferentially selected for retraining. ST3 uses a pre-set vector parameter that encodes the identifier (ID) of this most useful training set. In our experiments, we test every possible ID (1 through 4, one per season).
Fig. 3 shows different settings for the scheduling strategies described above. We considered a sequence of four seasons, three different parameter settings for ST2, and four different settings for ST3.
III-B Unsupervised Place Definition
The unsupervised place definition is a pre-processing part of the per-classifier fine-tuning procedure, used to partition the robot’s workspace into places, so as to maximize VPC performance. A place definition algorithm takes as input a set of images and viewpoints collected by the mobile robot in the target environment. Once place classes are defined, we group images into clusters with the same place ID. Note that the place definition should occur prior to training of the classifier, and influences both training and classification performance.
We developed three different place definition strategies.
The first is the location cue strategy. It partitions the image sequence according to the robot’s travel distance and assigns a place label to each sub-sequence. This strategy is robust against variations in the robot’s speed, but it does not take into account the appearance information available from the DCN. The travel distance covered by each sub-sequence is a pre-defined constant. In this study, we chose this constant by a coarse search over a set of candidate distances (in meters), selecting the value that provided a good balance between efficiency and accuracy.
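A minimal Python sketch of the location cue strategy follows, assuming that viewpoints are planar (x, y) positions in meters listed in trajectory order and that a place label is simply the index of the travel-distance bin of length `segment_length` (a hypothetical name for the pre-defined constant). The trajectory and segment length in the example are made-up values, not those used in the experiments.

```python
import math


def location_cue_labels(positions, segment_length):
    """Assign a place label to each image by cumulative travel distance.

    positions: robot (x, y) viewpoints in trajectory order, in meters.
    segment_length: travel distance covered by each place class, in meters.
    """
    labels = []
    travelled = 0.0
    for i, (x, y) in enumerate(positions):
        if i > 0:
            px, py = positions[i - 1]
            travelled += math.hypot(x - px, y - py)
        # Label = index of the distance bin the image falls into.
        labels.append(int(travelled // segment_length))
    return labels


# Images every 2 m along a straight line, 5 m per place (hypothetical values).
traj = [(2.0 * i, 0.0) for i in range(8)]
print(location_cue_labels(traj, segment_length=5.0))  # [0, 0, 0, 1, 1, 2, 2, 2]
```

Because labels depend on the distance traveled rather than on elapsed time or frame count, a robot that stops or slows down keeps producing images with the same label, which is what makes this strategy robust to speed variations.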
The second is the combined location-appearance cue strategy. The basic idea is to use the response of an intermediate layer of an independent DCN as an additional cue for clustering images into place classes. We use the sixth layer of the DCN as the visual cue, as features from this layer have demonstrated excellent performance in image classification tasks. The workspace partitioning procedure is as follows: (1) each image is represented by the 4,096-dimensional sixth-layer feature of the DCN; (2) these features are used as input to k-means clustering to obtain image clusters; and (3) the location cue strategy is applied to each cluster to further partition it into sub-clusters. For this DCN, we used the aforementioned Alexnet pretrained on the ImageNet LSVRC-2012 dataset.
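The three steps can be sketched in Python as follows. This is an illustration under several assumptions: short toy vectors stand in for the 4,096-dimensional sixth-layer features, k-means is a plain Lloyd iteration with deterministic farthest-point initialization, the number of clusters `k` is a hypothetical free parameter, and the location cue is applied within each cluster using cumulative travel distance along the full trajectory. Each place label is the pair (appearance cluster, distance bin).

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(features, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centroids = [list(features[0])]
    while len(centroids) < k:
        # Next centroid: the feature farthest from all centroids chosen so far.
        far = max(features, key=lambda f: min(sq_dist(f, c) for c in centroids))
        centroids.append(list(far))
    assign = [0] * len(features)
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for every feature.
        assign = [min(range(k), key=lambda c: sq_dist(f, centroids[c])) for f in features]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [f for f, a in zip(features, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def location_appearance_labels(features, positions, k, segment_length):
    """Step (2): cluster by appearance; step (3): split each cluster by travel distance."""
    travelled = [0.0]
    for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
        travelled.append(travelled[-1] + math.hypot(x1 - x0, y1 - y0))
    clusters = kmeans(features, k)
    return [(c, int(d // segment_length)) for c, d in zip(clusters, travelled)]


# Toy 2-D "features" (stand-ins for sixth-layer vectors) and viewpoints every 2 m.
feats = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]
poses = [(2.0 * i, 0.0) for i in range(6)]
print(location_appearance_labels(feats, poses, k=2, segment_length=5.0))
# [(0, 0), (0, 0), (0, 0), (1, 1), (1, 1), (1, 2)]
```

Compared with the pure location cue, images that look alike are first grouped together, so a place boundary can only fall where either the appearance cluster or the distance bin changes.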
The third strategy is incremental clustering based on location and appearance cues. We represent the appearance of a place class by a keyframe with its L2-normalized 4,096-dimensional sixth-layer feature from the DCN, and represent the location of each keyframe or mapped image by its viewing location and viewing angle with respect to the global map coordinate system. The clustering algorithm begins with an empty set of place classes and then iterates over the mapped images. In each iteration, it tries to insert the mapped image into the spatially nearest place class, i.e., the class whose viewing location is closest to that of the mapped image. If the viewing location, viewing angle, and appearance feature of this place class are all sufficiently similar to those of the mapped image, i.e., each difference is within its pre-set threshold, the algorithm inserts the mapped image into the class. Otherwise, it creates a new place class with the mapped image as its sole member.
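The incremental procedure can be sketched in Python as below. Several details are assumptions made for illustration: the keyframe of a class is the image that created it, appearance similarity is the cosine similarity (the dot product of L2-normalized features), angles are in radians, and the thresholds `max_dist`, `max_angle`, and `min_sim` are hypothetical parameters whose values in the example are arbitrary.

```python
import math


def angle_diff(a, b):
    """Absolute difference between two angles (radians), wrapped into [0, pi]."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def l2_normalize(v):
    """Scale a feature vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def incremental_place_clustering(images, max_dist, max_angle, min_sim):
    """Assign a place-class ID to each mapped image, creating classes on the fly.

    images: (location (x, y), viewing angle, appearance feature) tuples,
    processed in acquisition order.
    """
    keyframes = []  # per class: (location, angle, normalized feature) of its keyframe
    labels = []
    for loc, ang, feat in images:
        f = l2_normalize(feat)
        if keyframes:
            # Spatially nearest place class, judged by viewing location only.
            cid = min(range(len(keyframes)), key=lambda c: math.dist(loc, keyframes[c][0]))
            k_loc, k_ang, k_feat = keyframes[cid]
            sim = sum(a * b for a, b in zip(f, k_feat))  # cosine similarity
            if (math.dist(loc, k_loc) <= max_dist
                    and angle_diff(ang, k_ang) <= max_angle
                    and sim >= min_sim):
                labels.append(cid)
                continue
        # No sufficiently similar class: this image founds a new class.
        keyframes.append((loc, ang, f))
        labels.append(len(keyframes) - 1)
    return labels


mapped = [
    ((0.0, 0.0), 0.0, [1.0, 0.0]),
    ((1.0, 0.0), 0.1, [0.9, 0.1]),
    ((2.0, 0.0), 3.0, [1.0, 0.0]),   # nearby, but facing the opposite direction
    ((10.0, 0.0), 0.0, [1.0, 0.0]),  # far from every existing class
    ((10.5, 0.0), 0.0, [0.0, 1.0]),  # nearby, but different appearance
]
print(incremental_place_clustering(mapped, max_dist=3.0, max_angle=0.5, min_sim=0.9))
# [0, 0, 1, 2, 3]
```

In the example, the third image is close to the first class but faces the opposite direction, so it starts a new class; the last image lies next to an existing class but differs in appearance, so it also starts a new class.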
III-C Information Fusion
The information fusion function takes a set of classifier responses as input and produces a list of the top-$K$ ranked place classes. We exploit the probability values returned by the last layer of each DCN classifier. The procedure begins by concatenating the top-$K$ ranked place classes from each of the $N$ DCN classifiers to obtain a list of length $KN$. We do not calibrate the probability distributions of individual DCN classifiers prior to concatenation. The concatenated list is then sorted from highest to lowest probability, and the top-$K$ ranked classes are output as the final classification result.
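The fusion procedure can be sketched in Python as follows, assuming each classifier response has already been expressed as a mapping from place identifiers in the shared global map frame to softmax probabilities. The function name `fuse_topk` and the example probabilities are hypothetical. As in the description, probabilities are not calibrated, and the sketch does not merge duplicate places that several classifiers may return.

```python
def fuse_topk(responses, k):
    """Concatenate each classifier's top-k places, re-sort by probability, keep top-k.

    responses: one dict per ensemble member, mapping a place identifier (already
    expressed in the shared global map frame) to that member's softmax probability.
    Returns (place, probability) pairs, highest probability first.
    """
    pooled = []
    for probs in responses:
        top = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
        pooled.extend(top)  # pooled grows to at most k * len(responses) entries
    # Raw probabilities are compared directly: no per-classifier calibration.
    pooled.sort(key=lambda item: item[1], reverse=True)
    return pooled[:k]


member_a = {"p1": 0.6, "p2": 0.3, "p3": 0.1}
member_b = {"q1": 0.5, "q2": 0.45, "q3": 0.05}
print(fuse_topk([member_a, member_b], k=2))  # [('p1', 0.6), ('q1', 0.5)]
```

Because the lists are merged purely by raw probability, an overconfident classifier can dominate the fused ranking; this is the trade-off accepted by skipping calibration.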
We evaluated the suitability of the methods presented above for long-term map learning using the NCLT dataset. The NCLT dataset is a long-term autonomy dataset for robotics research collected on the University of Michigan’s North Campus (Fig. 4). It consists of omnidirectional imagery, 3D lidar, planar lidar, GPS, and odometry data; we use the monocular images from the front-facing camera (“camera #5”) for our VPC tasks. As the vehicle travels through both indoor and outdoor environments, it encounters various types of appearance changes with respect to the mapped images. These originate from the movement of people, parked cars, furniture, building construction, opening/closing of doors, and placing/removing of posters, as well as from other nuisance factors such as illumination changes, viewpoint-dependent changes of object appearance, occlusions, weather changes, falling leaves, and snow. These appearance changes make our cross-season VPC task challenging. We repeated the long-term map learning iteration of Fig. 2 four times (see Fig. 3), using four datasets from four different seasons, “2012/3/31,” “2012/8/04,” “2012/11/17,” and “2012/1/22,” as individual training sets, and an additional set, “2012/2/19,” as test data for the last (i.e., fourth) mission. We followed a standard procedure for fine-tuning. The classification function in the DCN is a softmax classifier that computes the probability of each place class. To fine-tune the DCN, we replaced the softmax classifier with a new one whose number of outputs equals the number of place classes. Input images were resized to 256 × 256, and the DCN parameters were then fine-tuned on the new training datasets. Fig. 4 shows a bird’s eye view of the environment and the robot’s trajectories for the four adaptation missions.
Fig. 5 shows the performance results. We evaluated the different unsupervised place definition (UPD) algorithms described in III-B: the location cue strategy (“#1”), the location-appearance cue strategy (“#2”), and the incremental clustering strategy (“#3”). We also evaluated two VPC scenarios, “fine localization” and “coarse localization”, in which the allowed localization errors were set to 10 m and 20 m, respectively. In addition, we considered a different type of test data, identified by “test:ex”. Unlike the default setting, in which the test set is taken from an exploration season that depends on the mission ID, the setting “test:ex” uses the fixed test set “2012/2/19” regardless of the mission ID. Note that scheduling strategy ST2 with its parameter set to 1 is competitive with or outperforms the other strategies for almost all missions, for both the fine and coarse localization scenarios, and for both types of test data. As mentioned, this strategy keeps the number of fine-tunings as close as possible to its pre-set parameter so as to achieve a good tradeoff between generalization and specialization abilities. That the appropriate parameter value turned out to be 1 means that, in the case of ST2, each DCN should be fine-tuned only once. A possible reason is that fine-tuning more than once led to overfitting and poor generalization to the unseen test data. Among the other strategies, one of the ST3 settings exhibited relatively good performance. A possible reason is that the single DCN trained on the specific season (“2012/3/31”) was well suited to much of the test data considered here. From these results, we conclude that the proposed framework of planned retraining scheduling combined with information fusion is effective for cross-season VPC tasks, particularly when the number of fine-tunings is controlled.
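One plausible way to score the fine and coarse localization scenarios is sketched below in Python: a query counts as a success when the representative location of the top-1 predicted place lies within the allowed error (10 m or 20 m) of the ground-truth viewpoint. The representative location of each place (`place_centers`), the function name, and the toy data are assumptions; the exact evaluation protocol is not specified in this section.

```python
import math


def success_rate(predicted_places, true_positions, place_centers, max_error):
    """Fraction of queries whose top-1 place lies within max_error meters of the truth.

    place_centers maps each place ID to a representative (x, y) in the global map
    frame; how that point is chosen is an assumption of this sketch.
    """
    hits = sum(
        math.dist(place_centers[p], gt) <= max_error
        for p, gt in zip(predicted_places, true_positions)
    )
    return hits / len(true_positions)


centers = {"A": (0.0, 0.0), "B": (30.0, 0.0)}
preds = ["A", "A", "B"]
truth = [(5.0, 0.0), (15.0, 0.0), (0.0, 0.0)]
fine = success_rate(preds, truth, centers, max_error=10.0)    # 1 of 3 within 10 m
coarse = success_rate(preds, truth, centers, max_error=20.0)  # 2 of 3 within 20 m
print(fine, coarse)
```

Relaxing the allowed error from 10 m to 20 m can only keep or increase the success rate, which is why the coarse scenario is the easier of the two.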
Figs. 7 and 8 show success and failure examples, respectively. We used strategy ST2 for the ensemble classifier. As shown in Fig. 7, the classifier captures the scene structure and discriminative characteristics of scenes in both indoor and outdoor environments. On the other hand, failures often occur for non-discriminative scenes, as shown in Fig. 8.
Fig. 9 shows instances of unsupervised place definition for the three definition algorithms. As can be seen, the location cue strategy uniformly partitioned the robot’s trajectories into equal-length sub-trajectories (i.e., place classes). In contrast, the location-appearance cue strategy and the incremental clustering strategy tend to group similar successive locations into the same class. These two strategies yielded the best performance, and the former was slightly better than the latter in our experiments (see Figs. 5 and 6).
We presented a long-term map learning framework for cross-season VPC. This framework enabled efficient transfer learning from one season to the next at a small constant cost, without wasting the robot’s available long-term memory on memorizing very large amounts of training data. To realize an acceptable tradeoff between generalization and specialization abilities, we employed an ensemble of DCN classifiers and considered the task of scheduling when and which classifiers to retrain, given a previous season’s DCN classifiers as the sole prior knowledge. We also presented a unified framework and proposed practical strategies for implementing retraining scheduling. Furthermore, we addressed the task of partitioning the robot’s workspace into places to define place classes in an unsupervised manner, so as to maximize VPC performance. Through long-term map learning and VPC experiments, we have shown that (a) the ensemble DCN classifier performs comparably to or better than a single DCN classifier, and (b) retraining scheduling of DCN classifiers is crucial for achieving a good balance between generalization and specialization.
Future work should address the map building stage. Currently, our experimental implementation assumes fine-grained viewpoint information for mapped images; future work should address the issue of map errors. Furthermore, the visual place classifiers should be updated when the viewpoint information of mapped images is incrementally refined during the long-term, multi-session map building process. Adapting the place definition to changing environments is another important direction for future research.
-  A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
-  T. Naseer, L. Spinello, W. Burgard, and C. Stachniss, “Robust visual robot localization across seasons using network flows,” in Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014.
-  J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, “How transferable are features in deep neural networks?” in Advances in Neural Information Processing Systems, 2014, pp. 3320–3328.
-  K. Saenko, B. Kulis, M. Fritz, and T. Darrell, “Adapting visual category models to new domains,” Computer Vision–ECCV 2010, pp. 213–226, 2010.
-  S. Chopra, S. Balakrishnan, and R. Gopalan, “DLID: Deep learning for domain adaptation by interpolating between domains,” in ICML Workshop on Challenges in Representation Learning, vol. 2, no. 6, 2013.
-  V. M. Patel, R. Gopalan, R. Li, and R. Chellappa, “Visual domain adaptation: A survey of recent advances,” IEEE Signal Processing Magazine, vol. 32, no. 3, pp. 53–69, 2015.
-  I. Kuzborskij, F. Orabona, and B. Caputo, “Scalable greedy algorithms for transfer learning,” Computer Vision and Image Understanding, vol. 156, pp. 174–185, 2017.
-  C. Elkan and K. Noto, “Learning classifiers from only positive and unlabeled data,” in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2008, pp. 213–220.
-  M. Oquab, L. Bottou, I. Laptev, and J. Sivic, “Learning and transferring mid-level image representations using convolutional neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1717–1724.
-  A. Masatoshi, C. Yuuto, T. Kanji, and Y. Kentaro, “Leveraging image-based prior in cross-season place recognition,” in Robotics and Automation (ICRA), 2015 IEEE International Conference on. IEEE, 2015, pp. 5455–5461.
-  T. Kanji, “Self-localization from images with small overlap,” in Intelligent Robots and Systems (IROS), 2016 IEEE/RSJ International Conference on. IEEE, 2016, pp. 4497–4504.
-  F. Xiaoxiao, T. Kanji, I. Kouya, and H. Guoqing, “Unsupervised place discovery for visual place classification,” in Machine Vision Applications (MVA), 2017 Fifteenth IAPR International Conference on. IEEE, 2017, pp. 109–112.
-  N. Carlevaris-Bianco, A. K. Ushani, and R. M. Eustice, “University of Michigan North Campus long-term vision and lidar dataset,” The International Journal of Robotics Research, vol. 35, no. 9, pp. 1023–1035, 2016.
-  A. Kendall, M. Grimes, and R. Cipolla, “PoseNet: A convolutional network for real-time 6-DOF camera relocalization,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 2938–2946.
-  D. Massiceti, A. Krull, E. Brachmann, C. Rother, and P. H. Torr, “Random forests versus neural networks: What’s best for camera localization?” in Robotics and Automation (ICRA), 2017 IEEE International Conference on. IEEE, 2017, pp. 5118–5125.
-  J. Shotton, B. Glocker, C. Zach, S. Izadi, A. Criminisi, and A. Fitzgibbon, “Scene coordinate regression forests for camera relocalization in RGB-D images,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 2930–2937.
-  R. Arroyo, P. F. Alcantarilla, L. M. Bergasa, and E. Romera, “Are you able to perform a life-long visual topological localization?” Autonomous Robots, pp. 1–21, 2017.
-  E. Gavves, T. Mensink, T. Tommasi, C. G. Snoek, and T. Tuytelaars, “Active transfer learning with zero-shot priors: Reusing past datasets for future tasks,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 2731–2739.
-  R. Arroyo, P. F. Alcantarilla, L. M. Bergasa, and E. Romera, “Fusion and binarization of CNN features for robust topological localization across seasons,” in Intelligent Robots and Systems (IROS), 2016 IEEE/RSJ International Conference on. IEEE, 2016, pp. 4656–4663.
-  G. Csurka, B. Chidlowskii, S. Clinchant, and S. Michel, “Unsupervised domain adaptation with regularized domain instance denoising,” in Computer Vision–ECCV 2016 Workshops. Springer, 2016, pp. 458–466.
-  N. Srivastava and R. R. Salakhutdinov, “Discriminative transfer learning with tree-based priors,” in Advances in Neural Information Processing Systems, 2013, pp. 2094–2102.
-  M. Mancini, S. R. Bulo, E. Ricci, and B. Caputo, “Learning deep NBNN representations for robust place categorization,” IEEE Robotics and Automation Letters, 2017.
-  Z. Chen, F. Maffra, I. Sa, and M. Chli, “Only look once, mining distinctive landmarks from ConvNet for visual place recognition,” in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2017), 2017.
-  J. L. Schönberger and J.-M. Frahm, “Structure-from-motion revisited,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 4104–4113.
-  Y. Latif, C. Cadena, and J. Neira, “Robust loop closing over time for pose graph SLAM,” The International Journal of Robotics Research, vol. 32, no. 14, pp. 1611–1626, 2013.
-  A. Babenko, A. Slesarev, A. Chigorin, and V. Lempitsky, “Neural codes for image retrieval,” in European Conference on Computer Vision. Springer, 2014, pp. 584–599.