Mean Local Group Average Precision (mLGAP): A New Performance Metric for Hashing-based Retrieval
Abstract
Research on hashing techniques for visual data has gained increasing attention in recent years due to the need for compact representations that support efficient search/retrieval in large-scale databases such as online images. Among many possibilities, Mean Average Precision (mAP) has emerged as the dominant performance metric for hashing-based retrieval. One glaring shortcoming of mAP is its inability to balance retrieval accuracy against the utilization of hash codes: pushing a system to attain higher mAP will inevitably lead to poorer utilization of the hash codes. Poor utilization of the hash codes hinders good retrieval because of increased collision of samples in the hash space. This means that a model giving a higher mAP value does not necessarily do a better job in retrieval. In this paper, we introduce a new metric named Mean Local Group Average Precision (mLGAP) for better evaluation of the performance of hashing-based retrieval. The new metric provides a retrieval performance measure that also reconciles the utilization of hash codes, leading to a more practically meaningful performance metric than conventional ones like mAP. To this end, we start with a mathematical analysis of the deficiencies of mAP for hashing-based retrieval. We then propose mLGAP and show why it is more appropriate for hashing-based retrieval. Experiments on image retrieval are used to demonstrate the effectiveness of the proposed metric.
1 Introduction
In recent years, due to the rapid growth of images and videos on the Internet, searching the Web for relevant and similar images or videos has become a very practical and challenging task. Many retrieval methods, including text-based search and content-based retrieval, have been proposed. Content-based retrieval is desirable in many scenarios since it does not require the availability of metadata (e.g., tags or annotations). However, content-based approaches are in general computationally costly, due both to the excessive computation needed to obtain content descriptors and to the high cost of searching a large-scale dataset for matches using those descriptors.
To address both the time (speed) and the space (memory) complexity of large-scale applications, researchers have proposed various approximate techniques, in particular hashing approaches [12, 22], which map input images to very compact binary codes so that the comparison/matching of any pair of images can be done approximately using their binary hash codes, without invoking the original images or descriptors. Since all images are represented by compact binary vectors (usually no more than a few hundred bits per image), both the amount of storage/memory and the computational time of a comparison can be greatly reduced.
Several performance metrics are commonly used to evaluate a hashing-based retrieval system. These include Precision at $N$ samples (P@$N$), Precision at Hamming distance $r$ (P@$r$), and Mean Average Precision (mAP). We argue that such metrics do not provide a comprehensive measure of hashing-based retrieval performance. For example, both mAP and precision can only indicate the accuracy of retrieving relevant samples; they provide no information on how well the hash codes are distributed. The distribution of the hash codes produced by a hashing scheme matters for retrieval performance since, for example, high collision (i.e., low utilization of the hash space) is detrimental to good retrieval. In particular, pushing a system to attain higher mAP, currently the dominant metric in the literature, will inevitably lead to poorer utilization of the hash space. This explains why some unsupervised approaches (e.g., [23]) usually achieve high utilization of the binary codes after hashing but do not give good mAP numbers on large-scale datasets (e.g., their mAP values with 12-bit hash codes on the CIFAR-10 dataset are usually low). Some supervised hashing methods take advantage of label information [25, 15, 10] to obtain better retrieval results in mAP. However, when label information is used in the training stage, more samples from the same category are encoded into the same binary code, i.e., there is high collision in the hash space. Even though the final mAP is much better than that of unsupervised hashing approaches (e.g., in [15], the mAP value with 12-bit hash codes on the CIFAR-10 dataset is considerably higher), it is questionable to accept this as a true improvement because of the high collision. In the extreme case, one may argue, hashing becomes a classification problem with, e.g., only 10 unique hash codes for 10 classes, and it is then no longer possible to rank-order the samples in the retrieval process.
Consequently, it is in general reasonable to argue that, for hashing-based retrieval, seeking high mAP values works against making the hash codes as dispersed as possible and could thus push the retrieval task towards a classification-like task. This would essentially defeat the purpose of using hashing for retrieval. Therefore, a new performance metric is desirable that takes into consideration both retrieval accuracy and the dispersion of the hash codes.
In this work, we propose a new metric for evaluating hashing-based retrieval. The new metric is derived from conventional mAP and precision while taking the utilization of the hash space into account, and hence it has the potential to gain the benefits of both. Furthermore, we show that the new metric is not affected by the order of the retrieved samples when hash codes collide, whereas mAP can vary significantly when the order of the retrieved samples changes.
The main contribution of this work is threefold:

We propose a new metric that evaluates hashing-based retrieval more meaningfully by jointly considering accuracy and utilization of the hash space.

We provide mathematical foundations for the new metric by analyzing the deficiencies of mAP in terms of their impact on utilization of the hash space.

We demonstrate that when the samples are mapped more uniformly to the binary space, there is room to preserve the similarity of the samples in the visual or semantic domain.
The rest of the paper is organized as follows. Section 2 discusses related work. Section 3 introduces some popular metrics for measuring the performance of retrieval systems. Section 4 describes our proposed metric. Section 5 presents experimental results, and Section 6 concludes the paper.
2 Related Work
Many hashing methods have been proposed for data-intensive applications in machine learning, computer vision, data mining and related areas [12, 22]. One key objective of hashing in such applications is to encode an input vector, usually high-dimensional, into a compact binary vector while preserving some similarity measure of the original data. As comparing a pair of binary codes (e.g., using the Hamming distance) is in general much more efficient than comparing the original vectors, hashing has become a useful technique for applications involving large-scale datasets, such as image retrieval on the Internet.
Hashing techniques can be divided into two categories: data-independent methods and data-dependent methods. One representative data-independent approach is Locality-Sensitive Hashing (LSH) [2], which uses random projections to generate hash functions. LSH has been extended to several versions, such as Kernelized LSH [8] and other variants [1, 20]. However, empirical results suggest that LSH and other data-independent approaches like [9] usually require long codes to maintain high precision and recall. This not only lowers speed but also increases space complexity. Hence data-independent approaches are not deemed the best option for large-scale problems like Internet-scale image retrieval.
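To make data-independent hashing concrete, here is a minimal sketch of one common random-projection scheme: each bit is the sign of the input's projection onto a random Gaussian direction, a standard locality-sensitive family for cosine similarity. It assumes real-valued feature vectors and codes in {-1, +1}; the function name and seed handling are our own, and it is not necessarily the construction used in [2].

```python
import numpy as np


def random_projection_hash(X, n_bits, seed=0):
    """Map each row of X to an n_bits code in {-1, +1} via signs of random projections."""
    rng = np.random.default_rng(seed)
    # Data-independent: the projection directions are drawn without looking at X.
    W = rng.standard_normal((X.shape[1], n_bits))
    codes = np.sign(X @ W)
    codes[codes == 0] = 1  # map exact zeros to +1 so every bit is +/-1
    return codes.astype(np.int8)


X = np.random.default_rng(1).standard_normal((5, 64))  # 5 toy feature vectors
B = random_projection_hash(X, n_bits=16)
print(B.shape)  # (5, 16)
```

Because the projection matrix never depends on the data distribution, shortening the code simply drops random directions; this is one intuition for why such schemes tend to need longer codes than learned ones.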
On the other hand, data-dependent approaches are expected to generate shorter hash codes, since more data-specific information can be exploited. Since the space spanned by meaningful images is in general only a small portion of the entire vector space, a machine-learning strategy makes it possible to tailor a hashing scheme to this space so that more compact binary codes can be obtained (i.e., the images can be represented by shorter binary hash codes). Several hashing techniques in this category, such as [25, 17, 18, 16], have been proposed with promising reported performance. These techniques can be further divided into unsupervised and supervised approaches. Unsupervised approaches use unlabeled training data to learn the hash functions. Representative algorithms include PCA hashing [23], which is based on principal component analysis (PCA); Iterative Quantization (ITQ) [3], which applies an orthogonal rotation matrix to refine the initial projection matrix learned by PCA; and Spectral Hashing (SH) [24], which is based on the eigenvectors computed from the data similarity graph. Deep Hashing (DH) [14] is another example, which leverages the capacity of neural networks to acquire better visual features for improved performance.
With training data that come with label information, such as pointwise labels, pairwise labels [25, 15], or triplet labels [10], supervised methods have been developed to take advantage of the extra information. Well-known supervised approaches include Supervised Hashing with Kernels (KSH) [17], which learns the hash function in a kernel space; Minimal Loss Hashing (MLH) [18], which minimizes a hinge loss function to learn the hash function; Binary Reconstructive Embeddings (BRE) [7], which learns hash functions by minimizing the reconstruction error between distances in the original space and distances in the Hamming space; and Supervised Deep Hashing (SDH) [14], which learns the binary codes with a deep neural network. While these methods use relaxation schemes to obtain the discrete binary codes, Discrete Graph Hashing (DGH) [16] and Supervised Discrete Hashing (SDiscH) [21] were proposed to compute the optimal binary codes directly, and improved performance was reported.
The aforementioned approaches generally treat feature extraction and binary encoding as two separate stages. Recently, some deep-learning-based approaches [25, 10, 15, 11, 14, 13] have been proposed that attempt to learn the binary representation jointly with the features using convolutional neural networks (CNNs). When the learned hash codes are used for tasks like image retrieval, these recent methods have been shown to improve performance quite significantly. Reviewing these approaches, one may notice that the feature-learning stage is global in nature, although local information (such as saliency or region interaction) may be essential for tasks like retrieving images containing foreground objects of a similar type (possibly against diverse backgrounds).
In existing studies, performance comparison among different hashing models has been based on metrics that do not necessarily reflect the advantages of hashing-based schemes. For example, Fig. 1 plots the histograms of the hash codes of some existing hashing models, showing that incorporating label information in the objective function may lead to higher mAP scores but does not necessarily increase the utilization of the hash space, and thus may move the retrieval problem towards a classification-like problem. This motivated us to develop a new performance metric that considers not only retrieval accuracy but also the utilization of the hash space.
3 Performance metrics for Retrieval
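Let $\mathcal{X}$ be the sample space. A good hashing scheme aims to learn a mapping $h: \mathcal{X} \rightarrow \{-1, 1\}^k$ such that, for any $x_1, x_2 \in \mathcal{X}$, if $x_1$ and $x_2$ are similar, then $b_1$ and $b_2$ are similar, where $b_1 = h(x_1)$ and $b_2 = h(x_2)$. After learning such a mapping, for a given input query $q$, we obtain its binary representation $b_q = h(q)$ and compare $b_q$ with the binary codes of the database using the Hamming distance $d_H(\cdot, \cdot)$.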
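The retrieval step described above can be sketched as follows, assuming codes in {-1, +1}^k stored as NumPy arrays (the names `hamming_distance` and `rank_database` are illustrative, not from any library). For such codes the Hamming distance equals (k - <b_q, b>)/2, so a whole database can be scored with one matrix-vector product.

```python
import numpy as np


def hamming_distance(a, b):
    """Hamming distance between two {-1, +1} codes: number of positions that differ."""
    return int(np.count_nonzero(a != b))


def rank_database(query_code, db_codes):
    """Return database indices sorted by Hamming distance to the query, plus the distances."""
    k = db_codes.shape[1]
    # For {-1, +1} codes, d_H = (k - <q, b>) / 2, which vectorizes the comparison.
    dists = (k - db_codes @ query_code) // 2
    order = np.argsort(dists, kind="stable")  # stable sort: ties keep database order
    return order, dists


db = np.array([[1, 1, -1, -1],
               [1, -1, -1, -1],
               [-1, -1, 1, 1],
               [1, 1, -1, -1]])
q = np.array([1, 1, -1, -1])
order, dists = rank_database(q, db)
print(order.tolist(), dists.tolist())  # [0, 3, 1, 2] [0, 1, 4, 0]
```

Database items 0 and 3 share a code with the query, so their relative order is decided only by the tie-breaking rule of the sort; Section 3.2.1 discusses why this matters.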
We now describe several commonly used metrics for image retrieval and discuss why they are not suitable for hashing-based models. Note that we only discuss score-based metrics; plot-based metrics like precision-recall curves are not considered here.
3.1 Precision
For large-scale datasets on the Internet, recall becomes a less important score for an image retrieval system, as few users are interested in finding all images relevant to a query. Precision, on the other hand, is important; it is defined as:
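$$\text{Precision} = \frac{|\{\text{relevant samples}\} \cap \{\text{retrieved samples}\}|}{|\{\text{retrieved samples}\}|} \qquad (1)$$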
Usually, retrieval precision is measured in two ways: (1) based on the top $N$ returned samples (P@$N$); (2) based on the samples within Hamming distance $r$ of the query (P@$r$). Although precision can reflect the performance of a hashing model to some degree, it does not consider the ranking order of the returned samples. For instance, for a given query, the following two retrieval results have the same P@$N$ value:
(2) 
(3) 
Hence the goal of keeping relevant images at the top as much as possible cannot be guaranteed by simply pushing for better precision values.
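The order-insensitivity of precision can be checked on two hypothetical four-item rankings (these are illustrative and are not the examples in (2) and (3)). Relevance is encoded as 0/1 in rank order; `precision_at_radius` returns 0 when nothing falls inside the radius, which is our own convention.

```python
def precision_at_n(relevance, n):
    """P@N: fraction of the top-n returns that are relevant (relevance is 0/1 in rank order)."""
    top = relevance[:n]
    return sum(top) / len(top)


def precision_at_radius(relevance, distances, r):
    """P@r: fraction of returns within Hamming distance r that are relevant."""
    within = [rel for rel, d in zip(relevance, distances) if d <= r]
    return sum(within) / len(within) if within else 0.0  # 0 for an empty ball (our convention)


ranking_a = [1, 1, 0, 0]  # relevant samples returned first
ranking_b = [0, 0, 1, 1]  # the same samples, relevant ones returned last
print(precision_at_n(ranking_a, 4), precision_at_n(ranking_b, 4))  # 0.5 0.5
```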
3.2 Mean Average Precision
Because of this disadvantage of precision, Mean Average Precision (mAP) has become a much more popular metric for measuring the performance of retrieval systems. The mAP of a set of queries is the mean of the average precision of each query. To be precise, we first define the Average Precision (AP) of the top $N$ returns:
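$$\text{AP}@N = \frac{\sum_{i=1}^{N} \text{P}@i \cdot \delta(i)}{\sum_{i=1}^{N} \delta(i)} \qquad (4)$$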
where $\delta(i)$ is the indicator function:
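$$\delta(i) = \begin{cases} 1, & \text{if the } i\text{-th returned sample is relevant to the query},\\ 0, & \text{otherwise}. \end{cases} \qquad (5)$$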
The mAP can be calculated by taking the mean of the AP over a query set $Q$:
$$\text{mAP} = \frac{1}{|Q|} \sum_{q \in Q} \text{AP}@N(q) \qquad (6)$$
Therefore, the two situations presented in the previous subsection, (2) and (3), receive different mAP scores even though they have the same precision. mAP thus reflects the ranking order of the retrieved images better than precision does. We call a metric global if all the data in the database take part in its calculation, and local otherwise. P@$N$ and P@$r$ are therefore usually local metrics, while mAP can be either local or global, depending on whether it is calculated on the top $N$ returns or on the whole dataset.
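A minimal sketch of AP and mAP over 0/1 relevance lists, assuming AP is normalized by the number of relevant samples among the top N returns (one common convention in the hashing literature; other works normalize by the total number of relevant samples in the database). Applied to the hypothetical rankings [1, 1, 0, 0] and [0, 0, 1, 1], it separates them even though their precision is identical.

```python
def average_precision(relevance, n=None):
    """AP over the top-n returns: mean of P@i over the ranks i that hold a relevant sample."""
    rel = relevance[:n] if n is not None else relevance
    hits, total = 0, 0.0
    for i, r in enumerate(rel, start=1):
        if r:  # delta(i) = 1: the i-th returned sample is relevant
            hits += 1
            total += hits / i  # P@i at a relevant position
    return total / hits if hits else 0.0  # normalized by relevant samples in the top n


def mean_average_precision(relevance_lists, n=None):
    """mAP: mean of AP over a query set (one relevance list per query)."""
    return sum(average_precision(r, n) for r in relevance_lists) / len(relevance_lists)


print(average_precision([1, 1, 0, 0]))  # 1.0
print(average_precision([0, 0, 1, 1]))  # (1/3 + 2/4) / 2 = 5/12, about 0.417
```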
3.2.1 Uniqueness of mAP
Even though mAP can reflect the ranking order of retrieved images, for a hashing-based retrieval model the mAP score of a given test set is in general not unique. When there are collisions in a binary code (i.e., two or more samples from different classes are mapped to the same binary code), there can be several ways to arrange the retrieval ranking, since the retrieved images have the same Hamming distance to the query image. Such collisions thus lead to different mAP scores for the same query image. Fig. 2 shows the best case and worst case for a two-class example. This phenomenon suggests that mAP can be very misleading when it is used to compare two models that have very high collision rates in the hash space.
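The non-uniqueness can be reproduced with a toy sketch: samples sharing a Hamming distance to the query may be returned in any order, and AP is maximized by putting the relevant ones first within each tie group and minimized by putting them last. The helper names and the toy two-class data below are hypothetical, not the configuration of Fig. 2.

```python
def average_precision(relevance):
    """AP of a full 0/1 relevance list, normalized by the number of relevant returns."""
    hits, total = 0, 0.0
    for i, r in enumerate(relevance, start=1):
        if r:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0


def ap_range_under_ties(distances, relevance):
    """Best and worst AP over all rankings consistent with the Hamming distances.

    Samples at the same distance (e.g. colliding codes) can be returned in any order.
    Within each tie group, relevant-first gives the best AP and relevant-last the worst.
    """
    groups = {}
    for d, r in zip(distances, relevance):
        groups.setdefault(d, []).append(r)
    best = [r for d in sorted(groups) for r in sorted(groups[d], reverse=True)]
    worst = [r for d in sorted(groups) for r in sorted(groups[d])]
    return average_precision(best), average_precision(worst)


# Toy two-class case: four samples collide on the query's code, two of them relevant.
print(ap_range_under_ties([0, 0, 0, 0], [1, 0, 1, 0]))  # best 1.0, worst 5/12 (about 0.417)
```

For the same set of hash codes, the reported AP can thus be anywhere between these two values, depending only on how ties are broken.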
3.3 Usage of Hash Codes
For hashing-based approaches to take advantage of the binary space, we want to utilize the hash space well so as to reduce collisions, which in turn facilitates retrieval (e.g., by supporting more refined ranking). However, the commonly used metrics mentioned above do not consider this issue of hash-space utilization. Even worse, a high mAP means that samples from the same class are ranked high, i.e., that the distances among those samples are small or even zero, which may lead to high collision. In the extreme case, this could push the problem towards a simple classification problem, as discussed previously. In the following, we analyze this issue further and show why a global metric should be avoided: achieving a high score for such a metric may lead to low utilization of the hash space.
Proposition 1
For a hashing-based retrieval system, a perfect global mAP value (or a perfect value of another global metric) implies that the utilization of the hash space is bounded from above.
Assume that there are $C$ classes in the dataset. To achieve mAP $= 1$, for any input query $q$, the retrieval system must rank the samples of the query's class highest. It is therefore natural to assume that the hash function maps the samples of the $C$ classes to $C$ separate sets $S_1, \dots, S_C$ in the $k$-bit binary space $\{-1, 1\}^k$. For each $S_i$, we define two distances:
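$$d_i = \max_{u, v \in S_i} d_H(u, v) \qquad (7)$$
$$D_i = \min_{j \neq i} \; \min_{u \in S_i,\, v \in S_j} d_H(u, v) \qquad (8)$$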
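where $d_i$ denotes the diameter of $S_i$, and $D_i$ denotes the shortest distance from $S_i$ to another Hamming ball. By requiring $d_i < D_i$ for every $i$, we achieve mAP $= 1$, since every sample of the query's class is then closer to the query than any sample of another class. To prevent all samples of a class from being mapped to a single binary code, we further assume $d_i \geq \delta$ for some $\delta \geq 1$. The setting is illustrated in Fig. 3.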
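The separation condition can be checked numerically for small sets of codes. This sketch computes the within-set diameter of (7) and the shortest distance to another set of (8) by brute force, then tests whether every diameter is smaller than the corresponding gap; the toy 4-bit sets and all function names are hypothetical.

```python
import itertools

import numpy as np


def hamming(a, b):
    """Number of positions in which two codes differ."""
    return int(np.count_nonzero(np.asarray(a) != np.asarray(b)))


def diameter(S):
    """Largest Hamming distance between two codes of the same set (0 for one code), as in (7)."""
    return max((hamming(u, v) for u, v in itertools.combinations(S, 2)), default=0)


def gap_to_others(i, sets):
    """Shortest Hamming distance from a code in sets[i] to a code in any other set, as in (8)."""
    return min(hamming(u, v)
               for j, T in enumerate(sets) if j != i
               for u in sets[i] for v in T)


def perfect_map_condition(sets):
    """True if every set's diameter is smaller than its gap to the other sets."""
    return all(diameter(S) < gap_to_others(i, sets) for i, S in enumerate(sets))


S1 = [(1, 1, 1, 1), (1, 1, 1, -1)]        # hypothetical class-1 codes, diameter 1
S2 = [(-1, -1, -1, -1), (-1, -1, -1, 1)]  # hypothetical class-2 codes, diameter 1
print(perfect_map_condition([S1, S2]))                      # True  (1 < 3)
print(perfect_map_condition([S1 + [(1, -1, -1, -1)], S2]))  # False (3 >= 1)
```

In the first configuration only 4 of the 16 possible 4-bit codes are occupied; adding a third code to the first set breaks the condition.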
To prove the upper bound on the utilization of the hash space, we first define an orthodrome on the binary space $\{-1, 1\}^k$. For any $x \in \{-1, 1\}^k$, each element can be flipped either from $-1$ to $1$ or from $1$ to $-1$. If we flip all the elements of $x$ one at a time, we obtain a path from $x$ to its farthest point $-x$ in this space. Following the same flipping order, we obtain another path from $-x$ back to $x$. An orthodrome contains all the points on these two paths. For example, for a given $x$ with farthest point $-x$, if we flip the signs of the elements of $x$ in a fixed order, the path from $x$ to $-x$ and back to $x$ can be stated as follows:
(9) 
These points form an orthodrome; an example is provided in Fig. 4. If a binary code on an orthodrome belongs to class $i$, then, since $d_i < D_i$, its closest codes cannot come from any class other than class $i$. As this statement holds for every class $i$, the utilization of the binary codes on this orthodrome is bounded from above (see Fig. 5 for an illustration). As this holds for every orthodrome, we can conclude that the utilization of the hash codes in the entire space is bounded by the same quantity. Note that this upper bound is not the supremum (i.e., the least upper bound), and if we decrease the number of classes $C$ and increase $\delta$, the lower bound on the diameters $d_i$, the upper bound on the utilization of the hash space drops.
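The orthodrome construction can be sketched directly: starting from x, flip one element at a time in a fixed order until reaching -x, then repeat the same order to return to x. This assumes codes in {-1, +1}^k; the 3-bit starting point and flip order below are illustrative choices, not the example of (9). The result is a cycle of 2k distinct codes in which neighbors differ in exactly one bit.

```python
def orthodrome(x, order):
    """Cycle of 2k codes: flip the bits of x one at a time (in `order`) until reaching -x,
    then flip them again in the same order to return to x.

    x is a tuple of +/-1 values; order is a permutation of range(len(x)).
    Returns the 2k distinct points visited (the final return to x is not repeated).
    """
    path, cur = [tuple(x)], list(x)
    for _ in range(2):              # first pass: x -> -x, second pass: -x -> x
        for idx in order:
            cur[idx] = -cur[idx]    # flip the sign of one element
            path.append(tuple(cur))
    return path[:-1]                # the last point equals x again


pts = orthodrome((1, 1, 1), order=[0, 1, 2])
print(pts)
# [(1, 1, 1), (-1, 1, 1), (-1, -1, 1), (-1, -1, -1), (1, -1, -1), (1, 1, -1)]
```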
4 The Proposed Metric
In this section, we define our proposed metric for hashing-based retrieval systems. Consider a given set of queries $Q$ and, for each query $q$, the set of binary codes of its relevant images. We define the set of retrieved binary codes of a query $q$ as the codes within a Hamming ball of radius $r$ around the binary code $b_q$ of $q$, i.e., $\{ b : d_H(b, b_q) \leq r \}$, where $d_H$ denotes the Hamming distance. The Local Group Average Precision (LGAP) can then be calculated as follows:
(10) 
where $\phi(\cdot)$ is a penalty function that maps the usage of the binary codes in the retrieved subset to $[0, 1]$. The purpose of the penalty term is to encourage dispersion of the binary codes. If, for a query sample, the usage of the binary codes inside the Hamming ball is not dispersed enough, the output of $\phi$ should be very small and close to $0$; if the binary codes spread out in the Hamming ball (uniformly distributed at best), the output of $\phi$ should be large and close to $1$. Any function fulfilling these requirements can serve as $\phi$. In our experiments, we define $\phi = A_h / A_r$, where $A_h$ is the sum of the histogram counts of the binary codes within the Hamming ball and $A_r$ is the product of the largest histogram count and the number of binary codes in the Hamming ball (i.e., the area of the rectangle covering all histogram bars). Fig. 6(b) illustrates how the penalty function is computed: $A_h$ equals the blue shaded area and $A_r$ is the total rectangular area that contains the whole histogram. Therefore, if the histogram is uniform over the Hamming ball (i.e., each binary code encodes the same number of images), $\phi$ equals $1$.
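The penalty described above (histogram area divided by the area of the bounding rectangle) can be sketched as follows. We assume codes in {-1, +1}^k and, importantly, that every code inside the Hamming ball counts as a histogram bin, including codes no sample maps to; otherwise a fully collapsed ball with a single occupied code would score a perfect 1. Returning 0 for an empty ball is our own choice, and the function names are hypothetical. This covers only the penalty, not the full LGAP of (10).

```python
from collections import Counter
from itertools import combinations


def hamming_ball(center, r):
    """All {-1, +1} codes within Hamming distance r of center, as tuples."""
    k = len(center)
    ball = []
    for d in range(r + 1):
        for idx in combinations(range(k), d):  # choose which d bits to flip
            code = list(center)
            for i in idx:
                code[i] = -code[i]
            ball.append(tuple(code))
    return ball


def dispersion_penalty(query_code, db_codes, r):
    """(histogram area) / (max bin height * number of codes in the ball).

    ASSUMPTION: every code in the ball is a bin, including empty ones (height 0).
    """
    ball = hamming_ball(query_code, r)
    counts = Counter(tuple(c) for c in db_codes)
    hist = [counts.get(c, 0) for c in ball]
    peak = max(hist)
    if peak == 0:
        return 0.0  # ASSUMPTION: an empty ball gets the lowest score
    return sum(hist) / (peak * len(hist))


print(dispersion_penalty((1, 1), [(1, 1), (-1, 1), (1, -1)], r=1))  # 1.0: uniform usage
print(dispersion_penalty((1, 1), [(1, 1)] * 3, r=1))                # 1/3: all samples collide
```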
Similar to mAP, the Mean Local Group Average Precision (mLGAP) is calculated by taking the mean of the LGAP over the query set:
$$\text{mLGAP} = \frac{1}{|Q|} \sum_{q \in Q} \text{LGAP}(q) \qquad (11)$$
Therefore, for the example in Fig. 6, the metric can be calculated as:
(12)  
(13)  
(14)  
(15) 
4.1 Advantages of uniformly distributed hash codes
Traditional hash functions are used in hash tables, where a good hash function should map the inputs as uniformly as possible over the table so as to minimize collisions, which in turn minimizes search time. While this property certainly remains desirable in hashing-based retrieval, there is another advantage to distributing the hash codes uniformly. When hashing techniques are used for retrieval, they often serve to approximate nearest neighbor search. With a good hash model, we would expect samples with similar hash codes to have similar properties in the original domain (e.g., the visual or semantic domain). For images, even within the same class, some images may be more similar to each other than to others. Hence using the hash space as uniformly as possible leaves more room for subtle distinctions between images in the hash space.
Fig. 7 shows a visual example produced with our proposed objective function, which is discussed in Section 5. The retrieval system tends to map samples to different hash codes while preserving some similarity: for the given query “cat head”, the retrieved “cat head” images mainly lie close to the query in Hamming distance. This advantage is not captured by popular existing metrics, which do not consider collision.
5 Experiments
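Table 1: mLGAP and mAP of the hash codes learned with the original DSH objective (Eq. 16) and with the proposed objectives (Eq. 20 for CIFAR-10, Eq. 25 for CIFAR-100).
Objective | CIFAR-10 12-bit mLGAP | CIFAR-10 12-bit mAP | CIFAR-10 24-bit mLGAP | CIFAR-10 24-bit mAP | CIFAR-100 12-bit mLGAP | CIFAR-100 12-bit mAP | CIFAR-100 24-bit mLGAP | CIFAR-100 24-bit mAP
Eq. (16) | 0.2633 | 0.6012 | 0.2889 | 0.6407 | 0.0637 | 0.1514 | 0.0660 | 0.1556
Eqs. (20) & (25) | 0.2796 | 0.3008 | 0.3461 | 0.3124 | 0.1703 | 0.1156 | 0.3310 | 0.1187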
We evaluate our proposed mLGAP metric on the CIFAR-10 and CIFAR-100 [6] datasets. The results show that the proposed metric is capable of reflecting both retrieval accuracy and the dispersion of hash codes, and hence potentially provides a better performance measure for evaluating hashing-based retrieval.
5.1 Dataset
Both the CIFAR-10 and CIFAR-100 datasets are used to verify the effectiveness of the dispersion scheme we propose for the objective function (described in detail below) for learning compact hash functions, and to compare our results with those of several state-of-the-art online hashing approaches under the proposed mLGAP metric.
CIFAR-10: The CIFAR-10 dataset [6] contains 60,000 color images of size 32 × 32 in 10 mutually exclusive categories, with 6,000 images per category. The official split has 5,000 training images and 1,000 test images per class, and we follow this split to train our compact hash-code learning approach.
CIFAR-100: The CIFAR-100 dataset [6] is similar to CIFAR-10 and also contains 60,000 color images of size 32 × 32, but only 600 images per category, since there are 100 categories in total. In addition, every 5 related categories are grouped into a superclass, giving 20 superclasses in total. In our experiments, we use both the "fine" labels (the class to which each image belongs) and the "coarse" labels (the superclass to which each image belongs) in the training stage; in the testing stage, performance is measured based on the "coarse" labels only.
5.2 Dispersion Scheme
In our experiments, we use the DSH model from [15], which achieves state-of-the-art performance on the CIFAR-10 dataset, to learn the hash codes. However, the DSH model is a supervised learning approach, whose disadvantages for retrieval have been discussed in Section 2. Therefore, while keeping the same network structure as the DSH model, we propose a new objective function so that the learned hash codes are dispersed as much as possible.
The original objective function of the DSH model is as follows:
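$$L(\mathbf{b}_1, \mathbf{b}_2, y) = L_s + L_d + L_q \qquad (16)$$
where
$$L_s = \tfrac{1}{2}(1 - y)\,\|\mathbf{b}_1 - \mathbf{b}_2\|_2^2 \qquad (17)$$
$$L_d = \tfrac{1}{2}\, y \max\left(m - \|\mathbf{b}_1 - \mathbf{b}_2\|_2^2,\, 0\right) \qquad (18)$$
$$L_q = \alpha \left(\big\| |\mathbf{b}_1| - \mathbf{1} \big\|_1 + \big\| |\mathbf{b}_2| - \mathbf{1} \big\|_1\right) \qquad (19)$$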
$y = 0$ if two images are from the same class and $y = 1$ otherwise; $m$ is the margin used to measure the dissimilarity of two binary-like codes. The first two terms are designed to make the hash codes more similar if they come from the same category and more different otherwise. The last term is a regularizer that drives the binary-like continuous output codes close to $\pm 1$.
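For reference, the DSH pairwise loss as we understand it from [15] can be sketched per pair as below: a squared-L2 pull for same-class pairs, a hinge on the margin m for different-class pairs, and an L1 regularizer pulling each entry toward ±1. The function name is ours, `alpha` is the regularization weight, the values m = 8 and alpha = 0.1 are arbitrary, and this is a NumPy illustration rather than the reference implementation.

```python
import numpy as np


def dsh_pair_loss(b1, b2, y, m, alpha):
    """Per-pair loss in the style of DSH [15]; y = 0 for a same-class pair, 1 otherwise."""
    d2 = float(np.sum((b1 - b2) ** 2))         # squared L2 distance
    similar = 0.5 * (1 - y) * d2               # pull same-class outputs together
    dissimilar = 0.5 * y * max(m - d2, 0.0)    # push other pairs beyond margin m
    # Regularizer: drive every entry of the real-valued outputs toward +1 or -1.
    reg = alpha * (np.sum(np.abs(np.abs(b1) - 1)) + np.sum(np.abs(np.abs(b2) - 1)))
    return similar + dissimilar + float(reg)


b1, b2 = np.array([1.0, 1.0]), np.array([1.0, -1.0])  # exact codes, Hamming distance 1
print(dsh_pair_loss(b1, b2, y=0, m=8, alpha=0.1))      # 2.0: same class, pulled together
print(dsh_pair_loss(b1, b2, y=1, m=8, alpha=0.1))      # 2.0: different class, inside margin
```

Note that the same-class term is zero only when the two outputs coincide, which is the pressure toward collision discussed in this section.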
Apart from the regularizer, the other two terms use the L2 norm to compute the distance between binary-like codes. However, computing the Hamming distance is akin to computing the L0 norm: for any vectors $u$ and $v$ of the same length, $\|u - v\|_0$ equals the number of elements in which $u$ and $v$ differ. Since the L1 norm approximates the L0 norm better than the L2 norm does, we use L1 instead of L2, which leads to an improvement over the original objective function.
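The relation between the Hamming distance and these norms can be checked on two exact codes in {-1, +1}^5 (the vectors are arbitrary). For exact codes, ||u - v||_0 equals the Hamming distance, ||u - v||_1 is twice it and ||u - v||_2^2 is four times it, so the choice of norm only matters for the relaxed, non-binary outputs seen during training; there, a small per-element gap of 0.2 contributes 0.2 to the L1 distance but only 0.04 to the squared L2 distance.

```python
import numpy as np

u = np.array([1, -1, 1, 1, -1])
v = np.array([1, 1, -1, 1, 1])

hamming = int(np.count_nonzero(u != v))  # positions that differ
l0 = int(np.count_nonzero(u - v))        # ||u - v||_0
l1 = int(np.sum(np.abs(u - v)))          # ||u - v||_1 = 2 * Hamming for +/-1 codes
l2_sq = int(np.sum((u - v) ** 2))        # ||u - v||_2^2 = 4 * Hamming for +/-1 codes
print(hamming, l0, l1, l2_sq)  # 3 3 6 12
```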
The new objective function is also designed based on distances in the Hamming space. In the original objective function, if two images are from the same category, their binary-like output codes are pushed as close together as possible (i.e., by the first term in Eq. (16)). This will certainly exacerbate the collision problem. Therefore, we design a buffer zone for learning the hash codes of the same category: we do not necessarily push the binary-like outputs from the same category as close as possible; as long as two binary-like outputs lie within a certain range of each other, they are regarded as belonging to the same category.
Factoring in the aforementioned improvements, the new objective function for training on single-label datasets (e.g., CIFAR-10) is redesigned as follows:
(20) 
where
(21)  
(22)  
(23)  
(24) 
The new parameter in Eq. (20) is the distance range (i.e., the buffer zone) for binary-like codes from the same category.
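The buffer-zone idea can be illustrated with a hypothetical pairwise loss: same-class pairs are penalized only when their L1 distance exceeds a range `m_s`, and different-class pairs are pushed beyond a margin `m`. This is our own sketch, assuming the DSH label convention and regularizer, and it may differ from Eq. (20) in its details. For exact ±1 codes the L1 distance is twice the Hamming distance, so `m_s = 2` corresponds to tolerating one differing bit.

```python
import numpy as np


def buffered_pair_loss(b1, b2, y, m, m_s, alpha):
    """HYPOTHETICAL buffer-zone pairwise loss (our sketch, may differ from Eq. (20)).

    y = 0 for a same-class pair, 1 otherwise (DSH convention).
    m_s: buffer zone; same-class pairs within L1 distance m_s are not pulled closer.
    m:   margin; different-class pairs are pushed to at least L1 distance m.
    """
    d1 = float(np.sum(np.abs(b1 - b2)))        # L1 instead of squared L2
    similar = (1 - y) * max(d1 - m_s, 0.0)     # zero inside the buffer zone
    dissimilar = y * max(m - d1, 0.0)          # hinge on the margin
    reg = alpha * (np.sum(np.abs(np.abs(b1) - 1)) + np.sum(np.abs(np.abs(b2) - 1)))
    return similar + dissimilar + float(reg)


b1, b2 = np.ones(4), np.array([1.0, 1.0, 1.0, -1.0])        # one differing bit: L1 distance 2
print(buffered_pair_loss(b1, b2, y=0, m=6, m_s=2, alpha=0.1))  # 0.0: inside the buffer
print(buffered_pair_loss(b1, b2, y=0, m=6, m_s=0, alpha=0.1))  # 2.0: no buffer, still pulled
```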
For CIFAR-100, which contains both classes and superclasses, we design two buffer zones for learning the binary codes. The distance range for codes from the same superclass is larger than that for codes from the same class, implying that the variance of the binary-like codes from the same class should be smaller than that of codes from the same superclass. Therefore, the new objective function for CIFAR-100 can be written as:
(25) 
where
(26)  
(27)  
(28)  
(29)  
(30)  
(31) 
An additional indicator specifies whether two images are from the same superclass, and two distance ranges are used for pairs from the same class and pairs from the same superclass, respectively.
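Along the same lines, a hypothetical two-zone variant for CIFAR-100 could use a tight range `m_c` for same-class pairs, a wider range `m_sc` for pairs that share only a superclass, and the margin `m` for pairs from different superclasses. The names and the exact case split are our assumptions and may differ from Eq. (25).

```python
import numpy as np


def two_zone_pair_loss(b1, b2, same_class, same_super, m, m_c, m_sc, alpha):
    """HYPOTHETICAL two-buffer-zone loss (our sketch); assumes m_c < m_sc < m.

    same-class pairs: penalized only beyond m_c (the tighter zone)
    same-superclass, different-class pairs: penalized only beyond m_sc (the wider zone)
    different-superclass pairs: pushed to at least L1 distance m
    """
    d1 = float(np.sum(np.abs(b1 - b2)))  # L1 distance between binary-like outputs
    if same_class:
        term = max(d1 - m_c, 0.0)
    elif same_super:
        term = max(d1 - m_sc, 0.0)
    else:
        term = max(m - d1, 0.0)
    reg = alpha * (np.sum(np.abs(np.abs(b1) - 1)) + np.sum(np.abs(np.abs(b2) - 1)))
    return term + float(reg)


b1, b2 = np.ones(6), np.array([1.0, 1.0, 1.0, 1.0, -1.0, -1.0])  # L1 distance 4
print(two_zone_pair_loss(b1, b2, True, True, m=8, m_c=2, m_sc=4, alpha=0.1))    # 2.0
print(two_zone_pair_loss(b1, b2, False, True, m=8, m_c=2, m_sc=4, alpha=0.1))   # 0.0
print(two_zone_pair_loss(b1, b2, False, False, m=8, m_c=2, m_sc=4, alpha=0.1))  # 4.0
```

The same pair distance is thus acceptable for two images sharing only a superclass but still too large for two images of the same class.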
We apply the new objective function to the model proposed in [15], which contains 3 convolutional layers and 2 fully connected layers. The performance under the new mLGAP metric is presented in the following subsection.
5.3 Evaluation and Parameter Settings
In our experiments, all margin values and distance ranges are set heuristically. In the $k$-bit binary space, we set the margin following [15] to encourage dissimilar images to be sufficiently far apart; the same-category distance range for CIFAR-10 and the class and superclass distance ranges for CIFAR-100 are chosen in the same heuristic manner. For both CIFAR-10 and CIFAR-100, training runs for 750 epochs, with the learning rate starting at 0.01 and decaying every 150 epochs. The model is trained and updated with mini-batches. Fig. 8 shows the usage of 12-bit binary codes by the hashing model on CIFAR-10. The final performance is listed in Tab. 1, suggesting that the new metric reflects not only retrieval performance but also the uniformity of binary-code usage.
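A code-usage histogram like the one in Fig. 8, together with a simple utilization ratio, can be computed as below. The ratio (number of occupied codes divided by 2^k) is our own assumed definition for illustration, since the text does not give a formula for utilization; the function names are hypothetical.

```python
from collections import Counter

import numpy as np


def code_usage(codes):
    """Histogram: how many samples map to each distinct binary code (rows of +/-1)."""
    return Counter(map(tuple, np.asarray(codes).tolist()))


def utilization(codes, n_bits):
    """ASSUMED definition: fraction of the 2**n_bits possible codes used by at least one sample."""
    return len(code_usage(codes)) / 2 ** n_bits


codes = [[1, 1], [1, 1], [-1, 1]]  # three toy 2-bit codes, two of them colliding
print(code_usage(codes))           # Counter({(1, 1): 2, (-1, 1): 1})
print(utilization(codes, 2))       # 0.5
```

For instance, if the test images of a 12-bit model fell into only 10 distinct codes, this ratio would be 10/4096 (about 0.24%), the classification-like extreme discussed in Section 1.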
6 Conclusions
In this paper, we discussed the commonly used metrics for hashing-based retrieval systems. We presented a new metric that considers not only retrieval accuracy but also the utilization of the hash space, hence providing a better performance measure for evaluating hashing-based retrieval. We showed experimentally that when samples are mapped more uniformly to the binary space, the hash codes can preserve some similarity of the original samples, in either the visual or the semantic sense.
References
 [1] M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the Twentieth Annual Symposium on Computational Geometry, SCG ’04, pages 253–262, New York, NY, USA, 2004. ACM.
 [2] A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In Proceedings of the 25th International Conference on Very Large Data Bases, VLDB ’99, pages 518–529, San Francisco, CA, USA, 1999. Morgan Kaufmann Publishers Inc.
 [3] Y. Gong and S. Lazebnik. Iterative quantization: A procrustean approach to learning binary codes. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 817–824, June 2011.
 [4] J. P. Heo, Y. Lee, J. He, S. F. Chang, and S. E. Yoon. Spherical hashing. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 2957–2964, June 2012.
 [5] Z. Jin, C. Li, Y. Lin, and D. Cai. Density sensitive hashing. IEEE Transactions on Cybernetics, 44(8):1362–1371, 2014.
 [6] A. Krizhevsky. Learning multiple layers of features from tiny images. Technical report, 2009.
 [7] B. Kulis and T. Darrell. Learning to hash with binary reconstructive embeddings. In Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 1042–1050. Curran Associates, Inc., 2009.
 [8] B. Kulis and K. Grauman. Kernelized locality-sensitive hashing for scalable image search. In 2009 IEEE 12th International Conference on Computer Vision, pages 2130–2137, Sept 2009.
 [9] B. Kulis, P. Jain, and K. Grauman. Fast similarity search for learned metrics. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(12):2143–2157, Dec 2009.
 [10] H. Lai, Y. Pan, Y. Liu, and S. Yan. Simultaneous feature learning and hash coding with deep neural networks. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3270–3278, June 2015.
 [11] W. Li, S. Wang, and W. Kang. Feature learning based deep supervised hashing with pairwise labels. CoRR, abs/1511.03855, 2015.
 [12] Y. Li, R. Wang, H. Liu, H. Jiang, S. Shan, and X. Chen. Two birds, one stone: Jointly learning binary code for large-scale face image retrieval and attributes prediction. In 2015 IEEE International Conference on Computer Vision (ICCV), pages 3819–3827, Dec 2015.
 [13] K. Lin, H. F. Yang, J. H. Hsiao, and C. S. Chen. Deep learning of binary hash codes for fast image retrieval. In 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 27–35, June 2015.
 [14] V. E. Liong, J. Lu, G. Wang, P. Moulin, and J. Zhou. Deep hashing for compact binary codes learning. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2475–2483, June 2015.
 [15] H. Liu, R. Wang, S. Shan, and X. Chen. Deep supervised hashing for fast image retrieval. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
 [16] W. Liu, C. Mu, S. Kumar, and S. F. Chang. Discrete graph hashing. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 3419–3427. Curran Associates, Inc., 2014.
 [17] W. Liu, J. Wang, R. Ji, Y. G. Jiang, and S. F. Chang. Supervised hashing with kernels. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 2074–2081, June 2012.
 [18] M. Norouzi and D. J. Fleet. Minimal loss hashing for compact binary codes. In L. Getoor and T. Scheffer, editors, ICML, pages 353–360. Omnipress, 2011.
 [19] M. Raginsky and S. Lazebnik. Locality-sensitive binary codes from shift-invariant kernels. In Advances in Neural Information Processing Systems, pages 1509–1517, 2009.
 [20] M. Raginsky and S. Lazebnik. Locality-sensitive binary codes from shift-invariant kernels. In Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 1509–1517. Curran Associates, Inc., 2009.
 [21] F. Shen, C. Shen, W. Liu, and H. T. Shen. Supervised discrete hashing. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 37–45, June 2015.
 [22] J. Wang, S. Kumar, and S. F. Chang. Semi-supervised hashing for scalable image retrieval. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 3424–3431, June 2010.
 [23] J. Wang, S. Kumar, and S. F. Chang. Semi-supervised hashing for large-scale search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(12):2393–2406, Dec 2012.
 [24] Y. Weiss, A. Torralba, and R. Fergus. Spectral hashing. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 1753–1760. Curran Associates, Inc., 2009.
 [25] R. Xia, Y. Pan, H. Lai, C. Liu, and S. Yan. Supervised hashing for image retrieval via image representation learning. 2014.