Discrete Hashing with Deep Neural Network
Abstract
This paper addresses the problem of learning binary hash codes for large-scale image search by proposing a novel hashing method based on a deep neural network. The advantage of our deep model over previous deep models used in hashing is that it incorporates the criteria necessary for producing good codes, namely similarity preservation, balance and independence. Another advantage of our method is that, instead of relaxing the binary constraint on the codes during learning as most previous works do, we introduce an auxiliary variable and reformulate the optimization into two sub-optimization steps, which allows us to solve the binary constraint efficiently without any relaxation.
The proposed method is also extended to supervised hashing by leveraging label information, so that the learned binary codes preserve the pairwise labels of the inputs.
The experimental results on three benchmark datasets show that the proposed methods outperform state-of-the-art hashing methods.
1 Introduction
Large-scale visual search has attracted attention because of the easy availability of huge amounts of data and its wide applications [3]. Two main difficulties in large-scale visual search are efficient storage and fast searching. An attractive approach to handling these difficulties is binary hashing, where each original high-dimensional vector $x \in \mathbb{R}^D$ is mapped to a low-dimensional binary vector $b \in \{0,1\}^L$, where $L \ll D$. The resulting binary vectors allow efficient storage. Furthermore, while searching in the original space costs $O(ND)$, where $N$ is the database size, searching in the binary space costs $O(NL)$ with a much smaller constant factor. This is because hardware can efficiently compute the distance between data points in binary space (e.g., using the XOR operator) and the entire dataset ($NL$ bits) can fit in main memory. A wide range of hashing methods has been proposed in the literature [8, 33]. They can be divided into two categories, i.e., data-independent and data-dependent.
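To make the storage and speed argument concrete, here is a minimal Python sketch, assuming codes in $\{0,1\}^L$ are packed into Python integers with one bit per dimension. The helper names `pack_bits`, `hamming_distance` and `linear_scan` are illustrative and not from the paper; a production system would pack bits into fixed-width machine words and use a hardware popcount.

```python
def pack_bits(bits):
    """Pack a {0,1} code (sequence of ints) into one Python integer; bit i holds bits[i]."""
    value = 0
    for i, bit in enumerate(bits):
        if bit:
            value |= 1 << i
    return value


def hamming_distance(a, b):
    """XOR marks the positions where the two packed codes differ; count the set bits."""
    return bin(a ^ b).count("1")


def linear_scan(query, database):
    """Rank database indices by Hamming distance to the query (exhaustive O(N) scan)."""
    return sorted(range(len(database)), key=lambda i: hamming_distance(query, database[i]))


database = [pack_bits(c) for c in ([0, 1, 1, 0], [1, 1, 1, 0], [0, 0, 0, 1])]
query = pack_bits([0, 1, 1, 1])
print([hamming_distance(query, d) for d in database])  # [1, 2, 2]
print(linear_scan(query, database))                    # [0, 1, 2]
```

Each comparison costs one XOR and one bit count regardless of the original dimension $D$, which is where the small constant factor comes from.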
Most methods in the data-independent category rely on random projections to generate hash functions. Representatives of this category are Locality-Sensitive Hashing (LSH) [5] and its extensions from Euclidean distance to other distances, such as kernelized LSH [15, 28] and LSH with Mahalanobis distance [16].
Instead of using random projections, the data-dependent category uses available training data to learn hash functions in an unsupervised or supervised way. Representatives of this category include unsupervised methods such as Spectral Hashing [34], Iterative Quantization (ITQ) [6], K-means Hashing [9], Spherical Hashing [10] and Isotropic Hashing [12], and supervised methods such as LDA Hashing [31], Minimal Loss Hashing [25, 26], ITQ-CCA [6], FastHash [18] and Binary Reconstructive Embedding [14].
One difficult problem in hashing is designing hash functions that can capture nonlinear structure in the input space. Most of the aforementioned methods assume linear hash functions, so they may not capture the nonlinear manifold structure of the inputs well. Although several kernel-based hashing methods have been proposed [20, 15, 28, 7], they suffer from scalability problems.
Another difficult problem in hashing is dealing with the binary constraint on the codes. In general, the binary constraint imposed on the output of the hash functions leads to a mixed-integer optimization problem, which is NP-hard. To handle this difficulty, most of the aforementioned methods relax the constraint during learning: the continuous codes are learned first and then binarized (e.g., by thresholding or with an optimal rotation). This relaxation greatly simplifies the original binary-constrained problem, but its solution is suboptimal, i.e., the binary codes obtained by thresholding the continuous codes are not necessarily the same as those obtained by solving the binary problem directly during learning.
1.1 Related work
In order to better capture the nonlinear manifold structure of inputs, a few hashing methods [29, 4, 2] rely on deep learning techniques. Semantic hashing [29] is the first work using deep learning for hashing. Its model is formed by a stack of Restricted Boltzmann Machines, and a pre-training step is required to train the model. In [2], the authors use a linear autoencoder as the hash function, seeking to reconstruct an input from the binary code produced by the hidden layer of the network. Because the model in [2] only uses a shallow network (i.e., only one hidden layer) with a linear activation function, it may not capture the nonlinear structure of inputs well. In [4], the authors use a deep neural network as the hash function. However, their unsupervised hashing method lacks the similarity-preserving property, i.e., not only should similar inputs likely have similar binary codes, but dissimilar inputs should also likely have dissimilar binary codes. Similarity preservation has been identified as an important criterion for hashing methods [34].
To handle the binary constraint, semantic hashing [29] and deep hashing [4] first solve the relaxed problem during learning by discarding the constraint, and then threshold the continuous solution to obtain the binary solution. In contrast to [29, 4], linear binary autoencoder-based hashing [2] directly handles the binary constraint during learning. It uses exhaustive search (i.e., searching over all $2^L$ solutions for an $L$-bit code) to find the binary code that minimizes the objective function (the reconstruction error), which can make training time-consuming when a large number of bits is used to encode a sample. Recently, in supervised discrete hashing (SDH) [30], the authors proposed a method named discrete cyclic coordinate descent, which efficiently solves the binary constraint without relaxation. By solving the binary problem one bit at a time, they obtain an analytic solution for the bit being processed, which makes training very efficient. It is worth noting that the objective function of SDH [30] is designed on the assumption that good hash codes are optimal for linear classification, an assumption that may not directly carry over to the retrieval problem.
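The two strategies contrasted above can be compared on a toy problem. The following Python/NumPy sketch assumes a single sample `x` and a fixed linear decoder `W` (as in a linear autoencoder). `exhaustive_code` enumerates all $2^L$ sign vectors, in the spirit of the exhaustive search used by [2], while `relaxed_then_thresholded_code` drops the binary constraint, solves least squares and thresholds the result. Both helpers are hypothetical illustrations, not code from [2], [4] or [29].

```python
import itertools

import numpy as np


def recon_error(x, W, b):
    """Squared reconstruction error ||x - W b||^2 of one sample from code b."""
    return float(np.sum((x - W @ b) ** 2))


def exhaustive_code(x, W):
    """Best code by enumerating all 2^L sign vectors; the cost doubles with every extra bit."""
    L = W.shape[1]
    candidates = (np.array(b) for b in itertools.product((-1.0, 1.0), repeat=L))
    return min(candidates, key=lambda b: recon_error(x, W, b))


def relaxed_then_thresholded_code(x, W):
    """Drop the binary constraint, solve least squares, then threshold with sign."""
    b_cont, *_ = np.linalg.lstsq(W, x, rcond=None)
    return np.where(b_cont >= 0, 1.0, -1.0)


rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))   # toy linear decoder: D = 6, L = 4
x = rng.standard_normal(6)
b_exact = exhaustive_code(x, W)
b_relax = relaxed_then_thresholded_code(x, W)
print(recon_error(x, W, b_exact), recon_error(x, W, b_relax))
```

The exhaustive code is never worse on this objective, but its cost grows as $2^L$, which is the trade-off that motivates a closed-form, bit-by-bit solver.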
1.2 Contribution
In this work, we first propose a novel unsupervised hashing method based on deep learning techniques. By using a deep neural network with nonlinear activation functions, our method can capture complex structure in the inputs. Our objective function includes the criteria [34] for producing good binary codes: similarity preservation, independence and balance. This differs from [4], where only independence and balance are considered. Furthermore, instead of relaxing the binary constraint as in previous works [4], we directly solve the binary constraint during learning, resulting in binary codes of better quality. The main differences between our hashing method and the recent deep learning-based unsupervised hashing methods Deep Hashing (DH) [4] and linear Binary Autoencoder (BA) [2] are summarized in Table 1. The compared criteria are: Is the network model deep? Does the objective function consider similarity preservation, independence and balance of the binary codes? How is the binary constraint on the codes solved during learning?
Criterion | DH [4] | BA [2] | Ours
Is the model deep? | Yes | No | Yes
Similarity preserving? | No | Yes | Yes
Independence? | Yes¹ | No | Yes
Balance? | Yes | No | Yes
How is the binary constraint solved? | Relaxation | Exhaustive search | Closed form
¹ Although the authors of Deep Hashing [4] considered the independence property in their objective function, they relaxed it by imposing it on the weights of the network. In contrast, we impose the independence property directly on the codes.
After introducing the new method for unsupervised hashing, we extend it to supervised hashing by leveraging label information, so that the binary codes preserve the semantic (label) similarity between samples. Our main contributions are summarized as follows.

We propose a novel deep learning-based hashing method that produces binary codes with the desired properties: similarity preservation, independence and balance.

The proposed method is first evaluated in unsupervised hashing setting. After that, we extend it to supervised hashing setting by leveraging the label information.

Extensive results on three benchmark datasets show the improvement of the proposed method over several state-of-the-art hashing methods.
The remainder of this paper is organized as follows. Section 2 presents our proposed method for unsupervised hashing. Section 3 evaluates the proposed unsupervised hashing method. Section 4 presents our proposed method for supervised hashing. Section 5 evaluates the proposed supervised hashing method. Section 6 concludes the paper.
2 Unsupervised Discrete Hashing with Deep Neural Network (UDHDNN)
2.1 Formulation of UDHDNN
Let $X = [x_1, \dots, x_m] \in \mathbb{R}^{D \times m}$ be the set of $m$ training samples; each column of $X$ corresponds to one sample. We aim to learn binary codes for each sample. Let $B \in \{-1,+1\}^{L \times m}$ be the binary code matrix of $X$, where $L$ is the desired number of bits used to encode a sample. In our work, the hash functions are defined by a deep neural network with $n$ layers (including the input and output layers).
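Let $s^{(l)}$ be the number of units in layer $l$; $f^{(l)}$ the activation function of layer $l$; $H^{(l)} \in \mathbb{R}^{s^{(l)} \times m}$ the output values of layer $l$ (for clarity in later sections, we use $H^{(1)} = X$); $W^{(l)} \in \mathbb{R}^{s^{(l+1)} \times s^{(l)}}$ the weight matrix connecting layer $l$ and layer $l+1$; and $c^{(l)} \in \mathbb{R}^{s^{(l+1)}}$ the bias vector for the units in layer $l+1$.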
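Our idea is to learn a deep neural network such that the signs of the output values of layer $n-1$ can be used as binary codes, and those codes give a good reconstruction of the input. To achieve this goal, we optimize the following objective function
$$\min_{W,c} J = \frac{1}{2m}\left\| X - \left(W^{(n-1)}\,\mathrm{sgn}\left(H^{(n-1)}\right) + c^{(n-1)}\mathbf{1}_{1\times m}\right)\right\|^2 + \frac{\lambda_1}{2}\sum_{l=1}^{n-1}\left\|W^{(l)}\right\|^2 \tag{1}$$
where $\mathbf{1}_{1\times m}$ is a row vector with all elements equal to 1, and $s^{(n-1)} = L$. In our formulation (1), the binary code is defined as $B = \mathrm{sgn}(H^{(n-1)})$.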
The first term of the objective function (1) ensures that the binary code $B$ gives a good reconstruction of $X$. Although the reconstruction criterion does not directly measure similarity preservation, it has been shown in deep learning-based hashing methods [2, 29] that hash functions defined by neural networks with a reconstruction criterion can capture the data manifold in a smooth way and indirectly preserve similarity, encouraging (dis)similar inputs to have (dis)similar codes. The second term is a regularization term that decreases the magnitude of the weights and helps prevent overfitting². Note that if we replace $\mathrm{sgn}(H^{(n-1)})$ by $H^{(n-1)}$, the objective function (1) becomes that of a deep autoencoder with a linear decoder layer (i.e., the last layer uses a linear activation function).
² As noted by Ng [1], regularization is not usually applied to the bias terms $c^{(l)}$; applying it to the biases usually makes only a small difference to the final network.
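As a concrete reading of (1), the following NumPy sketch evaluates the objective for a toy network. It assumes a 4-layer network ($n = 4$: input, one sigmoid hidden layer, a linear code layer, a linear decoder), random weights, and $\mathrm{sgn}(0) = +1$; these sizes, activations and the helper names (`encode`, `objective_eq1`) are illustrative choices, not the configuration used in the experiments.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def identity(z):
    return z


def sgn(a):
    # Map to {-1, +1}; zeros go to +1 (an assumed convention for sgn(0)).
    return np.where(a >= 0, 1.0, -1.0)


def encode(X, weights, biases, activations):
    """Forward pass H^(l+1) = f^(l+1)(W^(l) H^(l) + c^(l) 1_{1xm}), starting from H^(1) = X."""
    H = X
    for W, c, f in zip(weights, biases, activations):
        H = f(W @ H + c)  # c has shape (s, 1) and broadcasts over the m columns
    return H


def objective_eq1(X, enc_weights, enc_biases, activations, W_dec, c_dec, lam1):
    """Value of objective (1) and the codes B = sgn(H^(n-1))."""
    m = X.shape[1]
    B = sgn(encode(X, enc_weights, enc_biases, activations))
    residual = X - (W_dec @ B + c_dec)
    recon = np.sum(residual ** 2) / (2 * m)
    # Weights of all n-1 layers are regularized; biases are not.
    reg = 0.5 * lam1 * sum(np.sum(W ** 2) for W in enc_weights + [W_dec])
    return recon + reg, B


rng = np.random.default_rng(0)
D, hidden, L, m = 8, 6, 4, 20
X = rng.standard_normal((D, m))
enc_weights = [0.5 * rng.standard_normal((hidden, D)), 0.5 * rng.standard_normal((L, hidden))]
enc_biases = [np.zeros((hidden, 1)), np.zeros((L, 1))]
W_dec, c_dec = 0.5 * rng.standard_normal((D, L)), np.zeros((D, 1))
# The code layer is linear here, so sgn can produce both signs.
J, B = objective_eq1(X, enc_weights, enc_biases, [sigmoid, identity], W_dec, c_dec, lam1=1e-3)
print(J, B.shape)
```

Because $\mathrm{sgn}$ has zero gradient almost everywhere, (1) cannot be minimized by plain backpropagation, which is the motivation for introducing the auxiliary variable $B$.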
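Equivalently, by introducing the auxiliary variable $B$, the objective function (1) can be rewritten as
$$\min_{W,c,B} J = \frac{1}{2m}\left\| X - \left(W^{(n-1)}B + c^{(n-1)}\mathbf{1}_{1\times m}\right)\right\|^2 + \frac{\lambda_1}{2}\sum_{l=1}^{n-1}\left\|W^{(l)}\right\|^2 \tag{2}$$
s.t.
$$B = \mathrm{sgn}\left(H^{(n-1)}\right) \tag{3}$$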
The benefit of introducing the auxiliary variable $B$ is that we can decompose the difficult optimization problem (1) into two sub-optimization problems and solve it iteratively by alternately optimizing with respect to $(W, c)$ and $B$ while holding the other fixed. The idea of using an auxiliary variable was also used in [2] for learning binary codes, but [2] only handles the case where the hash function is a linear autoencoder.
As mentioned in [34], a good binary code should have not only the similarity-preserving property but also the independence and balance properties, i.e., different bits are independent of each other and each bit has a 50% chance of being $-1$ or $1$. We therefore add two more constraints (independence and balance) to problem (2). The new objective function is defined as
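$$\min_{W,c,B} J = \frac{1}{2m}\left\| X - \left(W^{(n-1)}B + c^{(n-1)}\mathbf{1}_{1\times m}\right)\right\|^2 + \frac{\lambda_1}{2}\sum_{l=1}^{n-1}\left\|W^{(l)}\right\|^2 \tag{4}$$
s.t.
$$B = \mathrm{sgn}\left(H^{(n-1)}\right) \tag{5}$$
$$\frac{1}{m}BB^T = I \tag{6}$$
$$B\mathbf{1}_{m\times 1} = 0 \tag{7}$$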
where $I$ is the $L \times L$ identity matrix. Problem (4) under these constraints is still NP-hard and difficult to solve because of the discrete variable $B$. One way to handle this difficulty is to relax constraint (5) to a continuous one. With this approach, the binary solution is obtained by first relaxing the binary codes to a continuous space and then post-processing, i.e., thresholding, the continuous solution. Most existing approaches follow this relaxation, such as Deep Hashing [4], Semantic Hashing [29], Spectral Hashing [34], Anchor Graph Hashing [21], Semi-Supervised Hashing [32] and LDAHash [31]. This relaxation simplifies the original binary-constrained problem, but its solution is suboptimal, i.e., the binary codes obtained by thresholding continuous codes are not necessarily the same as those obtained by handling the binary constraint directly in the optimization.
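The independence constraint (6) and the balance constraint (7) are easy to measure for any candidate code matrix. The NumPy sketch below (function names hypothetical) reports each violation as a norm. A perfectly balanced, decorrelated 2-bit code over four samples gives zero for both, while a constant code violates both.

```python
import numpy as np


def balance_violation(B):
    """Norm of B 1_{m x 1}; zero when every bit is +1 on exactly half of the samples, as in (7)."""
    return float(np.linalg.norm(B @ np.ones(B.shape[1])))


def independence_violation(B):
    """Frobenius norm of (1/m) B B^T - I; zero when the bit rows are mutually orthogonal, as in (6)."""
    L, m = B.shape
    return float(np.linalg.norm(B @ B.T / m - np.eye(L)))


good = np.array([[1, 1, -1, -1],
                 [1, -1, 1, -1]], dtype=float)
bad = np.ones((2, 4))
print(balance_violation(good), independence_violation(good))  # 0.0 0.0
print(balance_violation(bad), independence_violation(bad))
```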
To obtain binary codes of better quality, we should handle the binary constraint during the learning of the hash function. Inspired by regularization methods [22], we rewrite (4) and constraints (5), (6), (7) as
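$$\min_{W,c,B} J = \frac{1}{2m}\left\| X - \left(W^{(n-1)}B + c^{(n-1)}\mathbf{1}_{1\times m}\right)\right\|^2 + \frac{\lambda_1}{2}\sum_{l=1}^{n-1}\left\|W^{(l)}\right\|^2 + \frac{\lambda_2}{2m}\left\|B - H^{(n-1)}\right\|^2 \tag{8}$$
s.t.
$$B \in \{-1,+1\}^{L\times m} \tag{9}$$
$$\frac{1}{m}H^{(n-1)}\left(H^{(n-1)}\right)^T = I \tag{10}$$
$$H^{(n-1)}\mathbf{1}_{m\times 1} = 0 \tag{11}$$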
The third term in (8) minimizes the discretization error between the continuous code $H^{(n-1)}$ and the binary code $B$. It is shown in [22] that with sufficiently large $\lambda_2$, minimizing (8) under constraint (9) becomes close to minimizing (4) under constraint (5). When $\lambda_2$ is sufficiently large, the optimization yields $B \approx H^{(n-1)}$, so we can replace constraints (6) and (7) by constraints (10) and (11).
The recent work SDH [30] on supervised hashing also uses the idea of regularization methods [22]. However, their work focuses on supervised hashing; their formulation is based on the assumption that the resulting codes are good for linear classification; and they do not consider the independence and balance properties of the codes. In contrast, our work focuses on unsupervised hashing, makes no such assumption on the codes, uses a deep neural network as the hash function, and considers the independence and balance properties of the codes.
Instead of solving (8) under many constraints, we use a Lagrange multiplier approach and solve the following similar problem
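$$\min_{W,c,B} J = \frac{1}{2m}\left\| X - \left(W^{(n-1)}B + c^{(n-1)}\mathbf{1}_{1\times m}\right)\right\|^2 + \frac{\lambda_1}{2}\sum_{l=1}^{n-1}\left\|W^{(l)}\right\|^2 + \frac{\lambda_2}{2m}\left\|B - H^{(n-1)}\right\|^2 + \frac{\lambda_3}{2}\left\|\frac{1}{m}H^{(n-1)}\left(H^{(n-1)}\right)^T - I\right\|^2 + \frac{\lambda_4}{2m}\left\|H^{(n-1)}\mathbf{1}_{m\times 1}\right\|^2 \tag{12}$$
s.t.
$$B \in \{-1,+1\}^{L\times m} \tag{13}$$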
2.2 Optimization
2.2.1 $(W, c)$ step
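When $B$ is fixed, problem (12) becomes an unconstrained optimization problem. We use the L-BFGS [19, 24] optimizer with backpropagation to solve it. The gradients of the objective function (12) with respect to the last-layer parameters are
$$\frac{\partial J}{\partial W^{(n-1)}} = \frac{1}{m}\left(W^{(n-1)}B + c^{(n-1)}\mathbf{1}_{1\times m} - X\right)B^T + \lambda_1 W^{(n-1)} \tag{14}$$
$$\frac{\partial J}{\partial c^{(n-1)}} = \frac{1}{m}\left(W^{(n-1)}B + c^{(n-1)}\mathbf{1}_{1\times m} - X\right)\mathbf{1}_{m\times 1} \tag{15}$$
Let us define
$$\Delta^{(n-1)} = \left[\frac{\lambda_2}{m}\left(H^{(n-1)} - B\right) + \frac{2\lambda_3}{m}\left(\frac{1}{m}H^{(n-1)}\left(H^{(n-1)}\right)^T - I\right)H^{(n-1)} + \frac{\lambda_4}{m}H^{(n-1)}\mathbf{1}_{m\times 1}\mathbf{1}_{1\times m}\right] \odot f^{(n-1)\prime}\left(Z^{(n-1)}\right) \tag{16}$$
$$\Delta^{(l)} = \left(\left(W^{(l)}\right)^T\Delta^{(l+1)}\right) \odot f^{(l)\prime}\left(Z^{(l)}\right), \quad l = n-2, \dots, 2 \tag{17}$$
where $\odot$ denotes the Hadamard product, $Z^{(l)} = W^{(l-1)}H^{(l-1)} + c^{(l-1)}\mathbf{1}_{1\times m}$ and $H^{(l)} = f^{(l)}(Z^{(l)})$ for $l = 2, \dots, n-1$.
Then, for $l = 1, \dots, n-2$, we have
$$\frac{\partial J}{\partial W^{(l)}} = \Delta^{(l+1)}\left(H^{(l)}\right)^T + \lambda_1 W^{(l)} \tag{18}$$
$$\frac{\partial J}{\partial c^{(l)}} = \Delta^{(l+1)}\mathbf{1}_{m\times 1} \tag{19}$$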
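The $\lambda_2$, $\lambda_3$ and $\lambda_4$ terms are where (12) departs from a standard autoencoder, so their gradient with respect to $H^{(n-1)}$ is worth checking numerically. The NumPy sketch below, assuming these terms take the form written in (12), compares the analytic gradient against central finite differences; the backpropagated signal at the code layer is then this gradient multiplied elementwise by $f^{(n-1)\prime}(Z^{(n-1)})$. Function names, sizes and $\lambda$ values are arbitrary illustrative choices.

```python
import numpy as np


def penalty(H, B, lam2, lam3, lam4):
    """The lam2, lam3 and lam4 terms of (12), as a function of H = H^(n-1)."""
    L, m = H.shape
    ones = np.ones((m, 1))
    return (lam2 / (2 * m) * np.sum((B - H) ** 2)
            + lam3 / 2 * np.sum((H @ H.T / m - np.eye(L)) ** 2)
            + lam4 / (2 * m) * np.sum((H @ ones) ** 2))


def penalty_grad(H, B, lam2, lam3, lam4):
    """Analytic gradient of penalty() with respect to H."""
    L, m = H.shape
    ones = np.ones((m, 1))
    return (lam2 / m * (H - B)
            + 2 * lam3 / m * (H @ H.T / m - np.eye(L)) @ H
            + lam4 / m * (H @ ones) @ ones.T)


def numerical_grad(fun, H, eps=1e-6):
    """Central finite differences, one entry at a time."""
    G = np.zeros_like(H)
    for idx in np.ndindex(*H.shape):
        E = np.zeros_like(H)
        E[idx] = eps
        G[idx] = (fun(H + E) - fun(H - E)) / (2 * eps)
    return G


rng = np.random.default_rng(1)
H = rng.standard_normal((3, 5))
B = np.where(rng.standard_normal((3, 5)) >= 0, 1.0, -1.0)
lams = (0.5, 0.1, 0.2)  # arbitrary (lam2, lam3, lam4) for the check
G_num = numerical_grad(lambda Z: penalty(Z, B, *lams), H)
G_ana = penalty_grad(H, B, *lams)
print(np.max(np.abs(G_num - G_ana)))  # small: only finite-difference error remains
```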
2.2.2 $B$ step
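When $(W, c)$ is fixed, dropping the terms of (12) that do not depend on $B$ and multiplying by $2m$, we can rewrite problem (12) as
$$\min_{B} \left\|X - W^{(n-1)}B - c^{(n-1)}\mathbf{1}_{1\times m}\right\|^2 + \lambda_2\left\|B - H^{(n-1)}\right\|^2 \tag{20}$$
s.t.
$$B \in \{-1,+1\}^{L\times m} \tag{21}$$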
Solving for $B$ is challenging because of the binary constraint on $B$. Here we use discrete cyclic coordinate descent, a method recently proposed in [30]. Its advantage is that if we fix $L-1$ rows of $B$ and solve only for the remaining row, we obtain a closed-form solution for that row. Hence, $B$ can be solved iteratively, row by row.
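Let $Q = \left(W^{(n-1)}\right)^T\left(X - c^{(n-1)}\mathbf{1}_{1\times m}\right) + \lambda_2 H^{(n-1)}$. For $k = 1, \dots, L$, let $w_k$ be the $k$-th column of $W^{(n-1)}$ and $W_1$ the matrix $W^{(n-1)}$ excluding $w_k$; let $q_k$ be the $k$-th column of $Q^T$, $b_k^T$ the $k$-th row of $B$, and $B_1$ the matrix $B$ excluding $b_k^T$. Because $b_k \in \{-1,+1\}^m$, the terms $\|w_k b_k^T\|^2$ and $\|b_k\|^2$ are constants, so (20) reduces to minimizing $b_k^T\left(B_1^T W_1^T w_k - q_k\right)$, which gives the closed form
$$b_k = \mathrm{sgn}\left(q_k - B_1^T W_1^T w_k\right) \tag{22}$$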
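A minimal NumPy sketch of the $B$ step, assuming $W^{(n-1)}$, $c^{(n-1)}$ and $H^{(n-1)}$ are already available as arrays; it applies update (22) row by row for a few sweeps. Since each row update exactly minimizes (20) over that row with the other rows fixed, the objective should never increase, which gives a cheap sanity check. The function names, the number of sweeps and the initialization $B = \mathrm{sgn}(H^{(n-1)})$ are illustrative choices.

```python
import numpy as np


def b_step_objective(X, W, c, H, B, lam2):
    """Objective (20): ||X - W B - c 1_{1xm}||^2 + lam2 ||B - H||^2."""
    return float(np.sum((X - W @ B - c) ** 2) + lam2 * np.sum((B - H) ** 2))


def dcc_b_step(X, W, c, H, B, lam2, n_sweeps=3):
    """Discrete cyclic coordinate descent on the rows of B using update (22)."""
    B = B.copy()
    Q = W.T @ (X - c) + lam2 * H             # L x m
    L = B.shape[0]
    for _ in range(n_sweeps):
        for k in range(L):
            rest = [j for j in range(L) if j != k]
            w_k = W[:, k]                    # k-th column of W^(n-1)
            W1 = W[:, rest]                  # W^(n-1) without column k
            B1 = B[rest, :]                  # B without row k
            q_k = Q[k, :]                    # k-th column of Q^T
            z = q_k - B1.T @ (W1.T @ w_k)
            B[k, :] = np.where(z >= 0, 1.0, -1.0)  # sgn, with sgn(0) taken as +1
    return B


rng = np.random.default_rng(2)
D, L, m, lam2 = 5, 4, 30, 0.5
X = rng.standard_normal((D, m))
W = rng.standard_normal((D, L))
c = rng.standard_normal((D, 1))
H = rng.standard_normal((L, m))
B0 = np.where(H >= 0, 1.0, -1.0)             # initialize from the continuous codes
B = dcc_b_step(X, W, c, H, B0, lam2)
print(b_step_objective(X, W, c, H, B0, lam2), b_step_objective(X, W, c, H, B, lam2))
```

Each row update costs one matrix-vector product, so a sweep is far cheaper than enumerating $2^L$ codes per sample.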
3 Evaluation of Unsupervised Discrete Hashing with Deep Neural Network
This section presents the results of UDHDNN. We compare UDHDNN with the following state-of-the-art unsupervised hashing methods: Spectral Hashing (SH) [34], Iterative Quantization (ITQ) [6], Binary Autoencoder (BA) [2], Spherical Hashing (SPH) [10] and K-means Hashing (KMH) [9]. For all compared methods, we use the code and the suggested parameters provided by the authors.
3.1 Dataset, implementation note, and evaluation protocol
CIFAR10
CIFAR10 [13] contains 60,000 color images of 10 classes. Each image has a size of 32×32. The training set contains 50,000 images and the test set contains 10,000 images. In this experiment, we ignore the class labels. As is standard in the literature [6, 2], we extract a 320-dimensional GIST descriptor [27] from each image.
MNIST
The MNIST [17] dataset consists of 70,000 handwritten digit images of 10 classes (labeled from 0 to 9). Each image has a size of 28×28. The training set contains 60,000 samples and the test set contains 10,000 samples. In this experiment, we ignore the class labels. Each image is represented as a 784-dimensional grayscale feature vector using its pixel intensities.
SIFT1M
The SIFT1M [11] dataset contains 128-dimensional SIFT vectors. It is a standard dataset for evaluating large-scale approximate nearest neighbor search. There are 1M vectors for indexing, 100K vectors for training (separate from the indexing set) and 10K vectors for testing.
Implementation note
In our deep model, we use layers (including the input and output layers). The activation functions for layers and are sigmoid functions; those for layers and are linear functions. The parameters $\lambda_1$, $\lambda_2$, $\lambda_3$ and $\lambda_4$ were empirically set to , , and , respectively. The maximum number of iterations is set to 10.
For the CIFAR10 and MNIST datasets, the numbers of units in the hidden layers were empirically set to , , and for 8, 16, 32 and 64 bits, respectively. For the SIFT1M dataset, the numbers of units in the hidden layers were empirically set to , , and for 8, 16, 32 and 64 bits, respectively.
Evaluation metric
We follow the standard setting widely used in unsupervised hashing [6, 10, 9, 2], using Euclidean nearest neighbors to create ground truths for queries. The number of ground-truth neighbors is set as in [2]. For the CIFAR10 and MNIST datasets, for each query, we use its Euclidean nearest neighbors as ground truth. For the large-scale SIFT1M dataset, for each query, we use its Euclidean nearest neighbors as ground truth.
We use the following evaluation metrics [6, 2] to measure the performance of the methods: 1) mean average precision (mAP), which considers not only precision but also the rank of the retrieval results; 2) precision within Hamming radius $r$ (precision), which measures the precision over retrieved images whose Hamming distance to the query is at most $r$ (if no image satisfies this, we report zero precision).
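Both metrics can be sketched in a few lines of plain Python for a single query, given the Hamming distances from the query to every database item and the set of ground-truth neighbors. This is an illustrative implementation, not the evaluation code of [6, 2]: AP here averages the precision at each relevant hit among the retrieved items, which equals the usual definition when the whole database is ranked; with a `top_k` cutoff, normalization conventions differ between codebases.

```python
def rank_by_distance(distances):
    """Database indices sorted by Hamming distance to the query (ties keep index order)."""
    return sorted(range(len(distances)), key=lambda i: distances[i])


def average_precision(ranked_ids, relevant, top_k=None):
    """AP of one ranked list, averaging precision at each relevant hit; mAP is the mean over queries."""
    if top_k is not None:
        ranked_ids = ranked_ids[:top_k]
    hits, precision_sum = 0, 0.0
    for rank, idx in enumerate(ranked_ids, start=1):
        if idx in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def precision_at_radius(distances, relevant, r):
    """Precision over items within Hamming distance r; zero when nothing falls inside the radius."""
    retrieved = [i for i, d in enumerate(distances) if d <= r]
    if not retrieved:
        return 0.0
    return sum(1 for i in retrieved if i in relevant) / len(retrieved)


distances = [0, 3, 1, 2, 4]   # Hamming distances from one query to five database items
relevant = {0, 3}             # ground-truth neighbors of the query
ranking = rank_by_distance(distances)
print(ranking)                                         # [0, 2, 3, 1, 4]
print(average_precision(ranking, relevant))            # 5/6
print(precision_at_radius(distances, relevant, r=2))   # 2/3
print(precision_at_radius([3, 4], {0}, r=2))           # 0.0
```

The last call illustrates the zero-precision rule: when no database item falls within the radius, the query contributes zero, which is why precision drops for long codes.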
3.2 Retrieval results
3.2.1 Results on CIFAR10 dataset
Figure 1 shows retrieval results of different methods with different code lengths on CIFAR10 dataset.
In terms of mAP, the proposed UDHDNN achieves the best results at all code lengths. The improvement is clearer at longer code lengths. The mAP of UDHDNN consistently exceeds that of the binary autoencoder (BA) [2], which is the current state-of-the-art unsupervised hashing method.
When precision within Hamming radius is used, the following observations are consistent for both and . UDHDNN is comparable to the other methods at low code lengths (i.e., ). At , UDHDNN significantly outperforms the other methods. When , the precision of all methods decreases. The reason is that many query images have no neighbors within the Hamming radius, and we report zero precision for those cases. The precision of UDHDNN is lower than that of some compared methods at . However, it is worth noting that the highest precision is achieved by UDHDNN at for both and .
Comparison with Deep Hashing (DH) [4]
We also compare UDHDNN with Deep Hashing (DH) [4]. Because the implementation of DH is not available, we set up our experiments similarly to [4] to make a fair comparison. We randomly sample 1,000 images (100 per class) as the test set; the remaining 59,000 images are used as the training set. Each image is represented by a 512-dimensional GIST descriptor [27]. The ground truths of queries are based on their class labels³. Similar to [4], we report comparative results in terms of mAP at code lengths and precision within Hamming radius of at code lengths . We perform the experiments 10 times and report the average performance. The comparative results are presented in Table 2.
³ It is worth noting that in the evaluation of unsupervised hashing, most state-of-the-art methods [6, 10, 9, 2] use Euclidean nearest neighbors, rather than class labels, as ground truths for queries.
Method  mAP  Precision  

DH [4]  16.17  16.62  16.96  23.33  15.77 
UDHDNN  16.83  17.52  18.02  24.97  22.20 
Table 2 clearly shows that the proposed UDHDNN outperforms DH [4] at all code lengths, in terms of both mAP and precision within Hamming radius. We attribute this to UDHDNN incorporating the criteria necessary for producing good binary codes and, instead of relaxing the binary constraint when learning the network as DH [4] does, solving the binary constraint directly during learning.
3.2.2 Results on MNIST dataset
Figure 2 shows retrieval results of different methods with different code lengths on MNIST dataset.
The results are quite consistent with those on the CIFAR10 dataset. The proposed UDHDNN achieves the best mAP at all code lengths, and the mAP improvement is clearer at longer code lengths.
When precision within Hamming radius is used, all methods achieve similar precision at low code lengths (). At , UDHDNN outperforms the other methods by a fair margin. For long codes, i.e., , the precision of all methods decreases, except for ITQ, whose precision increases slightly when . The precision of UDHDNN is lower than that of some compared methods at . However, it is worth noting that the highest precision is achieved by UDHDNN (at ).
3.2.3 Results on SIFT1M dataset
As computing mAP is slow on this large dataset, we consider the top returned neighbors when computing mAP. Figure 3 shows the retrieval results of different methods with different code lengths on the SIFT1M dataset.
In terms of mAP, the proposed UDHDNN outperforms all compared methods. It is slightly better than the current state-of-the-art unsupervised hashing method, the binary autoencoder (BA) [2].
In terms of precision within Hamming radius, the results of UDHDNN are consistent with its results on CIFAR10 and MNIST. All methods achieve similar precision at low code lengths (). At , the precision of UDHDNN is lower than that of some methods. However, the highest precision is achieved by UDHDNN at , and it is much better than that of the competitors.
4 Supervised Discrete Hashing with Deep Neural Network (SDHDNN)
Several approaches have been proposed to leverage label information when learning binary codes in supervised hashing. In [31, 23], binary codes are learned to minimize the Hamming distance between samples belonging to the same class while maximizing the Hamming distance between samples belonging to different classes. In [30], the binary codes are learned to minimize a loss with respect to the ground-truth labels.
In this work, we adapt the approach proposed in kernel-based supervised hashing (KSH) [20] to leverage label information. The main idea is to learn binary codes such that the Hamming distances between the binary codes of samples are highly correlated with a precomputed pairwise label matrix. In other words, the binary codes should preserve the semantic (label) similarity between samples. It is worth noting that in KSH [20] the hash functions are linear functions defined in the kernel space of the inputs, and the independence and balance criteria are not considered.
In general, the network structure of SDHDNN is similar to that of the proposed UDHDNN, except that the last (reconstruction) layer is removed. Layer $n-1$ of UDHDNN becomes the last layer of SDHDNN, and the semantic-preservation property in SDHDNN is imposed on the output of this last layer.
4.1 Formulation of SDHDNN
Following KSH [20], we first define the pairwise label matrix $S \in \mathbb{R}^{m\times m}$ as
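$$S_{ij} = \begin{cases} 1 & \text{if } x_i \text{ and } x_j \text{ have the same label} \\ -1 & \text{otherwise} \end{cases} \tag{23}$$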
The goal of the learning process is to learn hash functions that generate discriminative codes, such that similar pairs can be perfectly distinguished from dissimilar pairs by Hamming distance in the code space. In other words, the Hamming distance between the learned binary codes should correlate with the matrix $S$. Formally, the binary codes should satisfy
$$\frac{1}{L}B^T B = S \tag{24}$$
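For codes in $\{-1,+1\}^L$, the inner product and the Hamming distance are linked by $b_i^T b_j = L - 2D_H(b_i, b_j)$, so the condition $\frac{1}{L}B^T B = S$ asks same-label pairs to be at Hamming distance 0 and different-label pairs at distance $L$. The NumPy sketch below (helper names hypothetical) builds $S$ from labels as in (23) and checks the identity on a toy code chosen so that the condition holds exactly; on real data it can generally only be approximated.

```python
import numpy as np


def pairwise_label_matrix(labels):
    """S from (23): +1 where two samples share a label, -1 otherwise."""
    labels = np.asarray(labels)
    return np.where(labels[:, None] == labels[None, :], 1.0, -1.0)


def hamming_from_inner(B):
    """Hamming distances between the columns of a {-1,+1} code matrix B (L x m)."""
    L = B.shape[0]
    return (L - B.T @ B) / 2


labels = [0, 0, 1]
S = pairwise_label_matrix(labels)
B = np.array([[1, 1, -1],
              [-1, -1, 1],
              [1, 1, -1]], dtype=float)  # L = 3 bits (rows), m = 3 samples (columns)
print(B.T @ B / B.shape[0])   # equals S for this toy code
print(hamming_from_inner(B))  # 0 for same-label pairs, L = 3 for different-label pairs
```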
Using the same regularization idea as for unsupervised hashing (Sec. 2), we integrate the above criterion into our model by solving the following constrained optimization problem
s.t.
$$B \in \{-1,+1\}^{L\times m} \tag{26}$$
The main difference between the formulations of the proposed UDHDNN (12) and the proposed SDHDNN (25) is that the reconstruction term, which indirectly preserves neighborhood similarity in UDHDNN (12), is replaced by a term that preserves semantic (label) similarity in SDHDNN (25).
4.2 Optimization
To solve (25) under constraint (26), we alternately optimize over $(W, c)$ and $B$.
4.2.1 $(W, c)$ step
When $B$ is fixed, (25) becomes an unconstrained optimization problem. We use the L-BFGS [19] optimizer with backpropagation to solve it. The gradients of the objective function with respect to the different parameters are computed as follows.
Let
(27)  
where .
Let
(28) 
where denotes Hadamard product; , .
, we have
(29) 
(30) 
4.2.2 $B$ step
5 Evaluation of Supervised Discrete Hashing with Deep Neural Network
This section evaluates the proposed SDHDNN method. SDHDNN is compared against several state-of-the-art supervised hashing methods, including Supervised Discrete Hashing (SDH) [30], ITQ-CCA [6], KSH [20] and BRE [14]. For all compared methods, we use the code and the suggested parameters provided by the authors.
5.1 Dataset, Implementation note and Evaluation protocol
Dataset
We evaluate the proposed method on two widely used datasets: CIFAR10 and MNIST. The descriptions of these datasets are provided in Section 3.1.
Implementation note
The network configuration is the same as that of UDHDNN, except that the final layer is removed. The values of the parameters , , and are empirically set to , , and , respectively. The maximum number of iterations is set to 5.
For ITQ-CCA [6] and SDH [30], all training samples are used for training. For SDHDNN, KSH [20] and BRE [14], which leverage label information through the pairwise label matrix $S$, we randomly select training samples from each class and use these selected samples as the new training set. The pairwise label matrix $S$ in SDHDNN is obtained directly from (23) because the exact labels are available.
Evaluation protocol
Following the standard setting for evaluating supervised hashing methods [30, 6], we report the retrieval results with two metrics: 1) mean average precision (mAP) and 2) precision within Hamming radius $r$ (precision), which measures the precision over retrieved images whose Hamming distance to the query is at most $r$ (if no image satisfies this, we report zero precision). As is standard in the literature [30, 6], the ground truths are defined by the class labels of the datasets.
5.2 Retrieval results
5.2.1 Results on CIFAR10
Figure 4 shows comparative results on the CIFAR10 dataset. In terms of mAP, the proposed SDHDNN clearly outperforms all compared methods by a fair margin at all code lengths. The improvements of SDHDNN over the current state-of-the-art supervised hashing method SDH [30] are +17%, +3.1%, +4.9% and +3.4% at 8, 16, 32 and 64 bits, respectively. The improvements of SDHDNN over KSH [20], which also uses the pairwise label matrix, are +7.6%, +6.2%, +5.9% and +5.3% at 8, 16, 32 and 64 bits, respectively.
In terms of precision within Hamming radius, the proposed SDHDNN clearly outperforms the compared methods at low code lengths, i.e., . SDH [30] becomes comparable with SDHDNN as the code length increases, i.e., .
5.2.2 Results on MNIST
Figure 5 shows comparative results on the MNIST dataset. In terms of mAP, the proposed SDHDNN outperforms the current state-of-the-art SDH [30] by +13.9% at bits. As the code length increases, SDHDNN and SDH [30] achieve similar performance. Compared with KSH [20], SDHDNN significantly outperforms KSH at all code lengths; the improvements are +3%, +4.9%, +3% and +3.2% at 8, 16, 32 and 64 bits, respectively.
6 Conclusion
In this paper, we propose two novel hashing methods for learning compact binary codes: UDHDNN for unsupervised hashing and SDHDNN for supervised hashing. Our methods incorporate the criteria necessary for producing good binary codes, namely similarity preservation, independence and balance. Another advantage of the proposed methods is that the binary constraint on the codes is solved directly during optimization without any relaxation. The experimental results on three benchmark datasets show that the proposed methods compare favorably with state-of-the-art hashing methods.
References
 [1] A. Ng. Multi-Layer Neural Network. UFLDL Tutorial. http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/.
 [2] M. A. Carreira-Perpiñán and R. Raziperchikolaei. Hashing with binary autoencoders. In CVPR, 2015.
 [3] R. Datta, D. Joshi, J. Li, and J. Z. Wang. Image retrieval: Ideas, influences, and trends of the new age. ACM Comput. Surv., 2008.
 [4] V. Erin Liong, J. Lu, G. Wang, P. Moulin, and J. Zhou. Deep hashing for compact binary codes learning. In CVPR, 2015.
 [5] A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In VLDB, 1999.
 [6] Y. Gong and S. Lazebnik. Iterative quantization: A procrustean approach to learning binary codes. In CVPR, 2011.
 [7] Y. Gong, S. Lazebnik, A. Gordo, and F. Perronnin. Iterative quantization: A procrustean approach to learning binary codes for large-scale image retrieval. PAMI, pages 2916–2929, 2013.
 [8] K. Grauman and R. Fergus. Learning binary hash codes for large-scale image search. Machine Learning for Computer Vision, 2013.
 [9] K. He, F. Wen, and J. Sun. K-means hashing: An affinity-preserving quantization method for learning binary compact codes. In CVPR, 2013.
 [10] J.-P. Heo, Y. Lee, J. He, S.-F. Chang, and S.-E. Yoon. Spherical hashing. In CVPR, 2012.
 [11] H. Jégou, M. Douze, and C. Schmid. Product quantization for nearest neighbor search. PAMI, pages 117–128, 2011.
 [12] W. Kong and W.-J. Li. Isotropic hashing. In NIPS, 2012.
 [13] A. Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
 [14] B. Kulis and T. Darrell. Learning to hash with binary reconstructive embeddings. In NIPS, 2009.
 [15] B. Kulis and K. Grauman. Kernelized locality-sensitive hashing for scalable image search. In ICCV, 2009.
 [16] B. Kulis, P. Jain, and K. Grauman. Fast similarity search for learned metrics. PAMI, pages 2143–2157, 2009.
 [17] Y. LeCun and C. Cortes. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/.
 [18] G. Lin, C. Shen, Q. Shi, A. van den Hengel, and D. Suter. Fast supervised hashing with decision trees for high-dimensional data. In CVPR, 2014.
 [19] D. C. Liu and J. Nocedal. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45:503–528, 1989.
 [20] W. Liu, J. Wang, R. Ji, Y.-G. Jiang, and S.-F. Chang. Supervised hashing with kernels. In CVPR, 2012.
 [21] W. Liu, J. Wang, S. Kumar, and S. Chang. Hashing with graphs. In ICML, 2011.
 [22] J. Malick, J. Povh, F. Rendl, and A. Wiegele. Regularization Methods for Semidefinite Programming. SIAM Journal on Optimization, pages 336–356, 2009.
 [23] V. A. Nguyen, J. Lu, and M. N. Do. Supervised discriminative hashing for compact binary codes. In ACM MM, 2014.
 [24] J. Nocedal. Updating Quasi-Newton Matrices with Limited Storage. Mathematics of Computation, pages 773–782, 1980.
 [25] M. Norouzi and D. J. Fleet. Minimal loss hashing for compact binary codes. In ICML, 2011.
 [26] M. Norouzi, D. J. Fleet, and R. Salakhutdinov. Hamming distance metric learning. In NIPS, 2012.
 [27] A. Oliva and A. Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. IJCV, pages 145–175, 2001.
 [28] M. Raginsky and S. Lazebnik. Locality-sensitive binary codes from shift-invariant kernels. In NIPS, 2009.
 [29] R. Salakhutdinov and G. E. Hinton. Semantic hashing. Int. J. Approx. Reasoning, pages 969–978, 2009.
 [30] F. Shen, C. Shen, W. Liu, and H. Tao Shen. Supervised discrete hashing. In CVPR, June 2015.
 [31] C. Strecha, A. M. Bronstein, M. M. Bronstein, and P. Fua. LDAHash: Improved matching with smaller descriptors. PAMI, pages 66–78, 2012.
 [32] J. Wang, S. Kumar, and S. Chang. Semi-supervised hashing for large-scale search. PAMI, pages 2393–2406, 2012.
 [33] J. Wang, H. T. Shen, J. Song, and J. Ji. Hashing for similarity search: A survey. CoRR, 2014.
 [34] Y. Weiss, A. Torralba, and R. Fergus. Spectral hashing. In NIPS, 2008.