Global Hashing System for Fast Image Search
Abstract
Hashing methods have been widely investigated for fast approximate nearest neighbor search in large datasets. Most existing methods use binary vectors in lower-dimensional spaces to represent data points, which are usually real vectors of higher dimensionality. However, according to Shannon's Source Coding Theorem (SSCT) in information theory, it is logical to represent low-dimensional real vectors with high-dimensional binary vectors, since a binary bit contains less information than a real number. We design a novel hashing method based on this principle. Data points are first embedded in a low-dimensional space, and then the Global Positioning System (GPS) method is introduced, but modified for hashing. We devise data-independent and data-dependent methods to distribute the "satellites" at appropriate locations. Benefiting from the rationale of SSCT and the rules on distributing satellites in a GPS, our data-dependent method outperforms other methods on datasets of different sizes, from 100K to 10M. By incorporating the orthogonality of the code matrix, both our data-independent and data-dependent methods are particularly impressive in experiments on longer bits.
I Introduction
Hashing methods are efficient for approximate nearest neighbor (ANN) search, which is important in computer vision [7][42][47][40] and machine learning [24][29][36][43]. Hashing methods map original input data points to binary hash codes while preserving their mutual distances; that is, the binary strings of similar data points in the original feature space should have low Hamming distances. Hashing with short codes can substantially reduce storage requirements and boost ANN search speed.
Popular hashing methods can be categorized into two groups according to their dependence on data. The most well-known data-independent hashing methods are Locality-Sensitive Hashing (LSH) [2] and its variants, e.g., those adopting cosine similarity [6] and kernel similarity [23]. The main drawback of these methods is the demand for more bits per hash table, due to randomized hashing [35].
Data-dependent methods have become popular in the machine learning community. Spectral Hashing (SH) [44], one of the most popular data-dependent methods, generates hash codes by solving a relaxed version of the underlying mathematical problem, circumventing both the computation of pairwise distances over the whole dataset, i.e., the affinity matrix, and the constraints that lead to an NP-hard problem. Anchor Graph Hashing (AGH) [28] optimizes the objective function of SH by using anchor points to construct a highly sparse affinity matrix. Discrete Graph Hashing (DGH) [27] follows this idea and incorporates the orthogonality of the hash code matrix. There are also methods based on linear projections from Principal Component Analysis (PCA) [46][20][19] or Linear Discriminant Analysis [37], and those hashing in kernel space, such as Binary Reconstructive Embeddings (BRE) [22], Random Maximum Margin Hashing (RMMH) [18] and Kernel-based Supervised Hashing (KSH) [42]. Unlike Iterative Quantization (ITQ) [46], which rotates the projection matrix obtained by PCA to minimize the loss function, Neighborhood Discriminant Hashing (NDH) [39] incorporates the computation of the projection matrix into the minimization procedure. In general, linear dimensionality reduction techniques, such as PCA, are inferior to nonlinear manifold learning methods, which can more effectively preserve the local structure of the input data without assuming global linearity [38]. However, nonlinear manifold techniques may be intractable for large datasets because of their high computation costs. To address this problem, Inductive Manifold Hashing (IMH) [35][33] learns the nonlinear manifold on a small subset and inductively inserts the remainder of the data. In addition, hashing methods that focus on image representations have been developed recently. For example, R. Zhang et al. [49] unify feature extraction and hash function learning. Zhang et al. [48] and Liu et al. [26] develop their methods on multiple representations.
However, the main theoretical deficit of the data-dependent methods is that they fail to conform to Shannon's Source Coding Theorem (SSCT) [11]. In practice, an image in a dataset is usually represented by a descriptor, e.g., a SIFT [30] or GIST [32] descriptor with more than 128 dimensions of 8-bit characters or 32-bit single-precision real numbers in a computer. In information theory [11], entropy is the average amount of information contained in a message, which, in this context, refers to a descriptor vector or binary code vector. According to SSCT, the code length should be no less than the Shannon entropy of the original data points. Without ambiguity, entropy in this paper refers to Shannon entropy. The entropy is defined as H(X) = -sum_i p(x_i) log2 p(x_i), where X is a random variable and p(x_i) is the probability of X = x_i. For instance, assuming a uniform distribution, the entropy of a 64-dimensional vector of 8-bit characters is 512 bits, which means 512-bit binary strings are needed.
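This entropy bookkeeping can be checked numerically (a minimal sketch; the helper name is ours, not from the paper):

```python
import math

def shannon_entropy(probs):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# One uniformly distributed 8-bit character carries 8 bits of entropy.
per_symbol = shannon_entropy([1.0 / 256] * 256)

# A 64-dimensional vector of such characters therefore needs at least
# 64 * 8 = 512 binary digits, matching the figure quoted in the text.
min_code_length = 64 * per_symbol
```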
Exploiting this principle, we first reduce the dimensionality of the original data points, i.e., the descriptor vectors, by PCA. Then, the projections onto the first d' principal components are encoded by an m-dimensional binary code, where m > d'. Hence, we need an overdetermined system that can uniquely position every data point. This is similar to the Global Positioning System (GPS) [13], which uses dozens of satellites to position a receiver on the Earth's surface. Since our method is directly inspired by GPS, we name it the Global Hashing System (GHS). We tackle the major issue of how to distribute the satellites and propose two methods: one data-dependent and one data-independent. Unlike most existing methods [46][44][19], which handle a degraded version of the orthogonality of the code matrix in the continuous domain, both our methods approximate the orthogonal code matrix directly in the binary domain, which leads to better performance in long-bit experiments. Note that although SH can be regarded as assigning more bits to PCA directions along which the data have greater ranges, it is somewhat heuristic [46].
After the satellites are well distributed, the distances from the data points to each satellite (to simplify the following discussion, this distance is denoted as D2S hereafter) are sorted separately. The nearest half is denoted as +1 while the other half is denoted as -1. Hence, our method generates a balanced code matrix easily. Although a balanced code matrix is considered to be one of the two conditions for good codes [44], it is rarely enforced because it usually results in an NP-hard problem.
II Methodology
Let us define the notation used. A set of n data points {x_1, ..., x_n} in a d-dimensional space forms the rows of the data matrix X in R^{n x d}. The projection matrix W in R^{d x d'} is obtained from the first d' eigenvectors of the data covariance matrix X^T X. V = XW, and v_i is the i-th row vector of V. A binary code corresponding to x_i is defined by b_i in {-1, 1}^m, where m is the length of the code, and the code matrix is B in {-1, 1}^{n x m}.
II-A Global Positioning/Coding System
A satellite in a GPS can measure the distance between itself and a signal receiver on the Earth's surface. This results in a circle on which every point has the same distance to this satellite as the receiver. Hence, at least three satellites are needed to determine the true position, which is the unique intersection of three such circles. More generally, a d'-dimensional point can be determined by its Euclidean distances to d'+1 other points in this space [1].
In our GHS, each satellite has only 1 bit to record the Euclidean distances. That is, receivers far from a satellite are denoted as -1 while nearby ones are denoted as +1. Hence, our hashing function can be defined as:

(1)  B_ij = sgn( t_j - ||v_i - s_j|| )

where ||v_i - s_j|| is the Euclidean distance from v_i to the j-th satellite, and the threshold t_j can be any proper function that returns a positive real number. Here the median of {||v_i - s_j||, i = 1, ..., n} is adopted to generate a balanced code matrix. s_j is the coordinate of the j-th satellite and forms the j-th row of the satellite matrix S in R^{m x d'}.
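A minimal numpy sketch of this hashing function, with median thresholds so that each bit splits the data in half (the array names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.standard_normal((1000, 8))   # PCA-embedded data points, one per row
S = rng.standard_normal((16, 8))     # satellite coordinates, one per row

# D2S: Euclidean distance from every data point to every satellite.
D = np.linalg.norm(V[:, None, :] - S[None, :, :], axis=2)   # shape (1000, 16)

# Threshold t_j = median distance to satellite j, so each column of the
# code matrix contains equally many +1s and -1s (a balanced code).
t = np.median(D, axis=0)
B = np.where(D < t, 1, -1)
```

With continuous random data the median split is exact, so every column of `B` sums to zero.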
II-B Data-dependent method (GHS-DD)
Formally, our hashing model can be described as:

(2)  min_B sum_{i,j} A_ij ||b_i - b_j||^2,  s.t. B in {-1, 1}^{n x m}, B^T 1 = 0, B^T B = nI

where A_ij is the affinity between x_i and x_j. Randomly setting S does not produce satisfactory results. Furthermore, Eq. (2) requires the pairwise distance between each pair of data points, which leads to a heavy burden in storage and computation. Inspired by ITQ, we circumvent this by minimizing a quantization loss.
First, let us consider the following quantization loss:

(3)  Q(B~, S) = || B~ - D(S) ||_F^2

where D(S)_ij = ||v_i - s_j|| is the D2S matrix. Because D(S) is always nonnegative, we scale and shift B to B~ = (B + 1)/2. The underlying rationale of Eq. (3) is similar to that of ITQ. To uniquely position a data point in d'-dimensional space, at least d'+1 satellites are required, and the locations of these satellites should satisfy the following condition [1]:
(4)  rank([S, 1]) = d' + 1

where 1 in R^m is the all-ones vector and [S, 1] augments the satellite matrix with this column. Eq. (4) is called the existence and uniqueness condition for the GPS solution [1]. It can be satisfied by initializing an orthogonal S. Hence, we create groups of satellites. Within each group there are d'+1 satellites, of which d' are mutually orthogonal; the parameter d' is discussed in Section II-D. Note that there can be no more than d' mutually orthogonal vectors in a d'-dimensional space. Each group is rotated by an orthogonal matrix to find the best location, which gives the following model:
(5)  min_{B~, {R_g}, a, b} sum_g || B~_g - (a D(S_g R_g) + b) ||_F^2,  s.t. R_g^T R_g = I

where B~_g contains the columns of B~ generated by group g, and a and b are used to transform the values of the D2S into a proper interval. Eq. (5) is minimized by iterative minimization.
Initialization. In each group, S_g is initialized by the left singular vectors of a random matrix, and so is R_g. Another random vector is then added into each group.
Update B. The j-th column of B is calculated by Eq. (1).
Update a. Setting the partial derivative of Eq. (5) with respect to a to zero yields

(6)  a = sum_{i,j} (D_ij - Dbar)(B~_ij - 1/2) / sum_{i,j} (D_ij - Dbar)^2

where Dbar is the mean of all D2S values.
Update b. Similarly to a,

(7)  b = 1/2 - a Dbar

Please note that when deducing Eq. (7), the balance property of B~ (its mean equals 1/2) is applied.
Update S. We divide this step into two subproblems. First, the binary codes are converted into estimated distances to form the following minimization problem:

(8)  min_S || B~ - (a D(S) + b) ||_F^2

which is equivalent to

(9)  min_S sum_{i,j} ( d^_ij - ||v_i - s_j|| )^2

where d^_ij = (B~_ij - b)/a. If we treat s_j as a receiver, the v_i as satellites and the d^_ij as the D2S, the solution of Eq. (9) is the standard solution of GPS [3].
Following the algebraic GPS solution [3], for each satellite we construct two matrices from the data points v_i and the estimated distances d^_ij, and then solve a quadratic equation about an auxiliary scalar lambda:

(10)  alpha lambda^2 + beta lambda + gamma = 0

Eq. (10) usually has two solutions, lambda_1 and lambda_2, so two candidate positions s^_j can be found; the component of the solution that corresponds to the D2S itself is not used in our model. To automatically choose a suitable s^_j from the two candidates, we initialize the satellites with norm r, where r is a positive real constant; the candidate whose norm is closer to r is chosen for the following steps. r is also used in our data-independent satellite distribution algorithm and is discussed in Section II-D along with the parameter d'.
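The paper relies on the algebraic GPS solution of [3]; as a simpler illustration of the same recovery step, here is a linearized least-squares trilateration sketch (a stand-in for, not a reproduction of, that algebraic solution; all names are ours). Subtracting the first sphere equation from the others cancels the quadratic term, leaving a linear system in the unknown position:

```python
import numpy as np

def locate(sats, d):
    """Recover a point from its Euclidean distances to known 'satellites'.

    ||p - s_j||^2 = d_j^2 expands to ||p||^2 - 2 s_j.p + ||s_j||^2 = d_j^2;
    subtracting the j = 0 equation leaves 2 (s_j - s_0).p on the left,
    which we solve in the least-squares sense.
    """
    A = 2.0 * (sats[1:] - sats[0])
    rhs = (np.sum(sats[1:] ** 2, axis=1) - np.sum(sats[0] ** 2)
           - (d[1:] ** 2 - d[0] ** 2))
    pos, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return pos

rng = np.random.default_rng(1)
sats = rng.standard_normal((6, 3))          # 6 satellites in 3-D (> d' + 1)
target = rng.standard_normal(3)
d = np.linalg.norm(sats - target, axis=1)   # exact D2S
recovered = locate(sats, d)
```

With exact, noise-free distances the overdetermined system is consistent, so the least-squares solution recovers the point exactly.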
After the s^_j are calculated, R_g is found by minimizing the following problem:

(11)  min_{R_g} || S^_g - S_g R_g ||_F^2,  s.t. R_g^T R_g = I

Eq. (11) can be solved by singular value decomposition (SVD). Given S^_g and S_g, which contain the s^_j and s_j of group g, respectively, through the SVD S_g^T S^_g = U Sigma V^T we get R_g = U V^T.
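The rotation update in Eq. (11) is an instance of the orthogonal Procrustes problem, which numpy solves in a few lines (a sketch on synthetic matrices; the variable names are ours):

```python
import numpy as np

def procrustes_rotation(target, source):
    """Orthogonal R minimizing ||target - source @ R||_F.

    The minimizer is R = U @ Vt, where source.T @ target = U diag(s) Vt
    is a singular value decomposition.
    """
    U, _, Vt = np.linalg.svd(source.T @ target)
    return U @ Vt

rng = np.random.default_rng(2)
S_g = rng.standard_normal((6, 5))             # satellites of one group
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
S_hat = S_g @ Q                               # "ideal" positions: a rotated copy
R = procrustes_rotation(S_hat, S_g)           # recovers the rotation Q
```

Because `S_hat` is an exactly rotated copy of `S_g`, the recovered `R` aligns them perfectly and the quantization loss of this sub-step reaches zero.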
Convergence. The algorithm terminates when the decrease of the loss falls below epsilon or the maximum number of iterations is reached, where epsilon is a small positive real constant.
Output. The satellite matrix S and the thresholds t_j in Eq. (1).
Out-of-Sample Hashing. A new query is projected by W, and then its distance to each satellite is thresholded by t_j as in Eq. (1).
TABLE I: MAP on CIFAR-10 with the setting d' = m versus the default setting (d' < m); the percentages denote the relative differences between the two settings.

Bits              8       12      16      24      32      64      96
GHS-DD (d' = m)   0.1890  0.2232  0.2392  0.2761  0.3053  0.3816  0.4131
GHS-DD (default)  0.1884  0.2214  0.2412  0.2806  0.3089  0.3972  0.4324
Difference        0.32%   0.81%   0.83%   1.60%   1.17%   3.93%   4.46%
GHS-DI (d' = m)   0.1543  0.1838  0.2079  0.2581  0.2757  0.3474  0.4018
GHS-DI (default)  0.1537  0.1861  0.2098  0.2688  0.3008  0.3653  0.4144
Difference        0.39%   1.24%   0.91%   3.98%   8.34%   4.90%   3.04%
II-C Data-independent method (GHS-DI)
Another condition for good codes is uncorrelation [23], i.e., B^T B = nI. A direct way to satisfy this condition is to distribute the satellites such that only one is close to each receiver; that is, there is no intersection among the spheres Sp(s_j, r_j), where r_j is the minimum radius that includes the data points near s_j. However, in this situation each receiver has only one bit equal to +1. The Hamming distance between any pair of receivers is then 0 or 2, which means the distance between two data points in the input space is not well preserved. Moreover, if we strictly satisfy the balance condition as well as the uncorrelation condition in this way, at most 2 satellites can be used.
An alternative is to minimize the intersection of the spheres Sp(s_j, r_j) and Sp(s_k, r_k) for any j != k. That is, we put a tolerance on the values of the non-diagonal elements of B^T B: they are allowed to be nonzero numbers with small absolute values.
The intersection of two d'-dimensional spheres is too difficult to compute, so instead the pairwise distance between each pair of satellites is maximized. Without constraints, the resulting distances may grow without bound. A reasonable constraint is to distribute all satellites on the surface of a sphere of radius r. As there is no prior knowledge about the data, we assume the data points are uniformly distributed in a sphere. With all satellites at the same norm r, the D2S values of the satellites will be comparable.
Under the above-mentioned assumption, minimizing the intersections can be achieved by maximizing the pairwise distance between each pair of satellites:

(12)  max_S sum_{j<k} ||s_j - s_k||,  s.t. ||s_j|| = r, j = 1, ..., m

Eq. (12) can be maximized by the Gradient Projection Algorithm (GPA) [9]. The GPA iteratively updates S by moving along the gradient direction of the objective and projects S back onto the boundary defined by the constraint (Algorithm 1). The gradient of the objective with respect to s_j is

(13)  df/ds_j = sum_{k != j} (s_j - s_k) / ||s_j - s_k||
The projection step can be implemented directly by normalizing each s_j to norm r. As the orthogonality of B is considered, our GHS-DI method usually produces the second-best results in experiments with longer hash bits. In fact, the way GHS-DD satisfies Eq. (4) intrinsically incorporates orthogonality. When r is large, the hypersphere surface that separates the near and far data points can be treated as a hyperplane. In this situation, with an orthogonal S and the assumption of uniformly distributed data points, this property is easy to understand in the 2-D and 3-D cases. More generally, we have the following theorem.
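A compact sketch of this projected-gradient satellite placement (Eqs. (12)-(13)); the step size and iteration count are illustrative choices of ours, not from the paper:

```python
import numpy as np

def place_satellites(m, dim, radius=1.0, steps=2000, lr=0.01, seed=3):
    """Spread m satellites on a sphere of the given radius by gradient
    ascent on the sum of pairwise distances, projecting back onto the
    sphere (normalizing each row) after every step."""
    rng = np.random.default_rng(seed)
    S = rng.standard_normal((m, dim))
    S = radius * S / np.linalg.norm(S, axis=1, keepdims=True)
    for _ in range(steps):
        diff = S[:, None, :] - S[None, :, :]          # s_j - s_k, all pairs
        dist = np.linalg.norm(diff, axis=2)
        np.fill_diagonal(dist, 1.0)                   # avoid division by zero
        grad = (diff / dist[:, :, None]).sum(axis=1)  # Eq. (13), per satellite
        S = S + lr * grad                             # ascent step
        S = radius * S / np.linalg.norm(S, axis=1, keepdims=True)  # projection
    return S

S = place_satellites(m=4, dim=3)   # 4 satellites on the unit 2-sphere
```

For four satellites in 3-D this drives the configuration toward a regular tetrahedron, whose vertices are pairwise sqrt(8/3), roughly 1.63, apart.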
Theorem 1.
If (1) the data points are uniformly distributed in a sphere centered at the origin, (2) r tends to infinity, and (3) the satellite directions are mutually orthogonal, then b_j^T b_k = 0 for j != k, where b_j and b_k are column vectors whose elements are the binary hash codes generated by Eq. (1).
Proof.
Since the data points are uniformly distributed in a sphere, without loss of generality let the sphere be centered at the origin and consider two satellites s_j and s_k with s_j^T s_k = 0. In Eq. (1), if ||v_i - s_j|| < t_j, the j-th element of b_i is set to +1; otherwise it is set to -1. For any two points v and v' satisfying ||v - s_j|| = ||v' - s_j||, the angle between their difference vectors to s_j vanishes as r tends to infinity; hence v and v' lie on a common plane perpendicular to s_j, and the boundary separating the +1 and -1 points of satellite j tends to a hyperplane perpendicular to s_j.

To generate a balanced b_j, this hyperplane must cross the origin. Since s_j^T s_k = 0, the hyperplane of satellite k is perpendicular to that of satellite j. It is evident that the two hyperplanes separate the sphere into four parts of equal volume:

(14)  V_1 = V_2 = V_3 = V_4

Since there are equal numbers of data points in these four parts, it is easy to verify that b_j^T b_k = 0. ∎
In Theorem 1, conditions (1) and (2) are impractical, and therefore only condition (3) can be satisfied, by setting d' = m; however, this contravenes the perspective of SSCT and the existence and uniqueness condition for the GPS solution. In Section II-D, we show that d' = m usually cannot generate the best results. Although our methods cannot exactly fulfill these three conditions, the benefit of considering orthogonality is reflected in their high F-measures in the experiments on longer bits (Section IV).
II-D Parameters r and d'
There are two key parameters in our methods: r and d'. r should not be too small. Consider the extreme example r = 0: all bits of the points close to the origin would equal +1 while the bits of the other points would equal -1. Obviously, such codes are inefficient.
d' should be moderate. If d' is too large, the binary codes gradually lose their ability to encode the values of the projections, which are real numbers. On the other hand, when d' becomes small, fewer projections can be used, so the data points reconstructed from these projections cannot approximate the original ones accurately enough.
The mean average precision (MAP) on the CIFAR-10 dataset [21] with varying r and d' is shown in Fig. 1. CIFAR-10 comprises 60K images from the 80 Million Tiny Images dataset [40], and we use a 1024-dimensional GIST descriptor to represent each image. The PCA projections are normalized by the largest Euclidean norm of all projected data. When testing different values of d', at most one group containing fewer than d'+1 satellites may exist. Based on the results in Fig. 1, we empirically set r to 2 for all experiments, and set the other parameter to 1 for experiments with shorter codes and 0.5 for the others.
We also tested our two methods with the setting d' = m (Table I). The percentages shown in Table I denote the relative differences between the two settings. Referring to Table I, we observe that for longer codes, both methods perform better with d' < m, suggesting that the existence and uniqueness condition for the GPS solution is important. For the shortest codes, the situation is the opposite, because the number of PCA projections is too small and this effect dominates the results. However, the differences are slight in these cases (less than 1%), so we did not use the d' = m setting in the experiments of Section IV.
III Relations to Existing Methods
Over the past several years, many state-of-the-art data-dependent hashing methods have been proposed, deriving from various motivations. In this section, only those related to our proposed methods are briefly reviewed.
III-A Iterative Quantization (ITQ)
Gong et al. [46] formulated ITQ as a minimization problem:

(15)  min_{B, R} || B - V R ||_F^2,  s.t. B in {-1, 1}^{n x m}, R^T R = I

Eq. (15) is minimized by iteratively updating B and R. R is required to be orthogonal, and can be considered as a rotation of V. IsoH [20] is directly derived from ITQ by finding a projection with equal variances for the different dimensions. HH [45] also rotates V; however, unlike ITQ, it uses an auxiliary variable for the code matrix during the iterative optimization and puts an orthogonal constraint on it. The auxiliary variable is then thresholded to generate the code matrix. ok-means [31] rotates and scales V to minimize the quantization loss, whereas our method rotates and scales the D2S. ITQ, IsoH and HH use a number of principal components exactly equal to the bit length of the hash codes; that is, they cannot produce hash codes longer than the data dimension. In theory, our methods can produce hash codes of arbitrary length.
III-B Inductive Hashing on Manifolds (IMH)
IMH [35] first generates the base matrix C by K-means clustering; each column of C corresponds to a cluster center. It then embeds C into a low-dimensional space by manifold learning methods [41][12]. The embedding method affects the performance of IMH; throughout this paper, t-SNE [41] is used because it achieved the best results in the authors' experiments [35]. Finally, the embedding for a training point x_i is calculated by

(16)  y_i = sum_j w_ij y_j^C

where y_j^C is the embedding of the j-th cluster center and the weights are defined as

(17)  w_ij = exp(-||x_i - c_j||^2 / sigma^2) / sum_{j'} exp(-||x_i - c_{j'}||^2 / sigma^2)

where c_j is the j-th column of C. Eq. (17) is quite similar to the membership function in fuzzy c-means clustering [4]. The embedding for each training point is thus a linear combination of the embeddings of the cluster centers. In our method, each satellite encodes one bit according to the distances from itself to the data points, and we do not embed the satellites themselves.
III-C Spectral Hashing (SH)
Weiss et al. [44] formulated SH as:

(18)  min_B sum_{i,j} A_ij ||b_i - b_j||^2,  s.t. B in {-1, 1}^{n x m}, B^T 1 = 0, B^T B = nI

Eq. (2) is similar to Eq. (18). The graph affinity matrix A in R^{n x n}, with A_ij = exp(-||x_i - x_j||^2 / epsilon^2), is intractable for large datasets. SH evaluates the smallest eigenvalues for each PCA direction to create a list of eigenvalues, sorts this list to find the overall smallest eigenvalues, and then thresholds the corresponding eigenfunctions. The eigenvalue-list creation step is consistent with the perspective of SSCT; however, it is somewhat heuristic [46]. AGH and DGH compute the distances to anchor points to form a highly sparse affinity matrix and minimize a modified objective function of SH. GHS-DD avoids the computation and storage of the pairwise distances of all data points by minimizing the quantization loss. Furthermore, our method generates a balanced code matrix, which they cannot.
III-D Spherical Hashing (SpH)
The final step of SpH [15] is the same as in our method, so SpH also generates a balanced code matrix. However, SpH searches for the locations of its sphere centers in the entire space, which makes it difficult to find a good solution. The authors claimed that the distances between these points should be neither too large nor too small, and hence devised an empirical point-finding procedure with little theoretical support. With more concrete theoretical analysis, our proposed method outperforms SpH.
TABLE II: MAP on SUN397

Method    8       12      16      24      32      64      96      128
GHS-DI    0.1336  0.1744  0.2194  0.2290  0.2579  0.3167  0.3588  0.3860
GHS-DD    0.1533  0.1945  0.2447  0.2746  0.2998  0.3492  0.3880  0.4096
ITQ       0.1508  0.1859  0.2301  0.2619  0.2886  0.3317  0.3592  0.3750
IsoH      0.1420  0.1677  0.1881  0.1950  0.2278  0.2578  0.2873  0.2882
HH        0.1478  0.1866  0.2213  0.2554  0.2687  0.3253  0.3543  0.3739
SH        0.1219  0.1369  0.1475  0.1705  0.1758  0.1897  0.2180  0.2206
IMH       0.1296  0.1357  0.1533  0.2453  0.2689  0.2896  0.3077  0.3990
ok-means  0.1469  0.1852  0.2136  0.2524  0.2716  0.3248  0.3507  0.3658
SpH       0.0377  0.0359  0.0364  0.0365  0.0363  0.0599  0.0942  0.2578

TABLE III: MAP on GIST1M

Method    8       12      16      24      32      64      96      128
GHS-DI    0.1245  0.1552  0.1802  0.2052  0.2191  0.2596  0.2790  0.2885
GHS-DD    0.1358  0.1682  0.1952  0.2211  0.2438  0.2694  0.2854  0.2967
ITQ       0.1260  0.1593  0.1851  0.2098  0.2269  0.2577  0.2703  0.2775
IsoH      0.1121  0.1310  0.1844  0.1939  0.2288  0.2579  0.2712  0.2854
HH        0.1207  0.1603  0.1780  0.2019  0.2247  0.2597  0.2745  0.2880
SH        0.0871  0.0986  0.1033  0.1208  0.1339  0.1682  0.1781  0.1781
IMH       0.1248  0.1449  0.1748  0.1849  0.1965  0.2161  0.2385  0.2638
ok-means  0.1239  0.1610  0.1778  0.2070  0.2201  0.2565  0.2741  0.2809
SpH       0.0369  0.0349  0.0348  0.0359  0.0356  0.0637  0.0788  0.1919

TABLE IV: MAP on SIFT10M

Method    8       12      16      24      32      64      96      128
GHS-DI    0.1738  0.2193  0.2674  0.3342  0.3837  0.5156  0.5569  0.5797
GHS-DD    0.1864  0.2339  0.2769  0.3535  0.4098  0.5277  0.5692  0.5889
ITQ       0.1666  0.2195  0.2655  0.3452  0.3906  0.5025  0.5522  0.5782
IsoH      0.1764  0.2224  0.2469  0.3326  0.3766  0.4653  0.5524  0.5695
HH        0.1701  0.2258  0.2516  0.3143  0.3524  0.4494  0.5163  0.5554
SH        0.1704  0.2170  0.2382  0.2708  0.2810  0.3148  0.3039  0.3157
IMH       0.1833  0.1888  0.2007  0.2254  0.2884  0.3052  0.3358  0.3634
ok-means  0.1814  0.2260  0.2699  0.3233  0.3605  0.4401  0.4538  0.4964
SpH       0.0440  0.0487  0.0400  0.0475  0.0381  0.0615  0.1721  0.1947
TABLE V: Training and testing time on SUN397, GIST1M and SIFT10M for GHS-DI, GHS-DD, ITQ, IsoH, HH, SH, IMH, ok-means and SpH.
IV Experiments
Our experiments were conducted on three datasets of different scales: SUN397 [17], GIST1M [16] and SIFT10M. SUN397 contains about 108K images, and we represent each image by a 512-dimensional GIST descriptor [32]. GIST1M consists of 1 million 960-dimensional GIST descriptors. SIFT10M is a randomly chosen 10-million subset of the SIFT1B [16] dataset, which comprises 1 billion 128-dimensional SIFT descriptors [30]. 1K images are randomly selected from the whole of SUN397 to form a separate test dataset. For GIST1M, a 1K test dataset is available. For SIFT10M, we randomly selected 1K data points from its 10K test dataset. Ground-truth neighbors for a given query are defined as the samples within the top 2% of Euclidean distances.
IV-A Protocols and Baselines
We evaluate our methods by comparing them with seven hashing methods: Iterative Quantization (ITQ) [46], Isotropic Hashing (IsoH) [20], Harmonious Hashing (HH) [45], Spectral Hashing (SH) [44], Inductive Manifold Hashing (IMH) [35], Orthogonal K-means (ok-means) [31] and Spherical Hashing (SpH) [15]. Our data-dependent and data-independent methods are denoted as GHS-DD and GHS-DI, respectively. We use the publicly available code for the compared methods and follow the parameter settings suggested in the corresponding publications. All data are zero-centered, and in our methods the PCA projections are normalized by the largest Euclidean norm of all projected data. Two kinds of experiments, Hamming ranking and hash lookup, were conducted. The performance of Hamming ranking is measured by MAP, while the F1 score (denoted as F-measure), defined as F1 = 2 * precision * recall / (precision + recall), is used to evaluate the performance of hash lookup. Ground truths are defined by Euclidean neighbors.
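For reference, the F-measure used for hash lookup is the harmonic mean of precision and recall (a trivial helper; the function name is ours):

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)
```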
IV-B Quantitative Evaluation
The mean average precision (MAP) values are given in Tables II-IV. It can be seen that GHS-DD outperforms all compared methods. The performance of GHS-DI is poorer than that of ITQ, HH and SH, except in the 128-bit experiments. Benefiting from its grounding in information theory and its balanced code matrix, GHS-DD exceeds ITQ, IsoH and HH. Due to computational limitations, SpH works on a small subset of the whole dataset, and its empirical satellite-distribution algorithm is shown to be less effective than ours. The F-measure is illustrated in Fig. 2. Again, GHS-DD exceeds the others. It is worth noting that GHS-DI produced the second-best MAP and F-measure in the experiments on longer bits, because GHS-DI considers the orthogonality of the code matrix. The way GHS-DD satisfies the existence and uniqueness condition for the GPS solution, i.e., Eq. (4), together with its data-dependent nature, makes it work better than GHS-DI.
IV-C Computational Efficiency
Training and testing times for 32-bit codes are given in Table V. All experiments were run in MATLAB R2013b on a PC with a 2.85 GHz CPU and 128 GB RAM. The major computational cost of GHS-DI is the calculation of the D2S at the final step, which is linear in the product of the data dimension and the dataset size; hence it takes the least time on GIST1M and SIFT10M. Because GHS-DD computes the D2S in every iteration, its computational cost is moderate. When hashing a new query, GHS-DI and GHS-DD both compute the D2S, and hence their costs are similar. Although the testing procedure of SpH resembles ours, it computes the D2S in the original input space, whose dimension is much higher, so its testing time is longer.
IV-D Incorporating Label Information
To incorporate label information, a supervised dimensionality reduction
method can be used to better capture the semantic structure of the
dataset. Among various supervised dimensionality reduction methods,
Canonical Correlation Analysis (CCA) [14] has proven to be efficient
for extracting a common latent space from two views [10] and robust
to noise [5].
Let y_i in {0, 1}^c be a label vector, where c is the total number of labels; its k-th element is 1 if the i-th image is associated with the k-th label, and 0 otherwise. Y in {0, 1}^{n x c} is the matrix whose rows are the label vectors. The goal of CCA is to maximize the correlation between the projected data matrix Xw_x and the projected label matrix Yw_y by finding two projection directions w_x and w_y. The correlation is defined as:

(19)  rho(w_x, w_y) = (w_x^T X^T Y w_y) / sqrt( (w_x^T X^T X w_x)(w_y^T Y^T Y w_y) )
The direction w_x can be obtained by solving the following generalized eigenvalue problem:

(20)  X^T Y (Y^T Y + rho I)^{-1} Y^T X w_x = lambda^2 (X^T X + rho I) w_x
where rho is a small regularization constant, set to 0.0001 here. Just as in the case of PCA, the leading generalized eigenvectors, scaled by their corresponding eigenvalues, form the projection matrix W, and we obtain the embedded data matrix V = XW.
Finally, both our data-independent and data-dependent methods can be used to generate hash codes.
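A minimal numpy sketch of the CCA projection described above (Eq. (20)), assuming a zero-centered data matrix and a 0/1 label matrix; the function name and the synthetic data are ours:

```python
import numpy as np

def cca_projection(X, Y, k, rho=1e-4):
    """Top-k CCA directions for X from the generalized eigenproblem
    X^T Y (Y^T Y + rho I)^-1 Y^T X w = lambda (X^T X + rho I) w,
    with each direction scaled by its eigenvalue, as described above."""
    Cxx = X.T @ X + rho * np.eye(X.shape[1])
    Cyy = Y.T @ Y + rho * np.eye(Y.shape[1])
    Cxy = X.T @ Y
    # Reduce the generalized problem to an ordinary eigenproblem.
    M = np.linalg.solve(Cxx, Cxy @ np.linalg.solve(Cyy, Cxy.T))
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(-vals.real)[:k]
    return vecs[:, order].real * vals[order].real   # scale by eigenvalues

rng = np.random.default_rng(4)
X = rng.standard_normal((500, 20))
X -= X.mean(axis=0)                       # zero-center, as in the paper
labels = rng.integers(0, 10, size=500)
Y = np.eye(10)[labels]                    # one-hot label matrix
W = cca_projection(X, Y, k=8)
V = X @ W                                 # embedded data matrix
```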
The CIFAR-10 dataset is used in this experiment. The 60K images in CIFAR-10 are labelled into 10 classes with 6,000 samples per class. Again, each image is represented by a 1024-dimensional GIST feature. 1,000 samples are randomly chosen as queries and the remaining samples are used for training. Our supervised hashing methods are denoted as CCA-GHS-DI and CCA-GHS-DD, respectively. The baseline methods are Supervised Discrete Hashing (SDH) [34], KSH [42], FastHash [25] and CCA-ITQ [46].

The mean F-measure of hash lookup within Hamming distance 2 and the MAP scores of the compared methods are given in Fig. 3. CCA-GHS-DD achieves the best F-measures and MAPs for all code lengths, while CCA-GHS-DI is only slightly inferior to SDH for the 16-bit code length. In the hash-lookup experiments, we found that setting the Hamming distance threshold to 2 is favorable for both of our methods, because two groups of satellites were used in these experiments. In Fig. 4, five queries and the corresponding results retrieved by the compared methods using 16-bit hash codes are shown to qualitatively evaluate the performance. It can be seen that both CCA-GHS-DI and CCA-GHS-DD outperform the compared methods.
IV-E Classification with Hash Codes
In this subsection, the MNIST dataset is used to evaluate the hash codes learned by the compared methods. MNIST consists of 70,000 images of handwritten digits from '0' to '9', each of which is 784-dimensional. BRE, CCA-ITQ, KSH, FastHash and SDH are used as baselines.
A linear Support Vector Machine (SVM) is applied to the hash codes; the LIBLINEAR [8] solver is used to train the SVM. The classification results are given in Fig. 5, from which it can be seen that CCA-GHS-DD achieves the highest classification accuracy over all hash bit lengths, while CCA-GHS-DI is the second best except in the experiments on 32-bit hash codes, where it trails SDH.
V Conclusion
We have proposed a novel hashing method based on Shannon's Source Coding Theorem, which requires that the hash codes be longer than the embedding of the original training data. To circumvent the computation of pairwise distances between each pair of data points, we minimize a new formulation of quantization loss inspired by the Global Positioning System (GPS). Data-dependent and data-independent methods are proposed to distribute the satellites. According to the experimental results on datasets of three scales, the data-dependent method (GHS-DD) was superior to the other methods, and the data-independent method (GHS-DI) produced promising results in less training time. However, GHS-DD took a moderate length of time to train, and the main demand on RAM came from the computation of the covariance matrix in PCA. By incorporating Canonical Correlation Analysis (CCA), the proposed methods can be used for supervised hashing, and the performance of CCA-GHS-DI and CCA-GHS-DD is superior. Finally, the learned hash codes were used for a classification task to further demonstrate the strong performance of the proposed methods. Future work will focus on improving computational efficiency and on investigating ways to train the model with only a few samples from the whole dataset, in order to handle larger datasets such as SIFT1B and Tiny 80M.
References
 [1] Existence and uniqueness of GPS solutions. IEEE Transactions on Aerospace and Electronic Systems 27(6), pp. 952-956, Nov. 1991.
 [2] Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Communications of the ACM 51(1), pp. 117-122, Jan. 2008.
 [3] An algebraic solution of the GPS equations. IEEE Transactions on Aerospace and Electronic Systems 21, pp. 56-59, Jan. 1985.
 [4] Pattern Recognition with Fuzzy Objective Function Algorithms. Kluwer Academic Publishers, 1981. ISBN 0306406713.
 [5] Correlational spectral clustering. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-8, 2008.
 [6] Similarity estimation techniques from rounding algorithms. In ACM Symposium on Theory of Computing, pp. 380-388, 2002.
 [7] Fast, accurate detection of 100,000 object classes on a single machine. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 1814-1821, 2013.
 [8] LIBLINEAR: a library for large linear classification. Journal of Machine Learning Research 9, pp. 1871-1874, Nov. 2008.
 [9] Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems. IEEE Journal of Selected Topics in Signal Processing 1(4), pp. 586-597, 2007.
 [10] Multi-view dimensionality reduction via canonical correlation analysis. Technical report, 2008.
 [11] Entropy and Information Theory, 2nd edition. Springer-Verlag, 2011.
 [12] Stochastic neighbor embedding. In Advances in Neural Information Processing Systems, pp. 833-840, 2002.
 [13] Global Positioning System: Theory and Practice. Springer-Verlag, 1997.
 [14] Relations between two sets of variates. Biometrika 28, pp. 321-377, Dec. 1936.
 [15] Spherical hashing. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 2957-2964, 2012.
 [16] Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence 33(1), pp. 117-128, Mar. 2011.
 [17] SUN database: large-scale scene recognition from abbey to zoo. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 3485-3492, 2010.
 [18] Random maximum margin hashing. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 873-880, 2013.
 [19] Semi-supervised hashing for large-scale search. IEEE Transactions on Pattern Analysis and Machine Intelligence 34(12), pp. 2393-2406, Sep. 2012.
 [20] Isotropic hashing. In Advances in Neural Information Processing Systems, pp. 1646-1654, 2012.
 [21] Learning multiple layers of features from tiny images. Technical report, 2009.
 [22] Learning to hash with binary reconstructive embeddings. In Advances in Neural Information Processing Systems, pp. 1042-1050, 2009.
 [23] Kernelized locality-sensitive hashing. IEEE Transactions on Pattern Analysis and Machine Intelligence 34(6), pp. 1092-1104, Nov. 2012.
 [24] Hashing algorithms for large-scale learning. In Advances in Neural Information Processing Systems, 2011.
 [25] Fast supervised hashing with decision trees for high-dimensional data. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 1971-1978, 2014.
 [26] Multiview alignment hashing for efficient image search. IEEE Transactions on Image Processing 24(3), pp. 956-966, Mar. 2015.
 [27] Discrete graph hashing. In Advances in Neural Information Processing Systems, 2014.
 [28] (2011) Hashing with graphs. In International Conference on Machine Learning, Cited by: §I.
 [29] (2012) Compact hyperplane hashing with bilinear functions. In International Conference on Machine Learning, Cited by: §I.
 [30] (1999) Object recognition from local scaleinvariant features. In IEEE International Conference on Computer Vision, pp. 1150–1157. Cited by: §I, §IV.
 [31] (2013) Cartesian kmeans. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 3017–3024. Cited by: §IIIA, §IVA.
 [32] (200105) Modeling the shape of the scene: a holistic representation of the spatial envelope. International Journal of Computer Vision 42 (3), pp. 145–175. Cited by: §I, §IV.
 [33] (2015) Hashing on nonlinear manifolds. IEEE Transactions on Image Processing 24 (6), pp. 1839–1851. Cited by: §I.
 [34] (2015) Supervised discrete hashing. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 37–45. Cited by: §IVD.
 [35] (2013) Inductive hashing on manifolds. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 1562–1569. Cited by: §I, §IIIB, §IVA.
 [36] (2009Nov.) Hash kernels for structured data. Journal of Machine Learning Research 10, pp. 2615–2637. External Links: ISSN 15324435 Cited by: §I.
 [37] (201205) LDAHash: improved matching with smaller descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence 34 (1), pp. 66–78. Cited by: §I.
 [38] (2008) Largescale manifold learning. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pp. 1–8. Cited by: §I.
 [39] (2015Sept) Neighborhood discriminant hashing for largescale image retrieval. IEEE Transactions on Image Processing 24 (9), pp. 2827–2840. Cited by: §I.
 [40] (200805) 80 million tiny images: a large data set for nonparametric object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 30 (11), pp. 1958–1970. Cited by: §I, §IID.
 [41] Visualizing data using tsne. . Cited by: §IIIB.
 [42] (2012) Supervised hashing with kernels. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 2074–2081. Cited by: §I, §IVD.
 [43] (2009) Feature hashing for large scale multitask learning. In International Conference on Machine Learning, pp. 1113–1120. Cited by: §I.
 [44] (2008) Spectral hashing. In Advances in Neural Information Processing Systems, pp. 1753–1760. Cited by: §I, §IIIC, §IVA.
 [45] (2013) Harmonious hashing. In International Joint Conference on Artificial Intelligence, pp. 1820–1826. Cited by: §IIIA, §IVA.
 [46] (2011) Iterative quantization: a procrustean approach to learning binary codes. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 817–824. Cited by: §I, §IIIA, §IIIC, §IVA, §IVD.
 [47] (201602) Sparse hashing tracking. IEEE Transactions on Image Processing 25 (2), pp. 840–849. Cited by: §I.
 [48] (201507) Fullspace local topology extraction for crossmodal retrieval. IEEE Transactions on Image Processing 24 (7), pp. 2212–2224. Cited by: §I.
 [49] (201512) Bitscalable deep hashing with regularized similarity learning for image retrieval and person reidentification. IEEE Transactions on Image Processing 24 (12), pp. 4766–4779. Cited by: §I.