Robust Hashing for Multi-View Data: Jointly Learning Low-Rank Kernelized Similarity Consensus and Hash Functions

Abstract

Learning hash functions/codes for similarity search over multi-view data is attracting increasing attention, where similar hash codes are assigned to data objects whose neighborhood relationships are consistent across views. Traditional methods in this category inherently suffer from three limitations: 1) they commonly adopt a two-stage scheme where a similarity matrix is first constructed, followed by a subsequent hash function learning step; 2) they are commonly developed on the assumption that data samples with multiple representations are noise-free, which is not practical in real-life applications; 3) they often incur a cumbersome training model caused by constructing a neighborhood graph over all points in the database. In this paper, we motivate the problem of jointly and efficiently training robust hash functions over data objects with multi-feature representations which may be noise-corrupted. To achieve both robustness and training efficiency, we propose an approach to effectively and efficiently learning low-rank kernelized^1 hash functions shared across views. Specifically, we utilize landmark graphs to construct tractable similarity matrices in multiple views to automatically discover the neighborhood structure in the data. To learn robust hash functions, a latent low-rank kernel function is used to construct the hash functions in order to accommodate linearly inseparable data. In particular, a latent kernelized similarity matrix is recovered by rank minimization over multiple kernel-based similarity matrices. Extensive experiments on real-world multi-view datasets validate the efficacy of our method in the presence of error corruptions.

1 Introduction

Hashing is dramatically efficient for similarity search, operating over low-dimensional binary codes with low storage cost. A large body of hashing methods for a single data source has been proposed, which can be classified into data-independent hashing, such as locality sensitive hashing (LSH) [5], and data-dependent or learning-based hashing [41].

In real-life situations, data objects can be described in multiple view (feature) spaces, where each view characterizes an individual property; e.g., an image can be described by color histograms and textures, and the two features turn out to be complementary to each other [32]. Consequently, a wealth of multi-view hashing methods [50] have been developed to effectively leverage complementary priors from multiple views and improve similarity search performance. The critical issue is to ensure that the learned hash codes preserve the original data similarities under the view-dependent feature representations. To be specific, similar hash codes should be assigned to data objects that consistently share nearest-neighborhood structure across all views.

1.1 Motivation

Despite improved performance delivered by existing multi-view hashing methods [50], some fundamental limitations can be identified:

  • The learning process is conducted by a two-stage mechanism where hash functions are learned based on a pre-constructed data similarity matrix. These methods commonly assume that data samples are noise-free under multiple views, whereas in real-world applications input data objects may be noisy (e.g., missing pixel values), resulting in the corresponding similarity matrices being corrupted by considerable noise [36]. Moreover, recovering consensus or requisite similarity values across views in the presence of noise contamination remains an unresolved challenge in multi-view data analysis [14].

    This motivates us to deliver a framework that jointly and effectively learns similarity matrices and robust hash functions with kernel functions plugged in, because the kernel trick is able to tackle linearly inseparable data [20]. To this end, a latent kernelized similarity matrix shared across views is recovered by using low-rank representation (LRR) [22], which is robust to corrupted observations. The recovered low-rank kernelized similarity matrix reaches a consensus across views and can reveal the true underlying structure of the data points.

  • State-of-the-art multi-view hashing methods are inefficient in their learning procedure because learning is performed by building and accessing a neighborhood graph over all n points, which costs O(n²) time. This is intractable in off-line training when n is large.

    To this end, we are further motivated to employ a landmark graph to build an approximate neighborhood graph using m landmarks [19], in which the similarity between a pair of data points is measured with respect to a small number of landmarks (typically a few hundred). The resulting graph is built in O(nm) time and is sufficiently sparse, with performance approaching that of a true k-NN graph as the number of landmarks increases [19].

1.2 Our Method

In this paper, we propose a novel approach to robust multi-view hashing by effectively and efficiently learning a set of hash functions and a low-rank kernelized similarity matrix shared by multiple views.

We remark that our method is fundamentally different from existing multi-view hashing methods that are conditioned on corruption-free similarities, which limits their applicability to real-world tasks. Instead, we propose to learn hash functions and kernel-based similarities under a more realistic scenario with noisy observations. Our method is also advantageous in efficiency due to its use of approximate neighborhoods via landmark graphs. We refer to the recovered low-rank similarity matrix as kernelized rather than a kernel matrix, since it is not symmetric yet characterizes non-linear similarities. The proposed method also differs from the study of partial views [10], which considers the case where some modalities of data examples are missing. Our approach follows the setting of multi-view learning, which aims to improve single-view models by learning from data collected through multiple channels [47], where all data samples have full information in all views.

In our framework, low-rank minimization is enforced to yield a consensus-reaching kernelized similarity matrix shared by multiple views, where larger similarity values indicate that the corresponding data objects come from the same cluster, while smaller similarity values imply they come from distinct clusters. Thus, the learned low-rank similarity matrix over multiple views can reflect the underlying clustering information.

Technically, the nonlinear kernelized similarity matrix in each view can be decomposed into three components: (1) a latent low-rank kernelized similarity matrix, representing the nonlinear requisite (consensus) similarities shared across views; (2) a view-dependent redundancy characterizing its individual similarities; and (3) possible error corruptions in the view-specific representation. We unify the view redundancy and errors into a single error term and impose an ℓ2,1-norm constraint on it. This is because view redundancy and disturbing errors are typically sparsely distributed, and minimizing the ℓ2,1-norm identifies the non-zero sparse columns that reveal the corresponding redundancy/errors. Note that in this work, "error" generally refers to error corruptions or perturbations, e.g., noise or missing values, in view-dependent feature values. These principles are formulated into an objective function, which is optimized based on the inexact Augmented Lagrange Multiplier (ALM) scheme [15]. This allows us to jointly learn a latent, corruption-free, low-rank nonlinear similarity and optimal hash functions for multi-view data, where hash codes are constrained to preserve the local (neighborhood) geometric structure in each view. We remark that several cross-view semantic hashing algorithms [26] have been developed to embed multiple high-dimensional features from heterogeneous data sources into one Hamming space while preserving their original similarities. Our setting is fundamentally different from cross-view/modal hashing in that we aim to leverage multiple features to jointly learn hash functions and a latent nonlinear similarity matrix over a homogeneous data source. To the best of our knowledge, we are the first to systematically address the problem of multi-view hashing with possible data error corruptions.

1.3 Contributions

The major contributions of this paper are three-fold.

  • We motivate the problem of robust hashing over multi-view data with nonlinear data distributions, and propose to learn robust hash functions together with a low-rank kernelized similarity matrix shared across views.

  • An iterative low-rank recovery optimization technique is proposed to learn the robust hashing functions. For the sake of efficiency, the neighborhood graph is approximated using landmark graphs with sparse connections between data points and landmarks.

  • Extensive experiments conducted on real-world multi-view datasets validate the efficacy of our method in the presence of error corruptions for multi-view feature representations.

2 Related Work

2.1 Multi-view Learning based Hashing

The purpose of multi-view learning based hashing is to learn better hash codes by leveraging multiple views. Some recent representative works include Multiple Feature Hashing (MFH) [29], Composite Hashing with Multiple Sources (CHMS) [50], Compact Kernel Hashing with multiple features (CKH) [21], and Multi-view Sequential Spectral Hashing (SSH) [8]. However, these methods share a common drawback: they typically apply spectral graph techniques (e.g., a k-NN graph) to model similarities between data points. In general, the complexity of constructing the similarity matrix is O(n²) for n data points, which is impractical in large-scale applications. Moreover, the similarity matrix induced by graph construction is very sensitive to noise corruptions. To avoid constructing a similarity matrix, Shen et al. [28] present Multi-View Latent Hashing (MVLH), which learns hash codes by performing matrix factorization on a unified kernel feature space over multiple views. Nonetheless, there are significant differences between MVLH and our approach. First, in MVLH the matrix factorization is performed on a unified kernel space formed by simply concatenating multiple kernel feature spaces, which discards the distinct local structures of individual views. By contrast, our kernelized similarity matrix is constructed with respect to the distinct characteristics of each view. Second, MVLH neglects the case of potential noise corruption in data samples. In this aspect, we employ low-rank representation (LRR) [22] to recover latent subspace structures from corrupted data.

2.2 Low-rank Modeling

Low-rank modeling is attracting increasing attention due to its capability of recovering the underlying structure among data objects [42]. It has achieved striking success in many applications such as data compression [42], subspace clustering [22], and image processing [55]. For instance, in [53], Zhang et al. consider a joint formulation of recovering low-rank and sparse subspace structures for robust representation.

Nowadays, data are usually collected from diverse domains or obtained from various feature extractors, and each group of features can be regarded as a particular view [47]. Moreover, these data can easily be corrupted by potential noise (e.g., missing pixels or outliers) or large variations (e.g., pose variations in face images) in real applications. In practice, the underlying structure of the data may consist of multiple subspaces, and thus Low-Rank Representation (LRR) is designed to find subspace structures in noisy data [22]. Multi-view low-rank analysis [14] is a recently proposed multi-view learning approach that introduces a low-rank constraint to reveal the intrinsic structure of the data and identifies outliers via the representation coefficients in low-rank matrix recovery.

In this paper, we are the first to apply low-rank learning to reveal structured kernelized similarity among multi-view data, and we scale it well to large-scale applications.

3 Robust Multi-view Hashing

3.1 Preliminary and Problem Definition

Let each nonlinear feature space correspond to one view through its embedding function. Following Kernelized Locality Sensitive Hashing [9], we uniformly select m samples (m ≪ n) from the training set to construct kernelized similarity matrices under multiple views. Given a sample represented by its feature vector, its k-th hash bit can be generated via a linear projection:

where sgn(·) denotes the element-wise sign function, which is 1 if its argument is greater than or equal to 0 and -1 otherwise. The projection is a linear combination of the landmarks, which can be taken as the cluster centers [19] obtained via scalable K-means clustering over the feature space, plus a bias term. Then, we have

where each column of the projection matrix generates one bit, and the kernelized similarity matrix records the similarities between the landmarks and the samples under the kernelized representation. Accordingly, the hash code of a sample can be rewritten in kernel form,

where and .

Given a set of n training samples that may contain errors, each sample has a feature vector in each view, with its own dimensionality for the v-th view. The view matrix collects the features of all training data in that view, and each sample also has a concatenated vector representation over all V views. We denote the hash codes of the training samples with respect to all features, and with respect to the v-th view. We aim to learn a latent low-rank kernelized similarity matrix shared across multiple kernels, and to construct a set of r robust hashing functions for multi-view data, where r is the number of hashing functions, i.e., the hash code length. The kernel function is plugged into the hash function because the kernel trick has been theoretically and empirically proven able to handle data distributions that are linearly inseparable [20].
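As a concrete sketch of the kernelized hash function just described, the following minimal implementation assumes an RBF kernel and uses our own illustrative names (`landmarks`, `W`, `b`), not the paper's exact notation:

```python
import numpy as np

def kernelized_hash(x, landmarks, W, b, gamma=1.0):
    """Generate an r-bit code for sample x: sign(k(x) @ W - b).

    k(x) holds RBF similarities between x and the m landmarks;
    W is an (m, r) weight matrix and b a length-r bias vector.
    """
    k = np.exp(-gamma * np.sum((landmarks - x) ** 2, axis=1))  # (m,) kernel vector
    return np.where(k @ W - b >= 0, 1, -1)                     # bits in {-1, +1}
```

Each bit is thus a thresholded linear functional in the kernel-induced feature space, which is what allows the learned functions to separate data that is not linearly separable in the input space.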

3.2 Low-rank Kernelized Similarity Recovery from Multi-views

Given a collection of high-dimensional multi-view data samples that may contain certain errors in each view-specific representation, we construct multiple nonlinear feature spaces, each of which represents one view. To leverage the multiple complementary representations, we propose to derive a consensus low-rank kernelized similarity matrix, recovered from the corrupted data objects and shared across views. This low-rank nonlinear similarity matrix is considered the requisite component, whilst each view also contains individual non-requisite information, including redundancy and errors. We explicitly model the redundancy via sparsity, since multi-view studies suggest that each individual view is sufficient to identify most of the similarity structure, and the deviation between the requisite component and a data sample is sparse [12]. In reality, data samples can be grossly corrupted due to sensor failures or communication errors. Thus, an ℓ2,1-norm is adopted to characterize errors, since they usually cause column sparsity in an affinity matrix [22].

In our framework, the low-rank similarity matrix is constructed to be sparse by considering data samples and landmarks, thus ensuring the efficiency of our approach. The latent low-rank kernelized similarity matrix can therefore be recovered through a low-rank constraint on the shared component and a sparse constraint on each view-specific error term, that is,

where a trade-off parameter balances the two terms, and each error term encodes the sum of error corruptions and possible noise for the corresponding view.
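In standard LRR-style notation, the recovery problem just described can be written as follows. This is our reconstruction from the surrounding text; the symbols are illustrative (K_v for the view-wise kernelized similarity, L for the shared low-rank part, E_v for the view-wise error):

```latex
\min_{L,\{E_v\}} \; \|L\|_{*} + \lambda \sum_{v=1}^{V} \|E_v\|_{2,1}
\quad \text{s.t.} \quad K_v = L + E_v, \quad v = 1, \dots, V,
```

where \|L\|_{*} is the nuclear norm (the convex surrogate of rank) and \|E_v\|_{2,1} sums the ℓ2 norms of the columns of E_v, promoting the column sparsity described above.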

3.3 Objective Function

Many studies [41] have shown the benefit of exploiting the local structure of the training data to infer accurate and compact hash codes. However, all these algorithms are sensitive to error corruptions, hampering their effectiveness in practical situations. By contrast, we propose to jointly learn hash codes that preserve local similarities in multiple views while being robust to errors. To exploit the local structure in each view, we define one affinity matrix per view, that is,

where the k-nearest-neighbor set is determined using the Euclidean distance in each feature space. A reasonable criterion for learning hash codes from each view is to ensure that objects similar in the original space have similar binary hash codes. This can be formulated as below:

Given a training sample, we expect its optimal hash code to be consistent with the distinct hash codes derived from each view. In this way, the local geometric structure of each single view can be globally optimized. Therefore, we have

where a trade-off parameter weights the consistency term. The main bottleneck in the above formulation is computation: the cost of building the underlying graph and its associated affinity matrix is O(n²), which is intractable for large n. To avoid this bottleneck, we employ a landmark graph that uses a small set of points called landmarks to approximate the data neighborhood structure [19]. Similarities of all database points are measured with respect to these landmarks, and the true adjacency/similarity matrix in each view is approximated using these similarities. First, K-means clustering is performed on the data points to obtain m (m ≪ n) cluster centers that act as landmark points. Next, the landmark graph defines the truncated similarities between all data points and the landmarks as,

where the s (s ≪ m) nearest landmarks of each point are selected according to a distance function such as the ℓ2 distance, and a bandwidth parameter controls the decay. Note that the resulting matrix Z is highly sparse: each row contains only s non-zero entries, which sum to 1. Thus, the landmark graph provides a powerful approximation to the adjacency matrix as Â = ZΛ⁻¹Zᵀ, where Λ = diag(Zᵀ1) [19].
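The truncated landmark-graph construction described above can be sketched as follows; the function name and parameter defaults are our own, with Gaussian weights and row normalization following the description:

```python
import numpy as np

def landmark_graph(X, U, s=3, sigma=1.0):
    """Truncated landmark-graph similarities (illustrative sketch).

    X: (n, d) data points; U: (m, d) landmark (cluster) centers.
    Each row of the returned Z keeps only the s nearest landmarks,
    with Gaussian weights normalized to sum to 1.
    """
    d2 = ((X[:, None, :] - U[None, :, :]) ** 2).sum(-1)   # (n, m) squared distances
    Z = np.zeros_like(d2)
    for i in range(len(X)):
        nn = np.argsort(d2[i])[:s]                        # s nearest landmarks
        w = np.exp(-d2[i, nn] / (2 * sigma ** 2))
        Z[i, nn] = w / w.sum()                            # truncated, row-normalized
    return Z
```

Because each row has only s non-zero entries, storing and multiplying by Z costs O(ns) rather than the O(n²) of a full neighborhood graph.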

To learn a set of hashing functions and a consensus nonlinear representation in a joint framework, we formulate the objective function of robust multi-view hashing as follows

where a trade-off parameter weights the terms, a sign constraint enforces the hash codes to be binary, and an orthogonality constraint is imposed to encourage bit de-correlation while avoiding the trivial solution. Due to the discrete constraints and non-convexity, this optimization problem is difficult to solve. Following spectral hashing [41], we relax the discrete constraints, and then we have

We rewrite the objective function by further minimizing the least-squares error of the hash functions while adding regularization, coupled with trade-off parameters; we then have

The problem is still non-convex due to the orthogonality constraint. Fortunately, with all but one block of variables fixed, the problem is convex with respect to the remaining block. Therefore, we present an alternating optimization scheme that can efficiently find a good solution in a few steps. First, we show that closed-form expressions for the hash function parameters can be obtained. To compute the low-rank similarity and the error terms, we employ an efficient optimization technique, the inexact augmented Lagrange multiplier (ALM) algorithm [15].

4 Optimization

4.1 Computing the Hash Function Parameters

With the other variables fixed, setting the derivative of the objective with respect to the projection matrix to zero, we get

Setting the derivative with respect to the bias term to zero, we obtain

Substituting the bias back into the previous expression, we have

where H = I − (1/n)11ᵀ is the centering matrix.

4.2 Computing the Low-Rank Similarity and Error Terms

With the hash function parameters fixed, the problem becomes

The rank minimization problem has been well studied in the literature [17]. By introducing an auxiliary variable equal to the low-rank term, the problem can then be converted into the following equivalent form:

where the Lagrange multiplier matrices are introduced for the equality constraints, ⟨·,·⟩ denotes the inner product of matrices, and an adaptive penalty parameter is used. Next, we elaborate the update rules for each variable by minimizing the augmented Lagrangian while fixing the others.

Solving for the low-rank term: when the other variables are fixed, the corresponding subproblem is

It can be solved by the Singular Value Thresholding method [2]. More specifically, let UΣVᵀ be the SVD of the relevant matrix; the update in each iteration is then

where the shrinkage (soft-thresholding) operator [16] is applied to the singular values.
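The singular value thresholding step can be sketched as a generic SVT routine (not tied to the paper's exact iterate): the proximal operator of tau times the nuclear norm soft-thresholds the singular values.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: prox of tau * ||.||_* at M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)          # soft-threshold singular values
    return U @ np.diag(s_shrunk) @ Vt
```

Singular values below tau are zeroed, so the output has reduced rank, which is what drives the iterates toward a low-rank consensus matrix.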

Solving for the auxiliary variable: the corresponding subproblem can be simplified as

which enjoys a closed-form solution.

Solving for the error terms: with the other variables fixed, we update each error term by solving

For ease of presentation, we collect the relevant terms into a single matrix. The problem can then be rewritten as

Hence, the problem can be decomposed into independent column-wise subproblems, subject to the remaining constraint. Each subproblem is a proximal operator problem, which can be efficiently solved by the projection algorithm in [7].
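For intuition, the ℓ2,1-regularized update has a standard closed-form column-wise shrinkage; we sketch it here as an illustrative alternative to the projection algorithm of [7] cited above:

```python
import numpy as np

def prox_l21(M, tau):
    """Proximal operator of tau * ||.||_{2,1}: shrink each column's l2 norm."""
    norms = np.linalg.norm(M, axis=0)
    scale = np.maximum(norms - tau, 0.0) / np.where(norms > 0, norms, 1.0)
    return M * scale                              # columns with norm <= tau vanish
```

Columns whose ℓ2 norm falls below tau are set to zero, which is exactly the column-sparsity behavior used to isolate view-specific redundancy and errors.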

4.3 Learning Hash Codes

Once the hashing function is learned by exploiting the kernelized similarity consensus, we can generate hash codes for both database and query samples via the learned projection:

where each entry represents the similarity between a sample and a landmark, computed with the Gaussian RBF kernel over the concatenated feature space of all views.

4.4 Out-of-Sample Extension

An essential part of hashing is generating binary codes for new samples, which is known as the out-of-sample problem. A widely used solution is the Nyström extension [1]. However, this is impractical for large-scale hashing since the Nyström extension is as expensive as an exhaustive nearest neighbor search over all n data points. To address the out-of-sample extension problem, we employ a non-parametric regression approach inspired by Shen et al. [27]. Specifically, given the hashing embedding for the entire training set, for a new data point we aim to generate a hashing embedding that preserves the local neighborhood relationships among its neighbors in the training set. A simple inductive formulation produces the embedding for a new data point as a sparse linear combination of the base embeddings:

where we define

However, this formulation does not scale well for computing out-of-sample extensions in large-scale tasks. To this end, we employ a prototype algorithm [27] to approximate the embedding using only a small base set:

where sgn is the sign function, and the base embeddings are the hashing embeddings of the cluster centers obtained by K-means. In this stage, the major computation cost comes from K-means clustering, which takes O(nmdt) time, where d is the feature dimension and t is the number of K-means iterations. The iteration number can be set below 50, so K-means costs O(nmd). Considering that m is much smaller than n, the total time is linear in the size of the training set. Computing the distances between a new point and the m base centers costs O(md). Thus, the overall per-query cost is O(md).
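The prototype-based extension can be sketched as follows; the names are ours, and the s-nearest-center Gaussian weighting is an illustrative choice consistent with the landmark-graph weighting used earlier:

```python
import numpy as np

def out_of_sample_code(x, centers, Y_base, s=3, sigma=1.0):
    """Approximate hash code for a new point x (illustrative prototype scheme).

    centers: (m, d) K-means centers of the base set;
    Y_base:  (m, r) relaxed hash embeddings of the centers.
    The new code is the sign of a sparse weighted combination of the
    embeddings of the s nearest centers.
    """
    d2 = ((centers - x) ** 2).sum(axis=1)
    nn = np.argsort(d2)[:s]                      # s nearest base centers
    w = np.exp(-d2[nn] / (2 * sigma ** 2))
    w /= w.sum()
    return np.where(w @ Y_base[nn] >= 0, 1, -1)
```

Only distances to the m centers are needed per query, matching the O(md) per-query cost stated above.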

5 Complexity Analysis

We analyze the per-iteration time complexity of the optimization strategy. The closed-form updates of the hash function parameters are dominated by operations involving the sparse m × n landmark similarity matrices and are linear in n. The m landmarks are generated off-line via scalable K-means clustering run for fewer than 50 iterations, keeping that cost linear in n as well. The complexity of computing the hash code for a new sample is O(md). Overall, the time complexity of one iteration is linear with respect to the training size.

6 Experiments

6.1 Experimental Settings

Competitors We compare our method with recently proposed state-of-the-art multiple feature hashing algorithms:

  • Multiple feature hashing (MFH) [29]: This method exploits local structure in each feature and global consistency in the optimization of hashing functions.

  • Composite hashing with multiple sources (CHMS) [50]: This method treats a linear combination of view-specific similarities as an average similarity which can be plugged into a spectral hashing framework.

  • Compact kernel hashing with multiple features (CKH) [21]: It is a multiple feature hashing framework where multiple kernels are linearly combined.

  • Sequential spectral hashing with multiple representations (SSH) [8]: This method constructs an average similarity matrix to assemble view-specific similarity matrices.

  • Multi-View Latent Hashing (MVLH) [28]: This is an unsupervised multi-view hashing approach where binary codes are learned from the latent factors shared by multiple views in a unified kernel feature space.

Datasets We conduct the experiments on two image benchmarks: CIFAR-10 and NUS-WIDE.

  • CIFAR-10 consists of 60K 32×32 color images from ten object categories, each of which contains 6K samples. Every image is assigned a mutually exclusive class label, and for each image we extract a 512-dimensional GIST feature [25] and a 300-dimensional bag-of-words representation quantized from dense SIFT features [23] as the two views.

  • NUS-WIDE [4] contains 269,648 labeled images crawled from Flickr, manually annotated with 81 categories. Three types of features are extracted to construct three views: 128-dimensional wavelet texture, 225-dimensional block-wise color moments, and a 500-dimensional bag-of-words representation.

Multi-view Corruption Setting In CIFAR-10, considering that missing features may have structure, we remove a square patch of pixels from each image covering 25% of the total number of pixels. The location of the patch is uniformly sampled for each image. This naturally deteriorates the view-dependent feature representations. In NUS-WIDE, we consider the scenario where 20% of the feature values in each view are corrupted with perturbation noise following a standard Gaussian distribution.
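The two corruption schemes can be reproduced with a short script like the following; the helper names are ours, with patch location and noise mask sampled as described:

```python
import numpy as np

def corrupt_patch(img, frac=0.25, rng=None):
    """Zero out a square patch covering `frac` of the pixels, uniformly placed."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = img.shape[:2]
    ph = int(round(h * frac ** 0.5))
    pw = int(round(w * frac ** 0.5))
    top = rng.integers(0, h - ph + 1)
    left = rng.integers(0, w - pw + 1)
    out = img.copy()
    out[top:top + ph, left:left + pw] = 0
    return out

def corrupt_features(X, frac=0.2, rng=None):
    """Add standard-Gaussian noise to a random `frac` of the entries of X."""
    if rng is None:
        rng = np.random.default_rng()
    mask = rng.random(X.shape) < frac
    return X + mask * rng.standard_normal(X.shape)
```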

Parameter Setting In the training phase, we uniformly sample 30K and 100K images as training data from the two datasets, and generate 300 and 500 landmarks, respectively. That is, we fix the graph construction parameters to (n = 30K, m = 300) on CIFAR-10 and (n = 100K, m = 500) on NUS-WIDE. In the testing phase, we randomly select 1,000 query images, where the true neighbors of each image are defined as the semantic neighbors sharing at least one common semantic label. For our method and CKH, we use the Gaussian RBF kernel K(x, y) = exp(−‖x − y‖²/σ²), where ‖x − y‖ is the Euclidean distance within each feature space. The bandwidth σ is learned via the self-tuning strategy [49].

In the objective, there are five tunable parameters. The two parameters controlling global hash code learning and the regularization on the hashing functions are fixed to default values. For the remaining three, we tune their optimal combination as described in Section 6.3.

Evaluation Metric The mean precision-recall and mean average precision (MAP) are computed over the retrieved set consisting of the samples within a given Hamming distance [20] of a specific query, using 8 to 32 bits. We carry out hash lookup within a Hamming radius of 2 and report the mean hash lookup precision over all queries. For a query q, the average precision (AP) is defined as AP(q) = (1/R) Σ_{k=1}^{N} P(k) δ(k), where R is the number of ground-truth neighbors of q in the database, N is the number of entities in the database, P(k) denotes the precision of the top k retrieved entities, and δ(k) = 1 if the k-th retrieved entity is a ground-truth neighbor and δ(k) = 0 otherwise. Ground-truth neighbors are defined as items sharing at least one semantic label. Given a query set of size Q, the MAP is the mean of the average precisions over all queries: MAP = (1/Q) Σ_{q} AP(q).
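The AP/MAP definitions above translate directly into code; this is a generic implementation of the stated formula, with array names of our choosing:

```python
import numpy as np

def average_precision(relevant, R):
    """AP for one query.

    relevant: boolean array over the ranked retrieval list
              (True = ground-truth neighbor); R: number of
              ground-truth neighbors of the query in the database.
    """
    hits = np.cumsum(relevant)                          # running hit count
    precisions = hits / np.arange(1, len(relevant) + 1) # P(k) at each rank
    return float((precisions * relevant).sum() / R)

def mean_average_precision(rel_lists, Rs):
    """MAP over a query set: mean of per-query APs."""
    return float(np.mean([average_precision(r, R) for r, R in zip(rel_lists, Rs)]))
```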

6.2 Results

Table 1: Hash lookup precision (mean±std, %) with Hamming radius 2 on different databases (left block: CIFAR-10; right block: NUS-WIDE).

Method  P=8         P=32        P=48        P=128       P=8         P=32        P=48        P=128
MFH     23.31±0.71  28.19±0.48  26.38±0.68  23.68±0.71  23.52±0.72  26.49±0.85  33.55±0.49  34.97±0.81
CHMS    25.61±0.22  31.80±0.66  26.54±0.52  19.38±0.84  27.54±0.41  30.22±0.92  28.24±0.96  27.52±1.12
CKH     31.75±0.53  32.05±0.72  37.32±0.76  34.45±0.81  29.72±0.43  37.84±0.63  33.56±0.82  34.42±1.32
SSH     27.34±0.46  35.78±0.68  29.36±0.63  27.52±0.72  28.95±0.46  33.42±0.88  30.05±0.71  29.21±0.98
MVLH    32.27±0.41  40.24±0.63  44.81±0.46  42.06±0.62  31.92±0.62  39.05±0.87  40.31±0.52  36.12±0.70
Ours    36.73±0.41  47.63±0.52  51.22±0.36  46.57±0.44  34.21±0.48  46.35±0.47  44.33±0.34  43.08±0.32
Figure: Performance comparison on CIFAR-10 database. Left: mean precision-recall of Hamming ranking at 48 bits. Right: mean average precision of Hamming ranking w.r.t. 8-128 bits.
Figure: Performance comparison on NUS-WIDE database. Left: mean precision-recall of Hamming ranking at 64 bits. Right: mean average precision of Hamming ranking w.r.t. 8-128 bits.
Figure: MAP variations under different parameter settings on the NUS-WIDE database, shown in three panels (a)-(c); in each panel one trade-off parameter is fixed while the other two vary.

We report the mean precision-recall curves of Hamming ranking and the mean average precision (MAP) w.r.t. different numbers of hashing bits over 1K query images. Results are shown in Fig. ?, computed from the top-100 retrieved samples. The left subfigure of Fig. ? shows that our method achieves a performance gain in both precision and recall over all counterparts, with MVLH second best. This demonstrates the superiority of using nonlinear hashing functions in a nonlinear space. More importantly, the latent consensus kernelized similarity matrix obtained by low-rank minimization is not only effective in leveraging complementary information from multiple views, but also robust to the presence of errors. The right subfigure of Fig. ? shows that as the number of hashing bits varies, our method consistently maintains superior performance. Specifically, it reaches its highest precision at 48 bits and shows relatively steady performance with more hashing bits. The results on the NUS-WIDE database are shown in Fig. ?. Once again we see performance gaps in precision-recall between our approach and the competitors, as illustrated in the left subfigure of Fig. ?. This validates the advantage of exploiting the consensus of kernelized similarity to learn robust nonlinear hashing functions. In the right subfigure of Fig. ?, as the number of hashing bits increases, our method keeps high and steady MAP values.

Table 2: Training/test time comparison of different algorithms using 64 bits. All times are in seconds. The training sizes of the two datasets are 30K and 100K, respectively.

Method  CIFAR-10 Training  CIFAR-10 Test  NUS-WIDE Training  NUS-WIDE Test
MFH     32.8               6.4            41.6               8.5
CHMS    29.8               4.7            37.2               7.8
SSH     23.6               1.3            31.7               2.4
CKH     10.7               2.3            15.3               3.2
MVLH    20.4               2.2            28.1               4.3
Ours    14.1               2.6            19.2               3.5

To evaluate the impact of the number of hashing bits on hash lookup performance, we report in Table 1 the hash lookup mean precision with standard deviation (mean±std) for 8, 32, 48, and 128 bits on both databases. Similar to the Hamming ranking results, our method achieves better performance than the others, with a clear advantage even below 32 bits, which demonstrates that our approach with compact hash codes retrieves more semantically related images than all baselines in terms of hash lookup.

In Table 2, we report the comparison of training/test time over the two image benchmarks. CKH and our method are much more efficient, taking less than 15s and 20s respectively to train on CIFAR-10 and NUS-WIDE using 64 bits. The efficiency improvement comes from the use of landmarks. While our method is slightly less efficient than CKH because of the low-rank kernelized similarity recovery, it remains very close to CKH in runtime and is consistently superior to CKH in retrieval performance. MVLH is relatively costly due to the expensive matrix factorization in its kernel space. MFH and CHMS are time-consuming in the training stage because they both involve the eigen-decomposition of a dense affinity matrix, which is not scalable to a large-scale setting. SSH gains in efficiency compared with MFH and CHMS on account of its approximation of the k-nearest-neighbor graph construction [8].

6.3 Parameter Tuning

Figure: Convergence study over real-world datasets.

In this experiment, we test different parameter settings for our algorithm to study performance sensitivity. We tune three trade-off parameters, corresponding to the requisite-component term, the non-requisite decomposition term, and the hashing function learning term in the objective. We fix one of the three parameters and report the MAP while the other two vary. The results are shown in Fig. ?. In Fig. ? (a), by fixing the first parameter, we show the performance variation over different pairs of the other two. We observe that our algorithm achieves a relatively higher MAP around a particular combination, and similar behavior can be seen in Fig. ? (b) and Fig. ? (c). Thus, among different combinations, the method attains its best performance at one setting while remaining relatively insensitive to varied parameter settings. With the optimal combination of parameters, we study convergence. In Fig. ?, we observe that our algorithm converges in fewer than 40 iterations, demonstrating its fast convergence rate.

6.4Out-of-Sample Case

In this experiment, we study the property of out-of-sample extension. We train base embeddings on the CIFAR-10 dataset and use MNIST as the test bed. The MNIST dataset [13] consists of 70K images of handwritten digits from “0” to “9”, each of 784 dimensions. As shown in Fig. ?, our method achieves the best results. On this dataset, we can clearly see that our method outperforms MVLH by a large margin, which grows as the code length increases. This further demonstrates the advantage of kernelized low-rank embedding as a tool for hashing, embedding high-dimensional data into a lower-dimensional space. This dimensionality-reduction procedure not only preserves local neighborhoods but also reveals global structure.

Performance comparison on the MNIST database using base hash functions learned from the CIFAR-10 dataset. Left: Precision with respect to the number of returned samples at 64 bits. Right: Mean average precision of Hamming ranking w.r.t. 8-128 bits.
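Out-of-sample extension works because a kernelized hash function needs only the learned projection weights and the training landmarks to encode an unseen query. The sketch below illustrates the general form, sign of a projected landmark-kernel vector; `W`, `landmarks`, and the centering step are hypothetical stand-ins for the quantities learned on the base set (e.g. CIFAR-10), not the paper's exact learned model.

```python
import numpy as np

def hash_query(x, landmarks, W, sigma=1.0):
    """Hash an unseen point with learned kernelized hash functions.

    k holds Gaussian-kernel similarities between x and the m training
    landmarks; W (m x r) are learned projection weights; the sign of
    each projection yields one bit of the r-bit code.
    """
    k = np.exp(-((landmarks - x) ** 2).sum(axis=1) / (2 * sigma ** 2))
    k -= k.mean()  # center so bits are roughly balanced
    return (k @ W > 0).astype(np.uint8)

rng = np.random.default_rng(0)
landmarks = rng.standard_normal((50, 16))  # m = 50 landmarks in 16-D
W = rng.standard_normal((50, 32))          # 32-bit code
code = hash_query(rng.standard_normal(16), landmarks, W)
print(code.shape)  # (32,)
```

The cost per query is O(m·d + m·r), independent of the database size, which is what makes hashing unseen MNIST images with CIFAR-10-trained functions cheap.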

7Conclusion

In this paper, we motivate the problem of robust hashing for similarity search over multi-view data objects under the practical scenario where the view-dependent feature representations are corrupted by errors. Unlike existing multi-view hashing methods that take a two-phase scheme, first constructing similarity matrices and then learning hash functions separately, we propose a novel technique to jointly learn hash functions and a latent, low-rank, corruption-free kernelized similarity from multiple representations with potential noise corruptions. Extensive experiments conducted on real-world multi-view datasets demonstrate the superiority of our method in terms of efficacy.

Footnotes

  1. We say kernelized similarity rather than kernel because the data-landmark affinity matrix is not a square symmetric matrix.
  2. In practice, running the K-means algorithm on a small subsample of the database for very few iterations is sufficient.
  3. http://www.cs.toronto.edu/~kriz/cifar.html

References

  1. Learning eigenfunctions links spectral embedding and kernel pca.
    Yoshua Bengio, Olivier Delalleau, Nicolas Le Roux, Jean-François Paiement, Pascal Vincent, and Marie Ouimet. Neural Comput, 16(10):2197–2219, 2004.
  2. A singular value thresholding algorithm for matrix completion.
    Jian-Feng Cai, Emmanuel J. Candès, and Zuowei Shen. SIAM Journal on Optimization, 20(4):1957–1982, 2010.
  3. Exact matrix completion via convex optimization.
    Emmanuel J. Candès and Benjamin Recht. Foundations of Computational Mathematics, 9(6):717–772, 2009.
  4. Nus-wide: a real-world web image database from national university of singapore.
    Tat-Seng Chua, Jinhui Tang, Richang Hong, Haojie Li, and Zhiping Luo. In ACM CIVR, 2009.
  5. Locality-sensitive hashing scheme based on p-stable distribution.
    Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni. In SOCG, 2004.
  6. Low-rank structure learning via nonconvex heuristic recovery.
    Yue Deng, Qionghai Dai, Risheng Liu, Zengke Zhang, and Sanqing Hu. IEEE Transactions on Neural Networks and Learning Systems, 24(3):383–396, 2013.
  7. Efficient projections onto the ℓ1-ball for learning in high dimensions.
    John Duchi, Shai Shalev-Shwartz, Yoram Singer, and Tushar Chandra. In ICML, 2008.
  8. Sequential spectral learning to hash with multiple representations.
    Saehoon Kim, Yoonseop Kang, and Seungjin Choi. In ECCV, pages 538–551, 2012.
  9. Kernelized locality-sensitive hashing for scalable image search.
    Brian Kulis and Kristen Grauman. In ICCV, 2009.
  10. A co-training approach for multi-view spectral clustering.
    Abhishek Kumar and Hal Daume III. In ICML, 2011.
  11. Learning hash functions for cross-view similarity search.
    Shaishav Kumar and Raghavendra Udupa. In IJCAI, pages 1360–1365, 2011.
  12. Co-regularized multi-view spectral clustering.
    Abhishek Kumar, Piyush Rai, and Hal Daumé III. In NIPS, 2011.
  13. Gradient-based learning applied to document recognition.
    Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. In Proceedings of the IEEE, 1998.
  14. Multi-view low-rank analysis for outlier detection.
    Sheng Li, Ming Shao, and Yun Fu. In SIAM Data Mining, pages 748–756, 2015.
  15. The augmented lagrange multiplier method for exact recovery of corrupted low-rank matrices.
    Zhouchen Lin, Minming Chen, and Yi Ma. In arXiv:1009.5055, 2010.
  16. Linearized alternating direction method with parallel splitting and adaptive penalty for separable convex programs in machine learning.
    Zhouchen Lin, Risheng Liu, and Huan Li. Machine Learning, (2):287–325, 2015.
  17. Robust subspace segmentation by low-rank representation.
    Guangcan Liu, Zhouchen Lin, and Yong Yu. In ICML, 2010.
  18. Large graph construction for scalable semi-supervised learning.
    Wei Liu, Jun Wang, and Shih-Fu Chang. In ICML, 2010.
  19. Hashing with graphs.
    Wei Liu, Jun Wang, Sanjiv Kumar, and Shih-Fu Chang. In ICML, 2011.
  20. Supervised hashing with kernels.
    Wei Liu, Jun Wang, Rongrong Ji, Yugang Jiang, and Shih-Fu Chang. In CVPR, pages 2074 – 2081, 2012.
  21. Compact kernel hashing with multiple features.
    Xianglong Liu, Junfeng He, Di Liu, and Bo Lang. In ACM Multimedia, pages 881–884, 2012.
  22. Robust recovery of subspace structures by low-rank representation.
    Guangcan Liu, Zhouchen Lin, Shuicheng Yan, Ju Sun, Yong Yu, and Yi Ma. IEEE Trans. Pattern Anal. Mach. Intell., 35(1):171–184, 2013.
  23. Distinctive image features from scale-invariant keypoints.
    David Lowe. IJCV, 60:91–110, 2004.
  24. Multimodal similarity-preserving hashing.
    Jonathan Masci, Michael M. Bronstein, Alexander M. Bronstein, and Jürgen Schmidhuber. IEEE TPAMI, 36(4):824–830, 2014.
  25. Modeling the shape of the scene: a holistic representation of the spatial envelope.
    Aude Oliva and Antonio Torralba. IJCV, 42(3):145–175, 2001.
  26. Comparing apples to oranges: a scalable solution with heterogeneous hashing.
    Mingdong Ou, Peng Cui, Fei Wang, Jun Wang, Wenwu Zhu, and Shiqiang Yang. In ACM SIGKDD, pages 230–238, 2013.
  27. Inductive hashing on manifolds.
    Fumin Shen, Chunhua Shen, Qinfeng Shi, Anton van den Hengel, and Zhenmin Tang. In CVPR, pages 1562 – 1569, 2013.
  28. Multi-view latent hashing for efficient multimedia search.
    Xiaobo Shen, Fumin Shen, Quan-Sen Sun, and Yun-Hao Yuan. In ACM Multimedia, pages 831–834, 2015.
  29. Multiple feature hashing for real-time large scale near-duplicate video retrieval.
    Jingkuan Song, Yi Yang, Zi Huang, Heng-Tao Shen, and Richang Hong. In ACM Multimedia, pages 423–432, 2011.
  30. Semi-supervised hashing for scalable image retrieval.
    Jun Wang, Sanjiv Kumar, and Shih-Fu Chang. In CVPR, pages 3424 – 3431, 2010.
  31. Towards metric fusion on multi-view data: a cross-view based graph random walk approach.
    Yang Wang, Xuemin Lin, and Qing Zhang. In ACM CIKM, pages 805–810, 2013.
  32. Exploiting correlation consensus: Towards subspace clustering for multi-modal data.
    Yang Wang, Xuemin Lin, Lin Wu, Wenjie Zhang, and Qing Zhang. In ACM Multimedia, pages 981–984, 2014.
  33. Learning to hash on partial multi-modal data.
    Qifan Wang, Luo Si, and Bin Shen. In IJCAI, pages 3904–3910, 2015.
  34. Effective multi-query expansions: Robust landmark retrieval.
    Yang Wang, Xuemin Lin, Lin Wu, and Wenjie Zhang. In ACM Multimedia, pages 79–88, 2015.
  35. Lbmch: Learning bridging mapping for cross-modal hashing.
    Yang Wang, Xuemin Lin, Lin Wu, Wenjie Zhang, and Qing Zhang. In ACM SIGIR, 2015.
  36. Robust subspace clustering for multi-view data by exploiting correlation consensus.
    Yang Wang, Xuemin Lin, Lin Wu, Wenjie Zhang, Qing Zhang, and Xiaodi Huang. IEEE Transactions on Image Processing, 24(11):3939–3949, 2015.
  37. Unsupervised metric fusion over multiview data by graph random walk-based cross-view diffusion.
    Yang Wang, Wenjie Zhang, Lin Wu, Xuemin Lin, and Xiang Zhao. IEEE Transactions on Neural Networks and Learning Systems, 99:1–14, 2015.
  38. Shifting multi-hypergraphs via collaborative probabilistic voting.
    Yang Wang, Xuemin Lin, Lin Wu, Qing Zhang, and Wenjie Zhang. Knowledge and Information Systems, 46(3):515–536, 2016.
  39. Iterative views agreement: An iterative low-rank based structured optimization method to multi-view spectral clustering.
    Yang Wang, Wenjie Zhang, Lin Wu, Xuemin Lin, Meng Fang, and Shirui Pan. In IJCAI, 2016.
  40. Scalable heterogeneous translated hashing.
    Ying Wei, Yangqiu Song, Yi Zhen, Bo Liu, and Qiang Yang. In ACM SIGKDD, pages 791–800, 2014.
  41. Spectral hashing.
    Yair Weiss, Antonio Torralba, and Rob Fergus. In NIPS, 2008.
  42. Robust principal component analysis: Exact recovery of corrupted low-rank matrices by convex optimization.
    John Wright, Yigang Peng, Yi Ma, Arvind Ganesh, and Shankar Rao. In NIPS, 2009.
  43. Robust principal component analysis: exact recovery of corrupted low-rank matrices via convex optimization.
    John Wright, Yigang Peng, Yi Ma, Arvind Ganesh, and Shankar Rao. In NIPS, 2009.
  44. Efficient image and tag co-ranking: a bregman divergence optimization method.
    Lin Wu, Yang Wang, and John Shepherd. In ACM Multimedia, 2013.
  45. Exploiting attribute correlations: A novel trace lasso-based weakly supervised dictionary learning method.
    Lin Wu, Yang Wang, and Shirui Pan. IEEE Transactions on Cybernetics, 2016.
  46. Robust multi-view spectral clustering via low-rank and sparse decomposition.
    Rongkai Xia, Yan Pan, Lei Du, and Jian Yin. In AAAI, pages 2149–2155, 2014.
  47. A survey on multi-view learning.
    Chang Xu, Dacheng Tao, and Chao Xu. arXiv:1304.5634, 2013.
  48. Robust late fusion with rank minimization.
    Guangnan Ye, Dong Liu, I-Hong Jhuo, and Shih-Fu Chang. In CVPR, pages 3021–3028, 2012.
  49. Self-tuning spectral clustering.
    Lihi Zelnik-Manor and Pietro Perona. In NIPS, 2004.
  50. Composite hashing with multiple information sources.
    Dan Zhang, Fei Wang, and Luo Si. In ACM SIGIR, pages 225–234, 2011.
  51. Similarity preserving low-rank representation for enhanced data representation and effective subspace learning.
    Zhao Zhang, Shuicheng Yan, and Mingbo Zhao. Neural Networks, 53:81–94, 2014.
  52. Bilinear low-rank coding framework and extension for robust image recovery and feature representation.
    Zhao Zhang, Shuicheng Yan, Mingbo Zhao, and Fanzhang Li. Knowledge-Based Systems, 86:143–157, 2015.
  53. Joint low-rank and sparse principal feature coding for enhanced robust representation and visual classification.
    Zhao Zhang, Fanzhang Li, Mingbo Zhao, Li Zhang, and Shuicheng Yan. IEEE Transactions on Image Processing, 25(6):2429–2443, 2016.
  54. A closed form solution to multi-view low-rank regression.
    Shuai Zheng, Xiao Cai, Chris Ding, Feiping Nie, and Heng Huang. In AAAI, pages 1973–1979, 2015.
  55. Moving object detection by detecting contiguous outliers in the low-rank representation.
    Xiaowei Zhou, Can Yang, and Weichuan Yu. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(3):597–610, 2013.