PUNCH: Positive UNlabelled Classification based information retrieval in Hyperspectral images
Abstract.
Hyperspectral images of land cover captured by airborne or satellite-mounted sensors provide a rich source of information about the chemical composition of the materials present in a given place. This makes hyperspectral imaging an important tool for earth sciences, land-cover studies, and military and strategic applications. However, the scarcity of labeled training examples and the spatial variability of spectral signatures are two of the biggest challenges faced by hyperspectral image classification. To address these issues, we develop a framework for material-agnostic information retrieval in hyperspectral images based on Positive-Unlabelled (PU) classification. Given a hyperspectral scene, the user labels some positive samples of a material he/she is looking for, and our goal is to retrieve all the remaining instances of the query material in the scene. Additionally, we require the system to work equally well for any material in any scene without the user having to disclose the identity of the query material. This material-agnostic nature of the framework provides it with superior generalization abilities. We explore two alternative approaches to solving the hyperspectral image classification problem within this framework. The first approach is an adaptation of non-negative risk estimation based PU learning for hyperspectral data. The second approach is based on one-versus-all positive-negative classification where the negative class is approximately sampled using a novel spectral-spatial retrieval model. We propose two annotator models – uniform and blob – that represent the labelling patterns of a human annotator. We compare the performances of the proposed algorithms for each annotator model on three benchmark hyperspectral image datasets – Indian Pines, Pavia University and Salinas.
1. Introduction
Hyperspectral imaging (HSI) (Landgrebe, 2002; Richards, 2013) measures reflected radiation from a surface at a series of narrow, contiguous frequency bands. It differs from multispectral imaging, which senses a few wide, separated frequency bands. Hyperspectral imaging produces three-dimensional $x \times y \times \lambda$ data volumes, where $x$ and $y$ represent the spatial dimensions and $\lambda$ represents the spectral dimension. Such detailed spectra contain fine-grained information about the chemical composition of the materials in a scene that is richer than what is available from a multispectral image (Geography, 2018).
The majority of the existing literature on hyperspectral image classification frames it as a supervised (Camps-Valls et al., 2013; Camps-Valls and Bruzzone, 2005; F. Melgani and B. Lorenzo, 2004; Li et al., 2016; Santara et al., 2017) or semi-supervised (Camps-Valls et al., 2007; Buchel and Ersoy, 2018; Cui et al., 2018) multi-class classification problem. Hyperspectral images pose a unique set of challenges when it comes to multi-class classification. Each material with a distinct spectral identity is a class; hence the number of possible classes is countably infinite. When it comes to land cover, the same species of crop (for example, wheat) can have drastically different spectral signatures depending on the location where it is grown (due to differences in the chemical composition of soil and water) and the time of the year (temperature, humidity, rainfall, etc.) (Herold et al., 2004; Rao et al., 2007). Hence, creating a standardized library of spectral signatures of materials that accounts for all these factors of variability is hard. Moreover, multi-class classification requires ground-truth labels for each class. Collection of ground truth amounts to sourcing samples of a material from the exact location at the exact time of the year and recording their spectral signatures (Shepherd and Walsh, [n. d.]) – a process that is impractical and intractable to perform at scale. Also, different hyperspectral imaging systems produce images with different physical properties depending on the spectral response of the sensor, resolution, altitude, illumination, mode of capture (airborne vs. spaceborne), distortions and so on. As a result, multi-class classification models trained on open-source (but relatively old) benchmark datasets like Indian Pines and Pavia have extremely limited efficacy when it comes to large-scale deployment in real-life applications.
In this work, we formulate the hyperspectral classification problem as one that is material-agnostic, imaging system independent and not contingent on an extensive ground-truth labelling effort. The motivation is deployment at scale. Given a hyperspectral scene, the user marks up some known occurrences of the query material. No information is provided about pixels that do not contain the query material. The goal of the system is to locate all other occurrences of the same material in the scene with high precision and recall. The system should work for any target material with a distinct spectral signature, and it should not require the user to disclose the identity of the material being searched.
Our formulation builds upon the classical problem of Content-Based Information Retrieval (CBIR) in multimedia databases (Yoshitaka and Ichikawa, 1999; Smeulders et al., 2000; Liu et al., 2007). Given a query item, the task is to retrieve items from a database that are similar in content. At the heart of CBIR lies the task of designing a retrieval model: a function that returns a score estimating the similarity of an element of the database with the query element. These scores can be used to find the most relevant elements for output. With the advent of deep learning, CBIR has witnessed unprecedented success in domains ranging from images (Krizhevsky et al., 2012; Wan et al., 2014; Lin et al., 2015) to text (Mitra et al., 2017) and audio (Van den Oord et al., 2013) and multimodal information retrieval (Kiros et al., 2014; Wang et al., 2016).
We approach the problem from a Positive-Unlabelled (PU) classification (Denis et al., 2000; Zhang and Zuo, 2008; Elkan and Noto, 2008; Hou et al., 2018) perspective. PU classification algorithms are specialized to deal with the setting where the training data comprises positive samples labelled by the user and unlabelled samples that may contain both positive and negative classes. There are two main approaches to PU classification. The first is based on heuristic-driven intelligent sampling of the negative class, followed by supervised training of a binary Positive-Negative (PN) classifier on the labelled positive and sampled negative examples (Nigam et al., 1998). The second is based on non-negative risk estimation, in which the unlabelled data is treated as negative data with lower weight (Lee and Liu, 2003). We explore both categories of algorithms in this work and present a comparison of their performance. Deep Neural Networks (Schmidhuber, 2015; Goodfellow et al., 2016) have demonstrated extraordinary capability to efficiently model complex non-linear functions in a large variety of applications, including HSI classification (Santara et al., 2017; Zhu et al., 2017b). This has motivated us to use Deep Neural Networks with the state-of-the-art BASS-Net architecture of Santara et al. (Santara et al., 2017) as function approximators in all our experiments.
Our contributions in this paper can be summarized as follows:

- We present a PU learning based formulation of the HSI classification problem for material- and imaging-platform-agnostic large-scale information retrieval. To the best of our knowledge, this is the first work investigating PU learning for HSI data.
- We design one solution each from the two families of PU learning algorithms – non-negative risk estimation and PN classification – and compare their performances on three benchmark HSI datasets.
- We propose a novel spectral-spatial retrieval model for HSI data and use it for intelligent sampling of the negative class for PN classification.
- We propose two annotator models that represent the range of labelling patterns of a human annotator and use them to demonstrate the efficacy of our proposed solutions under different spatial distributions of the labelled positive class.
Section 2 introduces the essential theoretical concepts that we build upon in this paper. Section 3 gives a detailed description of the proposed framework and our approaches to its solution. Experimental results are presented in Section 4. Finally, Section 5 concludes the paper with a summary of our contributions and the scope of future work.
2. Background
In this section, we present a brief introduction to the essential theoretical concepts used in the rest of the paper.
2.1. Non-negative Risk Estimation based PU Learning (NNRE-PU)
Risk estimation based PU learning represents unlabelled data as a weighted combination of P and N data. Following the notation of Kiryo et al. (Kiryo et al., 2017), let $g$ be an arbitrary decision function and $\ell$ be the loss function such that $\ell(g(x), y)$ is the loss incurred on predicting $g(x)$ when the ground truth is $y$. Let $p(x, y)$ denote the joint probability distribution of image pixels and their labels. The marginal distribution $p(x)$ is where the unlabelled data is sampled from. Let $p_p(x) = p(x \mid y = +1)$ and $p_n(x) = p(x \mid y = -1)$ denote the class-conditionals, and $\pi_p = p(y = +1)$ and $\pi_n = p(y = -1)$, the prior probabilities of the positive and negative classes respectively. The risk of the decision function $g$ can be written as:

(1) $R(g) = \pi_p R_p^+(g) + \pi_n R_n^-(g)$

where $R_p^+(g) = \mathbb{E}_{x \sim p_p}[\ell(g(x), +1)]$ and $R_n^-(g) = \mathbb{E}_{x \sim p_n}[\ell(g(x), -1)]$. Rewriting the law of total probability as $\pi_n p_n(x) = p(x) - \pi_p p_p(x)$ and substituting in equation 1, we have the expression for the unbiased PU risk:

(2) $R_{pu}(g) = \pi_p R_p^+(g) - \pi_p R_p^-(g) + R_u^-(g)$

where $R_p^-(g) = \mathbb{E}_{x \sim p_p}[\ell(g(x), -1)]$ and $R_u^-(g) = \mathbb{E}_{x \sim p}[\ell(g(x), -1)]$. In unbiased PU learning (Elkan and Noto, 2008; du Plessis et al., 2014; Du Plessis et al., 2015), the goal is to minimize an empirical estimate of this risk (with the expectations replaced by sample averages) to find the optimal decision function $g^*$. Unfortunately, the empirical estimators of risk used in unbiased PU learning have no lower bound, although the original risk objective in equation 1 is non-negative (Kiryo et al., 2017). Minimization of the empirical risk tends to drive the objective negative without modeling anything meaningful, especially when high-capacity function approximators like deep neural networks are used to model $g$. Kiryo et al. (Kiryo et al., 2017) propose the following biased, yet optimal, non-negative risk estimator to address this problem:

(3) $\tilde{R}_{pu}(g) = \pi_p \hat{R}_p^+(g) + \max\left\{0,\; \hat{R}_u^-(g) - \pi_p \hat{R}_p^-(g)\right\}$

where $\hat{R}$ denotes an empirical estimate of the actual risk $R$. For ease of training a neural network classifier, and with no loss in theoretical correctness, we represent the negative class by $0$ instead of $-1$ in our experiments.
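The non-negative risk estimator of equation 3 is straightforward to compute from decision scores. The following NumPy sketch illustrates it; the function names and the choice of sigmoid loss are ours, not from the paper:

```python
import numpy as np

def sigmoid_loss(z, t):
    """Sigmoid loss l(z, t) = 1 / (1 + exp(t * z)) for label t in {+1, -1};
    a common smooth surrogate loss in PU learning."""
    return 1.0 / (1.0 + np.exp(t * z))

def nnpu_risk(scores_p, scores_u, pi_p):
    """Non-negative PU risk estimate (equation 3) from decision scores of
    labelled-positive and unlabelled samples. pi_p is the class prior."""
    risk_p_pos = pi_p * np.mean(sigmoid_loss(scores_p, +1))  # pi_p * R_p^+(g)
    risk_p_neg = pi_p * np.mean(sigmoid_loss(scores_p, -1))  # pi_p * R_p^-(g)
    risk_u_neg = np.mean(sigmoid_loss(scores_u, -1))         # R_u^-(g)
    # Clip the estimated negative-class risk at zero so the total stays non-negative.
    return risk_p_pos + max(0.0, risk_u_neg - risk_p_neg)
```

In practice this quantity is minimized by backpropagation through the network; Kiryo et al. additionally flip the gradient direction while the clipped term is negative, a training detail omitted from this sketch.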
2.2. Non-Local Total Variation
Non-Local Total Variation (NLTV) is an unsupervised clustering objective demonstrated on hyperspectral data by Zhu et al. (Zhu et al., 2017a). Following the notation used by the authors, let $\Omega$ be a region in a hyperspectral scene and let $\mathcal{H}$ be a Hilbert space. Let $u \in \mathcal{H}$, $u : \Omega \to [0, 1]$, be the labelling function of a cluster, such that the larger the value of $u(x)$, the higher is the likelihood of pixel $x$ belonging to that cluster. Let $d(x, y)$ be a measure of divergence between two given pixels such that a lower value of $d(x, y)$ implies more resemblance. The non-local derivative is defined as:

(4) $\partial_y u(x) = \dfrac{u(y) - u(x)}{d(x, y)}$

The non-local weight is defined as $w(x, y) = d(x, y)^{-2}$. The expression for the non-local derivative in equation 4 can be rewritten in terms of the non-local weight as:

(5) $\partial_y u(x) = \left(u(y) - u(x)\right)\sqrt{w(x, y)}$

The Non-Local Total Variation (NLTV) objective is given by:

(6) $\min_u\; F(u) + \lambda J(u)$

$F(u)$ is a data fidelity term representing the clustering objective and $J(u)$ is the Total Variation regularizer. The parameter $\lambda$ controls the amount of regularization. The authors of (Zhu et al., 2017a) present a linear and a quadratic model of this objective, depending on the design of $F(u)$. They also apply the Primal Dual Hybrid Gradient (PDHG) algorithm (Chambolle and Pock, 2011) for the minimization of these objectives and show encouraging results on hyperspectral image data. We use the quadratic model of the NLTV objective in our experiments.
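As a small concrete illustration, the non-local weight and the non-local derivative of equations 4 and 5 can be computed per pixel pair. The sketch below assumes Euclidean distance between spectra for $d$ and takes $w(x, y) = 1/d(x, y)^2$ so that the two equations agree; the helper names are ours:

```python
import numpy as np

def nonlocal_weight(px, py, eps=1e-12):
    """Non-local weight between two pixel spectra, taken as w(x, y) = 1 / d(x, y)^2
    with d the Euclidean distance (our assumption for illustration)."""
    d = np.linalg.norm(np.asarray(px, float) - np.asarray(py, float))
    return 1.0 / (d ** 2 + eps)

def nonlocal_derivative(u_x, u_y, w):
    """Equation 5: partial_y u(x) = (u(y) - u(x)) * sqrt(w(x, y))."""
    return (u_y - u_x) * np.sqrt(w)
```

Identical spectra get a very large weight, so any difference in their cluster labels is heavily penalized by the TV term.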
2.3. BASS-Net architecture
Band-Adaptive Spectral-Spatial feature learning Network (BASS-Net) (Santara et al., 2017) is a deep neural network architecture for end-to-end supervised classification of hyperspectral images. Hyperspectral image classification poses two unique challenges: a) the curse of dimensionality resulting from the large number of spectral dimensions and the scarcity of labelled training samples, and b) large spatial variability of the spectral signatures of materials. The BASS-Net architecture is extremely data-efficient thanks to extensive parameter sharing along the spectral dimension and is capable of learning highly non-linear functions from a relatively small number of labelled training examples. It also uses spatial context to account for the spatial variability of spectral signatures. BASS-Net shows state-of-the-art supervised classification accuracy on benchmark HSI datasets like Indian Pines, Salinas and University of Pavia. Figure 2 shows a schematic diagram of the BASS-Net architecture.
3. PU Classification of Hyperspectral Images
In this section we describe the proposed PU learning algorithms for HSI classification. As mentioned in the previous section, we use the BASS-Net architecture of Santara et al. (Santara et al., 2017) as the function approximator, wherever required, in our pipeline. The input to the network is a pixel from the image along with its neighborhood (for spatial context) in the form of an $n \times n \times c$ volume, where $c$ is the number of channels in the input image. The output is the predicted class label for the central pixel. The specific configuration of BASS-Net that we use in our experiments is Configuration 4 of Table 1 of Santara et al. (Santara et al., 2017). As BASS-Net was originally designed for multi-class classification, it used a softmax layer at the output and optimized a multi-class categorical cross-entropy loss defined as follows.

(7) $L = -\dfrac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M} y_{i,c} \log(p_{i,c})$

where $M$ is the total number of classes, $N$ is the number of observations, $y_{i,c}$ is a binary indicator which is $1$ when class $c$ is the correct classification of observation $i$ and $0$ otherwise, and $p_{i,c}$ denotes the predicted probability of observation $i$ belonging to class $c$. In PU learning, we work with binary classifiers. Hence we replace the softmax layer with a sigmoid layer and use the binary cross-entropy loss function (equation 7 with $M = 2$) for training.
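For concreteness, the binary special case of equation 7 (two classes, one sigmoid output) can be written out directly in NumPy; the function name is ours:

```python
import numpy as np

def binary_cross_entropy(p, y, eps=1e-12):
    """Equation 7 with M = 2: mean over observations of
    -[y log p + (1 - y) log(1 - p)], where p is the predicted
    positive-class probability and y is the 0/1 label."""
    p = np.clip(p, eps, 1.0 - eps)  # guard against log(0)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))
```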
Let $\mathcal{P}$ denote the set of labelled positive data points and $\mathcal{U}$ be the set of unlabelled data points, such that $\mathcal{P}$ and $\mathcal{U}$ together cover the entire image. Let $\pi_p$ denote the prior probability of the positive class in the entire image.
In the first set of experiments, we implement the NNRE-PU learning algorithm of Kiryo et al. (Kiryo et al., 2017) (Algorithm 1). As the true value of $\pi_p$ is unknown for an arbitrary HSI scene, the user (who is expected to have some domain knowledge) has to make an estimate of $\pi_p$ from visual inspection of the image. We study how the performance of the classifier varies with perturbations to the true value of $\pi_p$.
In our second set of experiments, we evaluate PN classification based PU learning (PN-PU). Algorithm 2 describes the workflow. A novel spectral-spatial retrieval model, described in Section 3.2, is used to model the conditional positive class probability $P(y = +1 \mid x)$ of an unlabelled pixel $x$. Negative samples are drawn with probability proportional to $1 - P(y = +1 \mid x)$. A PN classifier with the BASS-Net architecture is then trained on the labelled positive and sampled negative examples.
3.1. Heuristic-based Probability Estimates
We explore two heuristics for modeling $P(y = +1 \mid x)$, the conditional positive class probability of an unlabelled pixel $x$.
3.1.1. Spatial distance based
According to this heuristic, the conditional probability of an unlabelled pixel belonging to the positive class decreases with its Euclidean distance from the nearest labelled positive sample. Let $d(x)$ be the Euclidean distance of $x$ from the nearest labelled positive pixel. Then we have,

(8) $P_{spatial}(y = +1 \mid x) = b + (1 - b)\, e^{-d(x)/T}$

where the baseline $b$ and the temperature $T$ are hyperparameters.
This heuristic draws from the intuition of spatial continuity of a material.
The primary drawbacks of this heuristic are a) it assumes that the user labels positive pixels uniformly over all occurrences of the positive class in the scene, and b) it does not use any notion of spectral similarity of pixels. Imagine a scene in which the positive class occurs in several disconnected locations, far away from one another. If the user only labels pixels in one of these locations, the positive pixels from the other locations would have a high chance of being wrongly sampled as negative by this heuristic – thus affecting the sensitivity of the classifier.
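A brute-force sketch of the spatial heuristic, reading equation 8 as a baseline-floored exponential decay (the function name and signature are illustrative assumptions):

```python
import numpy as np

def spatial_positive_prob(coords, pos_coords, baseline, temperature):
    """Spatial heuristic: positive-class probability of each pixel decays
    exponentially with its Euclidean distance d to the nearest labelled
    positive pixel, floored at the baseline b:
        p(x) = b + (1 - b) * exp(-d(x) / T)
    coords: (N, 2) pixel coordinates; pos_coords: (M, 2) labelled positives."""
    coords = np.asarray(coords, float)
    pos = np.asarray(pos_coords, float)
    # Distance from every pixel to its nearest labelled positive pixel.
    d = np.min(np.linalg.norm(coords[:, None, :] - pos[None, :, :], axis=-1), axis=1)
    return baseline + (1.0 - baseline) * np.exp(-d / temperature)
```

A labelled positive pixel itself gets probability 1, and pixels far from all labelled positives approach the baseline $b$.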
3.1.2. Spectral similarity based
We use an unsupervised segmentation algorithm (PDHG (Zhu et al., 2017a), in our experiments) to cluster the hyperspectral scene into a set of clusters based on spectral similarity. Suppose an unlabelled sample $x$ belongs to a cluster of size $n$, and $k$ of the samples from its cluster were labelled positive by the user. Then the probability of the unlabelled pixel belonging to the positive class is given by:

(9) $P_{spectral}(y = +1 \mid x) = \max\left(\dfrac{k}{n},\; \epsilon\right)$

where $\epsilon$ is a small positive number. This way, we sample more pixels of the negative class from regions of the scene that differ significantly in spectral characteristics from the labelled positive class. The main drawback of this heuristic stems from its assumption that each cluster containing positive pixels is likely to contain some user-labelled pixels. This is possible only under two conditions: a) the spectral uniformity of the positive class is high enough for the unsupervised segmentation algorithm to include all the positive pixels in the same cluster, or b) the user labels the positive pixels uniformly over all the different regions of the HSI scene in which the positive class occurs. Additionally, under this model, every pixel in a cluster gets assigned the same probability of being sampled for the negative class regardless of its spatial proximity to the labelled positive samples. This can directly affect the specificity of the classifier.
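The spectral heuristic can be sketched as follows, reading equation 9 as the labelled-positive fraction of the pixel's cluster floored at $\epsilon$ (the helper name and the exact flooring are our assumptions):

```python
import numpy as np

def spectral_positive_prob(cluster_ids, pos_mask, eps=1e-4):
    """Spectral heuristic: for a pixel in a cluster of size n containing k
    user-labelled positives, p = max(k / n, eps). eps keeps every cluster
    assigned a small non-zero positive probability."""
    cluster_ids = np.asarray(cluster_ids)
    pos_mask = np.asarray(pos_mask, bool)
    probs = np.empty(cluster_ids.shape, float)
    for c in np.unique(cluster_ids):
        members = cluster_ids == c
        frac = pos_mask[members].sum() / members.sum()  # k / n for this cluster
        probs[members] = max(frac, eps)
    return probs
```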
3.2. Spectral-spatial Retrieval Model
The spectral and spatial heuristics make certain assumptions about the material distribution and the behavior of the annotator. Although these assumptions seldom hold completely, they do hold to a certain degree in natural HSI scenes. Our goal is to design a retrieval model that outputs a lower bound on the estimate of $P(y = +1 \mid x)$ whenever one or more of these assumptions are violated in an image. A multiplicative combination of the spatial (equation 8) and spectral (equation 9) factors described in Section 3.1 achieves this goal and compensates for the drawbacks of the individual heuristics. The conditional probability of an unlabelled pixel belonging to the positive class under this retrieval model is given by:

(10) $P(y = +1 \mid x) = P_{spatial}(y = +1 \mid x) \cdot P_{spectral}(y = +1 \mid x)$

We evaluate our retrieval model on three benchmark HSI datasets and two annotation models that simulate the behavior of a human annotator.
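The multiplicative model of equation 10, together with the negative-sampling step it feeds, can be sketched as below; the function name and the choice of sampling negatives with probability proportional to $1 - P(y=+1 \mid x)$ are our assumptions:

```python
import numpy as np

def sample_negatives(p_spatial, p_spectral, n_neg, rng=None):
    """Spectral-spatial retrieval model (equation 10):
    p_+(x) = p_spatial(x) * p_spectral(x). Negative training samples are
    drawn without replacement with probability proportional to 1 - p_+(x)."""
    rng = np.random.default_rng(rng)
    p_pos = np.asarray(p_spatial, float) * np.asarray(p_spectral, float)
    w = 1.0 - p_pos
    w = w / w.sum()  # normalize into a sampling distribution
    return rng.choice(len(w), size=n_neg, replace=False, p=w)
```

Pixels the model is confident are positive (product near 1) are almost never drawn as negatives.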
4. Experimental Results
In this section we compare the performances of the methods proposed in Section 3.
4.1. Data Sets
We perform our experiments on three popular hyperspectral image classification datasets – Indian Pines (Baumgardner et al., 2015), Salinas, and the Pavia University scene (http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes). Some classes in the Indian Pines dataset have very few samples. We reject those classes and select only the most populous ones for experimentation. The problem of insufficient samples is less severe for Salinas and U. Pavia, and all their classes are taken into account. We choose Corn-notill, Stubble and Asphalt as the positive classes for the Indian Pines, Salinas and U. Pavia datasets respectively. These materials appear in multiple disconnected patches in their corresponding scenes. This puts to the test the ability of our methods to model the dramatic spatial variability of spectral signatures in HSI scenes. Additionally, these materials form mid-sized classes in their corresponding datasets. Hence the results we obtain from our experiments on them are statistically significant. We sample a fraction of the pixels of the positive class using one of the annotation models described in Section 4.2 to construct the labelled positive set $\mathcal{P}$. The rest of the pixels constitute the set of unlabelled samples $\mathcal{U}$. For the NNRE-PU experiments, we uniform-randomly sample points from $\mathcal{U}$ for use in training. In the PN classification experiments, a number of negative samples equal to the size of the labelled positive set is drawn from $1 - P(y = +1 \mid x)$, where $P(y = +1 \mid x)$ is given by equation 10. As different frequency channels have different dynamic ranges, their values are normalized to the range $[0, 1]$ using the transformation defined in equation 11, where $X$ denotes the random variable corresponding to the pixel values of a given channel.

(11) $X' = \dfrac{X - \min(X)}{\max(X) - \min(X)}$
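The per-channel min-max normalization of equation 11, applied to a whole hyperspectral cube, can be written as (function name ours):

```python
import numpy as np

def normalize_channels(cube):
    """Equation 11: min-max normalize each spectral channel of an (H, W, C)
    cube to [0, 1] independently, since channels have different dynamic ranges."""
    cube = cube.astype(float)
    mins = cube.min(axis=(0, 1), keepdims=True)   # per-channel minimum
    maxs = cube.max(axis=(0, 1), keepdims=True)   # per-channel maximum
    return (cube - mins) / (maxs - mins + 1e-12)  # small epsilon guards flat channels
```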
4.2. Annotation Models
We explore two annotation models for constructing the labelled positive set $\mathcal{P}$: a) uniform, and b) blob. Imagine that the positive class forms connected components in an HSI scene, where two pixels are considered connected if and only if they are adjacent to each other. The uniform annotation model samples uniformly from all the connected components. It models the case in which the user labels positive samples uniformly across all instances of spatial occurrence of the positive class. The blob annotation model, on the other hand, models the more practical case in which the user labels a small blob of positive pixels in one location of the image. We implement the blob annotation model by starting at a random positive sample, adding it to $\mathcal{P}$, and expanding $\mathcal{P}$ by searching for and adding the adjoining positive pixels in a breadth-first fashion. The sampler never leaves a connected component of positive pixels until all of its pixels have been included in $\mathcal{P}$. After that, it shifts to a random positive pixel in a different connected component and repeats the process until the requisite number of positive samples has been drawn.
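The blob annotation model amounts to a budget-limited breadth-first flood fill. A sketch assuming 4-connectivity (the connectivity used in the paper is not specified; the function name is ours):

```python
import numpy as np
from collections import deque

def blob_annotate(pos_mask, n_labels, rng=None):
    """Blob annotation model: grow the labelled set breadth-first from a random
    positive seed, exhausting each connected component (4-connectivity assumed)
    before jumping to a new one, until n_labels positives are labelled."""
    rng = np.random.default_rng(rng)
    pos_mask = np.asarray(pos_mask, bool)
    labelled = np.zeros_like(pos_mask)
    remaining = set(map(tuple, np.argwhere(pos_mask)))  # unvisited positive pixels
    while labelled.sum() < n_labels and remaining:
        cand = sorted(remaining)
        seed = cand[rng.integers(len(cand))]            # random seed in a new blob
        remaining.discard(seed)
        queue = deque([seed])
        while queue and labelled.sum() < n_labels:
            r, c = queue.popleft()
            labelled[r, c] = True
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nb = (r + dr, c + dc)
                if nb in remaining:                     # stay inside positive pixels
                    remaining.discard(nb)
                    queue.append(nb)
    return labelled
```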
Table 1. Summary of the datasets used in our experiments.

                     Indian Pines           Salinas                      U. Pavia
Sensor               AVIRIS                 AVIRIS                       ROSIS
Place                Northwestern Indiana   Salinas Valley, California   Pavia, Northern Italy
Frequency Band       –                      –                            –
Spatial Resolution   –                      –                            –
No. of Channels      220                    224                          103
No. of Classes       16                     16                           9
4.3. Evaluation Metrics
We evaluate our PU learning algorithms in terms of precision, recall, F-score, and the area under the receiver operating characteristic curve (AUC) (Powers, 2011).
4.4. Operating Point and Hyperparameter Selection
We choose the operating point of our classifiers to minimize the expected cost of misclassification of a point in the Receiver Operating Characteristic (ROC) space (Langdon, 2011), given by:

(12) $E[C] = C_{FP}\,(1 - \pi_p)\, x + C_{FN}\, \pi_p\, (1 - y)$

where $x$ and $y$ are the coordinates (false positive rate and true positive rate) of the ROC space, and $C_{FP}$ and $C_{FN}$ are the costs of a false positive and a false negative respectively. We assume $C_{FP} = C_{FN}$. The expectation is computed on a validation set consisting of a subset of the unlabelled samples in the dataset. The solution for our operating point is the point on the ROC curve that lies on a line of slope $\frac{C_{FP}(1 - \pi_p)}{C_{FN}\,\pi_p}$ closest to the northwest corner, $(0, 1)$, of the ROC plot. The temperature and baseline hyperparameters are also tuned on the validation set through a grid search, and the selected values are presented in Table 2. The number of clusters for the PDHG algorithm is set to the number of classes in the respective dataset, given in Table 1. All neural networks are trained for a fixed number of epochs or till overfitting sets in (the validation loss starts increasing), whichever happens first.
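Given an ROC curve as arrays of (FPR, TPR) points, the minimum-expected-cost operating point of equation 12 can be selected by direct search (function name ours):

```python
import numpy as np

def pick_operating_point(fpr, tpr, pi_p, c_fp=1.0, c_fn=1.0):
    """Pick the index of the ROC point minimizing the expected misclassification
    cost (equation 12): E[C] = c_fp * (1 - pi_p) * x + c_fn * pi_p * (1 - y),
    where (x, y) = (FPR, TPR)."""
    cost = c_fp * (1 - pi_p) * np.asarray(fpr) + c_fn * pi_p * (1 - np.asarray(tpr))
    return int(np.argmin(cost))
```

With equal costs and a balanced prior, this picks the point closest (in the iso-cost sense) to the northwest corner $(0, 1)$.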
Table 2. Tuned hyperparameters of the spectral-spatial retrieval model.

                    Indian Pines   Salinas   U. Pavia
Temperature ($T$)   –              –         –
Baseline ($b$)      –              –         –
4.5. Implementation Platform
The algorithms have been implemented in Python using the Chainer deep learning library (Tokui et al., 2015) for highly optimized training of neural networks. As every execution of the proposed algorithms involves training a neural network, the computational demand is high. Chainer has native support for multi-core CPU and multi-GPU parallelism, which makes it a natural choice for our application. As Chainer only supports input images with an even number of channels, we append a new all-zero channel to the Pavia University HSI scene. Our code is available open-source on GitHub (https://github.com/HSISeg).
[Table 3. AUC, precision, recall and F-score of NNRE-PU and PN-PU under the uniform and blob sampling annotation models on the Indian Pines, Salinas and Pavia University datasets.]
[Table 4. Training data, test-set predictions and test-set confusion maps for sample runs: NNRE-PU on Indian Pines with the uniform sampling retrieval model; PN-PU on Indian Pines with the uniform sampling retrieval model; NNRE-PU on Salinas with the blob sampling retrieval model; PN-PU on Pavia University with the blob sampling retrieval model.]
4.6. Results and Discussion
Table 3 presents the numerical outcomes of our experiments. The NNRE-PU numbers correspond to the true values of $\pi_p$ for each dataset. Table 4 shows the visual outputs of some sample runs of the proposed algorithms. We make the following observations:

- The performance of NNRE-PU is highly sensitive to the value of $\pi_p$ supplied by the user. Figure 5 shows the variation of precision and recall with the supplied value of $\pi_p$ for the Indian Pines dataset with uniform sampling of the positive class. This is a major drawback of the NNRE-PU approach, because it is hard – even for an expert – to provide an accurate estimate of $\pi_p$ for the query material in an arbitrary HSI scene.
- PN-PU, with the right hyperparameter settings in the spectral-spatial retrieval model, gives performance comparable with NNRE-PU – although there is no clear trend of supremacy of either method across the different HSI datasets and annotation models. Unlike NNRE-PU, PN-PU does not depend on a user-supplied value of $\pi_p$.
- We observe that for PN-PU classification with blob sampling, the neural network tends to overfit very fast, causing the recall (and in some cases also the precision) on the validation set to drop soon after the commencement of training. A plausible explanation is as follows. While high spatial variation of spectral signature is a distinctive feature of hyperspectral images, the positive samples annotated by the user are localized in a small part of the image. This localized set of positive examples fails to capture the entire range of variability of the positive class in the HSI scene, which prevents the neural network from learning the right spatial invariances for the positive class and causes the false negative rate to shoot up. In addition, imperfect sampling of the negative class from $1 - P(y = +1 \mid x)$ introduces into the training set some positive samples that are labelled negative, which further contributes to the false negative rate. To address this problem, we stop training as soon as the recall on the validation set starts to drop.
5. Conclusion
This paper takes a novel approach to HSI classification by formulating it in the PU learning paradigm. The result is a framework that is material-, device- and platform-agnostic and can perform large-scale information retrieval in arbitrary HSI scenes. We propose two approaches to solving the HSI classification problem in this framework, and preliminary results on benchmark HSI datasets show promising performance. A notable drawback of the proposed approaches is that every execution of the algorithms requires retraining a neural network, which poses a substantial computational burden. One possible way to ameliorate this is to pre-train a neural network for a related task and retrain only the last layer for PU learning. In traditional information retrieval, iterative refinement of the retrieval model based on relevance feedback plays an important role in improving the quality of retrieval from a given dataset. We plan to explore these topics in future work.
Acknowledgements.
We are thankful to Zhu et al. (Zhu et al., 2017a) for sharing their implementation of PDHG clustering. We would also like to thank Kiryo et al. (Kiryo et al., 2017) for sharing their code for the generic PU learning framework. This study was performed as a part of the project titled "Deep Learning for Automated Feature Discovery in Hyperspectral Images (LDH)" sponsored by the Space Applications Centre (SAC), Indian Space Research Organization (ISRO). Anirban Santara's work in this project was supported by Google India under the Google India Ph.D. Fellowship Award.

References
 Baumgardner et al. (2015) Marion F. Baumgardner, Larry L. Biehl, and David A. Landgrebe. 2015. 220 Band AVIRIS Hyperspectral Image Data Set: June 12, 1992 Indian Pine Test Site 3. (Sep 2015). https://doi.org/10.4231/R7RX991C
 Buchel and Ersoy (2018) Julian Buchel and Okan K. Ersoy. 2018. Ladder Networks for Semi-Supervised Hyperspectral Image Classification. CoRR abs/1812.01222 (2018).
 Camps-Valls and Bruzzone (2005) G. Camps-Valls and L. Bruzzone. 2005. Kernel-based methods for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 43, 6 (2005), 1351–1362.
 Camps-Valls et al. (2007) Gustavo Camps-Valls, Tatyana V. Bandos Marsheva, and Dengyong Zhou. 2007. Semi-Supervised Graph-Based Hyperspectral Image Classification. IEEE Transactions on Geoscience and Remote Sensing 45 (2007), 3044–3054.
 Camps-Valls et al. (2013) Gustavo Camps-Valls, Devis Tuia, Lorenzo Bruzzone, and Jón Atli Benediktsson. 2013. Advances in Hyperspectral Image Classification: Earth monitoring with statistical learning methods. arXiv:1310.5107 [cs.CV] (2013).
 Chambolle and Pock (2011) Antonin Chambolle and Thomas Pock. 2011. A firstorder primaldual algorithm for convex problems with applications to imaging. Journal of mathematical imaging and vision 40, 1 (2011), 120–145.
 Cui et al. (2018) Binge Cui, Xiaoyun Xie, Siyuan Hao, Jiandi Cui, and Yan Lu. 2018. Semi-Supervised Classification of Hyperspectral Images Based on Extended Label Propagation and Rolling Guidance Filtering. Remote Sensing 10 (2018), 515.
 Denis et al. (2000) François Denis, Rémi Gilleron, and Fabien Letouzey. 2000. Learning from positive and unlabeled examples. Theor. Comput. Sci. 348 (2000), 70–83.
 Du Plessis et al. (2015) Marthinus Du Plessis, Gang Niu, and Masashi Sugiyama. 2015. Convex formulation for learning from positive and unlabeled data. In International Conference on Machine Learning. 1386–1394.
 du Plessis et al. (2014) Marthinus C du Plessis, Gang Niu, and Masashi Sugiyama. 2014. Analysis of learning from positive and unlabeled data. In Advances in neural information processing systems. 703–711.
 Elkan and Noto (2008) Charles Elkan and Keith Noto. 2008. Learning classifiers from only positive and unlabeled data. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 213–220.
 F. Melgani and B. Lorenzo (2004) F. Melgani and B. Lorenzo. 2004. Classification of hyperspectral remote sensing images with support vector machines. IEEE Trans. Geosci. Remote Sens. 42, 8 (Aug. 2004), 1778–1790.
 Geography (2018) GIS Geography. 2018. Multispectral vs Hyperspectral Imagery Explained. (2018). https://gisgeography.com/multispectral-vs-hyperspectral-imagery-explained/
 Goodfellow et al. (2016) Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep learning. MIT press.
 Herold et al. (2004) M. Herold, Dar A. Roberts, Margaret E. Gardner, and Philip E. Dennison. 2004. Spectrometry for urban area remote sensing — Development and analysis of a spectral library from 350 to 2400 nm.
 Hou et al. (2018) Ming Hou, Brahim Chaib-draa, Chao Li, and Qibin Zhao. 2018. Generative Adversarial Positive-Unlabelled Learning. In IJCAI.
 Kiros et al. (2014) Ryan Kiros, Ruslan Salakhutdinov, and Richard S Zemel. 2014. Unifying visualsemantic embeddings with multimodal neural language models. arXiv preprint arXiv:1411.2539 (2014).
 Kiryo et al. (2017) Ryuichi Kiryo, Gang Niu, Marthinus C du Plessis, and Masashi Sugiyama. 2017. Positiveunlabeled learning with nonnegative risk estimator. In Advances in Neural Information Processing Systems. 1675–1685.
 Krizhevsky et al. (2012) Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. NIPS (2012).
 Landgrebe (2002) D. Landgrebe. 2002. Hyperspectral image data analysis. IEEE Signal Process. Mag. 19, 1 (2002), 17–28.
 Langdon (2011) W.B. Langdon. 2011. Receiver Operating Characteristics (ROC). (2011). http://www0.cs.ucl.ac.uk/staff/ucacbbl/roc/
 Lee and Liu (2003) Wee Sun Lee and Bing Liu. 2003. Learning with Positive and Unlabeled Examples Using Weighted Logistic Regression. In ICML.
 Li et al. (2016) Wei Li, Guodong Wu, Fan Zhang, and Qian Du. 2016. Hyperspectral Image Classification Using Deep Pixel-Pair Features. IEEE Trans. Geosci. Remote Sens. PP, 99 (Nov. 2016), 1–10. https://doi.org/10.1109/TGRS.2016.2616355
 Lin et al. (2015) Kevin Lin, Huei-Fang Yang, Jen-Hao Hsiao, and Chu-Song Chen. 2015. Deep learning of binary hash codes for fast image retrieval. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops. 27–35.
 Liu et al. (2007) Ying Liu, Dengsheng Zhang, Guojun Lu, and Wei-Ying Ma. 2007. A survey of content-based image retrieval with high-level semantics. Pattern recognition 40, 1 (2007), 262–282.
 Mitra et al. (2017) Bhaskar Mitra, Fernando Diaz, and Nick Craswell. 2017. Learning to match using local and distributed representations of text for web search. In Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 1291–1299.
 Nigam et al. (1998) Kamal Nigam, Andrew McCallum, Sebastian Thrun, and Tom Michael Mitchell. 1998. Learning to Classify Text from Labeled and Unlabeled Documents. In AAAI/IAAI.
 Powers (2011) David Martin Powers. 2011. Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. (2011).
 Rao et al. (2007) Nallani Venkata Rama Rao, P. K. Garg, and Sunil Kumar Ghosh. 2007. Development of an agricultural crops spectral library and classification of crops at cultivar level using hyperspectral data. Precision Agriculture 8 (2007), 173–185.
 Richards (2013) J.A. Richards. 2013. Remote Sensing Digital Image Analysis: An Introduction. Springer, New York, NY, USA.
 Santara et al. (2017) Anirban Santara, Kaustubh Mani, Pranoot Hatwar, Ankit Singh, Ankur Garg, Kirti Padia, and Pabitra Mitra. 2017. BASS Net: band-adaptive spectral-spatial feature learning neural network for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing 55, 9 (2017), 5293–5301.
 Schmidhuber (2015) Jürgen Schmidhuber. 2015. Deep learning in neural networks: An overview. Neural networks 61 (2015), 85–117.
 Shepherd and Walsh ([n. d.]) Keith D. Shepherd and Markus G. Walsh. [n. d.]. Development of Reflectance Spectral Libraries for Characterization of Soil Properties.
 Smeulders et al. (2000) Arnold WM Smeulders, Marcel Worring, Simone Santini, Amarnath Gupta, and Ramesh Jain. 2000. Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis & Machine Intelligence 12 (2000), 1349–1380.
 Tokui et al. (2015) Seiya Tokui, Kenta Oono, Shohei Hido, and Justin Clayton. 2015. Chainer: a next-generation open source framework for deep learning. In Proceedings of workshop on machine learning systems (LearningSys) in the twenty-ninth annual conference on neural information processing systems (NIPS), Vol. 5. 1–6.
 Van den Oord et al. (2013) Aaron Van den Oord, Sander Dieleman, and Benjamin Schrauwen. 2013. Deep content-based music recommendation. In Advances in neural information processing systems. 2643–2651.
 Wan et al. (2014) Ji Wan, Dayong Wang, Steven Chu Hong Hoi, Pengcheng Wu, Jianke Zhu, Yongdong Zhang, and Jintao Li. 2014. Deep learning for content-based image retrieval: A comprehensive study. In Proceedings of the 22nd ACM international conference on Multimedia. ACM, 157–166.
 Wang et al. (2016) Wei Wang, Xiaoyan Yang, Beng Chin Ooi, Dongxiang Zhang, and Yueting Zhuang. 2016. Effective deep learning-based multimodal retrieval. The VLDB Journal 25, 1 (2016), 79–101.
 Yoshitaka and Ichikawa (1999) Atsuo Yoshitaka and Tadao Ichikawa. 1999. A survey on content-based retrieval for multimedia databases. IEEE Transactions on Knowledge and Data Engineering 11, 1 (1999), 81–93.
 Zhang and Zuo (2008) Bangzuo Zhang and Wanli Zuo. 2008. Learning from Positive and Unlabeled Examples: A Survey. 2008 International Symposiums on Information Processing (2008), 650–654.
 Zhu et al. (2017a) Wei Zhu, Victoria Chayes, Alexandre Tiard, Stephanie Sanchez, Devin Dahlberg, Andrea L Bertozzi, Stanley Osher, Dominique Zosso, and Da Kuang. 2017a. Unsupervised classification in hyperspectral imagery with nonlocal total variation and primal-dual hybrid gradient algorithm. IEEE Transactions on Geoscience and Remote Sensing 55, 5 (2017), 2786–2798.
 Zhu et al. (2017b) Xiao Xiang Zhu, Devis Tuia, Lichao Mou, Gui-Song Xia, Liangpei Zhang, Feng Xu, and Friedrich Fraundorfer. 2017b. Deep learning in remote sensing: A comprehensive review and list of resources. IEEE Geoscience and Remote Sensing Magazine 5, 4 (2017), 8–36.