Hyperspectral Data Augmentation
Data augmentation is a popular technique which helps improve generalization capabilities of deep neural networks. It plays a pivotal role in remote-sensing scenarios in which the amount of high-quality ground truth data is limited, and acquiring new examples is costly or impossible. This is a common problem in hyperspectral imaging, where manual annotation of image data is difficult, expensive, and prone to human bias. In this letter, we propose online data augmentation of hyperspectral data which is executed during the inference rather than before the training of deep networks. This is in contrast to all other state-of-the-art hyperspectral augmentation algorithms which increase the size (and representativeness) of training sets. Additionally, we introduce a new principal component analysis based augmentation. The experiments revealed that our data augmentation algorithms improve generalization of deep networks, work in real-time, and the online approach can be effectively combined with offline techniques to enhance the classification accuracy.
Hyperspectral satellite imaging (HSI) captures a wide spectrum of light per pixel (commonly more than a hundred bands), forming an array of reflectance values. Such detailed information is being exploited by the remote sensing, pattern recognition, and machine learning communities in the context of accurate HSI classification (assigning a class label to an HSI pixel) and segmentation (finding the boundaries of objects) in many fields. Although the segmentation techniques include conventional machine learning algorithms (both unsupervised and supervised [1, 3]), deep learning based techniques have become the mainstream [4, 5, 6, 7, 8, 9, 10, 11]. They encompass deep belief networks [4, 7], recurrent neural networks, and convolutional neural networks (CNNs) [5, 6, 9, 10, 11].
Deep neural networks discover the underlying data representation, hence they do not require feature engineering and can potentially capture features which are unknown to humans. However, to train such large-capacity learners (and to avoid overfitting), we need a huge amount of ground truth data. Acquiring such training sets is often expensive, time-consuming, and human-dependent. These problems are important real-world obstacles to training well-generalizing models (and validating such learners) faced by the remote sensing community. They are manifested by a very small number of ground truth benchmark HSI sets (there are approx. 10 widely-used benchmarks, with the Salinas Valley, Pavia University, and Indian Pines scenes being the most popular).
To combat the problem of limited, non-representative, and imbalanced training sets, data augmentation can be employed. It is a process of synthesizing new examples following the original data distribution. Since such enhanced training sets can improve generalization of the learners, data augmentation may be perceived as implicit regularization. In computer vision tasks, data augmentation often involves simple affine (e.g., rotation or scaling) and elastic transforms of the image data. These techniques, albeit applicable to HSI, do not benefit from all available information to model useful data.
I-A Related Literature
The literature on HSI data augmentation is fairly limited (notably, only one of the deep-learning HSI segmentation methods discussed earlier in this section used augmentation, namely simple mirroring of training samples, for improved classification). In , the authors calculated per-spectral-band standard deviation (for each class) in the training set. The augmented samples are later drawn from a zero-mean multivariate normal distribution $\mathcal{N}(0, \alpha\Sigma)$, where $\Sigma$ is a diagonal matrix with the standard deviations (for all classes) along its main diagonal, and $\alpha$ is a hyper-parameter (scale factor) of this technique. Despite its simplicity, this augmentation was shown to help improve generalization.
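The noise-injection scheme described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `noise_augment`, the default value of `alpha`, and the choice to add the sampled noise to each original pixel are not taken from the cited work.

```python
import numpy as np

def noise_augment(X, y, alpha=0.25, rng=None):
    """Return one noisy copy of every training pixel.

    X: (n_samples, n_bands) spectra; y: (n_samples,) integer class labels.
    alpha: scale factor of the noise covariance (illustrative default).
    """
    rng = np.random.default_rng() if rng is None else rng
    X_new = np.empty_like(X, dtype=float)
    for c in np.unique(y):
        mask = y == c
        # Per-band standard deviation of class c (the diagonal of Sigma).
        sigma = X[mask].std(axis=0)
        # A covariance of alpha * Sigma gives a per-band noise std of sqrt(alpha * sigma).
        noise = rng.normal(0.0, np.sqrt(alpha * sigma), size=X[mask].shape)
        # Assumption: the zero-mean noise is added to the original spectrum.
        X_new[mask] = X[mask] + noise
    return X_new, y.copy()
```

The synthesized pixels keep the label of the pixel they were derived from, so the returned arrays can be concatenated with the original training set.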
Li et al. utilized both spectral and spatial information to synthesize new samples in their pixel-block augmentation. Two data-generation approaches, (i) Gaussian smoothing filtering and (ii) label-based augmentation, were exploited in . The latter technique resembles weak-labeling, and builds on the assumption that neighboring HSI pixels should share the same class (the label of a pixel propagates to its neighbors, and these generated examples are inserted into the training set). Thus, it may introduce wrongly-labeled samples.
Generative adversarial networks (GANs) have already attracted research attention in the context of data augmentation due to their ability to introduce invariance of models with respect to affine and appearance variations. GANs model an unknown data distribution based on the provided samples, and they are composed of a generator and a discriminator. A generator should generate data samples which follow the underlying data distribution and are indistinguishable from the original data by the discriminator (hence, they compete with each other). In a recent work, Audebert et al. applied GAN conditioning to ensure that the synthesized HSI examples (from a random distribution) belong to the specified class. Overall, all of the state-of-the-art HSI augmentation methods are aimed at increasing the size and representativeness of training sets which are later fed to train the deep learners.
In this letter, we propose a novel online augmentation technique for hyperspectral data (Section II-A): instead of synthesizing samples and adding them to the training set (hence increasing its size, which adversely affects the training time of deep learners), we generate new examples during the inference. These examples (both original and artificial) are classified using a deep net trained over the original set, and we apply a voting scheme to elaborate the final label. To our knowledge, such online augmentation has not been exploited in HSI analysis so far (test-time augmentation was utilized in medical imaging, where the authors applied affine transforms and noise injection into brain-tumor images for better segmentation). Also, we introduce principal component analysis (PCA) based augmentation (Section II-B) which may be used both offline (before the training) and online. This PCA-based augmentation simulates data variability, yet follows the original data distribution (which GANs are intended to learn, but they are not applicable at test-time).
Our rigorous experiments performed over HSI benchmarks revealed that the online approach is very flexible: different augmentation techniques can be exploited in this setting (Section III). The results obtained using a spectral CNN indicated that the test-time augmentation significantly improves the abilities of the models when compared with those trained using the original sets, and with those augmented using other algorithms (also, we compared our CNN with a spectral-spatial CNN from the literature whose capacity is much larger). Our online approach does not sacrifice the inference speed and allows for real-time classification. We showed that the proposed PCA augmentation is extremely fast, and ensures the highest-quality generalization of the deep models for all data-split scenarios. Finally, we demonstrated that the offline and online augmentations can be effectively combined for better classification.
II Proposed Hyperspectral Data Augmentation
II-A Online Hyperspectral Data Augmentation
Our online (test-time) data augmentation involves synthesizing artificial samples for each incoming example during the inference. We traverse the neighborhood of the original example and try to mitigate potential input-dependent uncertainty of the deep model. In contrast to the offline augmentation techniques, the test-time augmentation does not increase the training time of the network, and it does not require defining the number of synthesized samples beforehand (also, for the majority of specific augmentation algorithms, the operation time of a trained learner would not be significantly affected since the inference is fast). We build upon the theory of ensemble learning, where elaborating a combined classifier (encompassing several weak learners) delivers high-quality generalization (it is an efficient regularizer). Here, by creating artificial data points, we implicitly form a homogeneous ensemble of deep models (trained over the same training set). The final class label is elaborated using majority voting (with equal weights) over all samples, both original and artificial (for low ensemble confidence, when two or more classes receive the same number of votes, we perform soft voting: we average all class probabilities, and the final class label corresponds to the class with the highest average probability).
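A minimal sketch of this voting scheme is given below. It assumes the trained network is available as a callable returning class probabilities; the names `predict_with_tta`, `predict_proba`, `augment`, and the default `n_aug` are hypothetical and chosen only for illustration.

```python
import numpy as np

def predict_with_tta(predict_proba, x, augment, n_aug=4, rng=None):
    """Classify one pixel x using the original and n_aug synthesized copies.

    predict_proba: callable mapping (n, n_bands) -> (n, n_classes) probabilities
                   (a stand-in for the trained network).
    augment:       callable (x, rng) -> perturbed copy of x (any online augmentation).
    """
    rng = np.random.default_rng() if rng is None else rng
    batch = np.stack([x] + [augment(x, rng) for _ in range(n_aug)])
    probs = predict_proba(batch)                      # (n_aug + 1, n_classes)
    # Hard voting: every sample casts one equally weighted vote.
    votes = np.bincount(probs.argmax(axis=1), minlength=probs.shape[1])
    winners = np.flatnonzero(votes == votes.max())
    if len(winners) == 1:
        return int(winners[0])
    # Tie between classes: fall back to soft voting over averaged probabilities.
    return int(probs.mean(axis=0).argmax())
```

Because the augmentation is passed in as a callable, the same routine can wrap noise injection, a PCA-based perturbation, or any other method that modifies an existing sample.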
The proposed online HSI augmentation may be considered to be a meta-algorithm, in which a specific data augmentation method is applied to synthesize samples on the fly. Although we exploited the noise injection based approach , and our principal component analysis based technique (see Section II-B) in this work, we anticipate that other augmentations which are aimed at modifying an existent sample can be straightforwardly utilized here. Finally, the online augmentation may be coupled with any offline technique (Section III).
II-B Principal Component Analysis Based Data Augmentation
In this work, we propose a new augmentation method based on PCA. Let us consider a training set $T$ of $N$ HSI pixels $x_i$, where $i = 1, \dots, N$, and each $x_i$ is $B$-dimensional ($B$ denotes the number of bands in this HSI). PCA extracts a set of $k$ ($k \le B$) projection directions (vectors) by maximizing the projected variance of a given $B$-dimensional dataset: the first principal component ($w_1$) accounts for as much of the data variability as possible, and so forth. First, we center the data at the origin (hence, we subtract the average sample $\bar{x}$ from each $x_i$), and form the data matrix $X$ (of size $B \times N$), whose $i$th column is $x_i - \bar{x}$ ($i = 1, \dots, N$). The covariance matrix becomes $C = \frac{1}{N} X X^T$, and it undergoes eigendecomposition $C = W \Lambda W^T$, where $\Lambda$ is the matrix with the non-increasingly ordered eigenvalues along its main diagonal, and $W$ is the matrix with the corresponding eigenvectors (principal components) as columns. Finally, the principal components form an orthogonal basis, and each sample can be projected onto a new feature space: $y_i = W_k^T (x_i - \bar{x})$, where $W_k$ holds the first $k$ columns of $W$. Importantly, each sample can be projected back to its original space: $\hat{x}_i = W_k y_i + \bar{x}$, with the error $\|x_i - \hat{x}_i\|$ (the PCA-training procedure minimizes this error; it is non-zero if $k < B$; otherwise, if $k = B$, there is no reconstruction error).
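The PCA steps above (centering, covariance, eigendecomposition, projection, and back-projection) can be illustrated with NumPy. The function names and the normalization of the covariance by the number of samples are assumptions made for this sketch.

```python
import numpy as np

def fit_pca(T):
    """T: (N, B) array of training pixels. Returns mean, eigenvectors, eigenvalues."""
    mean = T.mean(axis=0)
    X = (T - mean).T                       # B x N centred data matrix
    C = X @ X.T / T.shape[0]               # B x B covariance matrix
    eigvals, W = np.linalg.eigh(C)         # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # reorder to non-increasing
    return mean, W[:, order], eigvals[order]

def project(x, mean, W, k):
    """Project a pixel onto the first k principal components."""
    return W[:, :k].T @ (x - mean)

def reconstruct(y, mean, W):
    """Map a k-dimensional projection back to the original band space."""
    return W[:, :y.shape[0]] @ y + mean

rng = np.random.default_rng(0)
T = rng.random((50, 6))                    # toy data: 50 pixels, 6 bands
mean, W, eigvals = fit_pca(T)
err_full = np.linalg.norm(T[0] - reconstruct(project(T[0], mean, W, 6), mean, W))
err_two = np.linalg.norm(T[0] - reconstruct(project(T[0], mean, W, 2), mean, W))
```

On this toy data, `err_full` is zero up to floating-point precision because all components are kept, whereas `err_two` is strictly positive because only two of the six components are used.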
The first step of our PCA-based data augmentation involves transforming all training samples using PCA (trained over the training set). Afterwards, the first principal component (of each sample) is multiplied by a random value $\alpha$ drawn from a uniform distribution $\mathcal{U}(\alpha_{\min}, \alpha_{\max})$, where $\alpha_{\min}$ and $\alpha_{\max}$ are the hyper-parameters of our method ($\alpha$ is drawn independently for all original examples); more principal components could be exploited here. This process is visualized in Fig. 1: we can observe that the synthesized examples (Fig. 1c) preserve the original data distribution (Fig. 1b) projected onto a reduced feature space, and preserve inter-class relationships. Finally, these samples are projected back onto the original space (using all principal components to ensure correct mapping), and they are added to the augmented training set (if executed offline). This PCA-based augmentation can be applied in both offline and online settings (in both cases, PCA is trained over the original training set).
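The procedure can be sketched as follows. The function name `pca_augment` and the default bounds of the uniform distribution are illustrative and not taken from the text; the sketch also assumes the training set contains at least as many pixels as there are bands, so that all principal components are available for the back-projection.

```python
import numpy as np

def pca_augment(X, T, alpha_min=0.9, alpha_max=1.1, rng=None):
    """Synthesize one sample per row of X by scaling its first principal component.

    X: (n, B) pixels to perturb; T: (N, B) original training set used to fit PCA.
    alpha_min, alpha_max: bounds of the uniform distribution (illustrative values).
    """
    rng = np.random.default_rng() if rng is None else rng
    mean = T.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(T - mean, full_matrices=False)
    Y = (X - mean) @ Vt.T                          # project onto all components
    alpha = rng.uniform(alpha_min, alpha_max, size=X.shape[0])
    Y[:, 0] *= alpha                               # perturb the first component only
    return Y @ Vt + mean                           # back-project to the band space
```

Offline, `X` would be the training pixels themselves and the output would be appended to the training set; online, `X` would hold the incoming test pixel (as a single-row array) while `T` remains the original training set.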
The experimental objective was to verify the impact of data augmentation on the deep model generalization. For online augmentation, we applied our PCA augmentation (PCA-ON), and noise injection (Noise-ON), whereas for the offline setting, we used our PCA-based method (PCA), generative adversarial nets (GAN), and noise injection (Noise). GAN cannot be used online, since it does not modify an incoming example, but rather synthesizes samples which follow an approximated distribution. Finally, we combined online and offline augmentation in PCA/PCA-ON (PCA augmentation is used to both augment the set beforehand, and to generate new samples at test time), and GAN/PCA-ON. For each offline technique, we at most doubled the number of original samples (unless that number would exceed the size of the most numerous class; in such a case, we augment only by the missing difference). For online augmentation, we synthesize samples, and for PCA and PCA-ON, we had and .
We exploit our shallow (thus resource-frugal) 1D spectral CNN (Fig. 2) coded in Python 3.6. Larger-capacity CNNs require longer training and infer slower, hence are less likely to be deployed for Earth observation, especially on board a satellite. The training (ADAM, learn. rate of , , and ) stops if, after 15 epochs, the validation set (random subset of ) accuracy plateaus.
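The per-class budget for offline augmentation can be read as follows: each class is doubled unless that would make it larger than the most numerous class, in which case only the missing difference is synthesized. The helper below is a sketch of this reading (the name `offline_budget` is hypothetical); under it, the most numerous class itself receives no synthetic samples.

```python
from collections import Counter

def offline_budget(labels):
    """Number of samples to synthesize per class.

    Doubling a class adds n samples; capping at the size of the largest
    class adds at most (largest - n). The smaller of the two is used.
    """
    counts = Counter(labels)
    largest = max(counts.values())
    return {c: min(n, largest - n) for c, n in counts.items()}

budget = offline_budget([0] * 10 + [1] * 3 + [2] * 8)
# Class 1 is doubled (3 new samples); class 2 is topped up to 10 (2 new samples).
```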
We train and validate the deep models using: (1) balanced sets with random pixels (B), (2) imbalanced sets with random pixels (IB), and (3) our patched sets (P)  (for fair comparison, the numbers of pixels in and for B and IB are close to those reported in ). We also report the results obtained using a spectral-spatial CNN (3D-CNN) , trained over the original (3D-CNN—in contrast to our CNN—does suffer from the training-test information leak problem, and the 3D-CNN results over B and IB are over-optimistic ). For each fold in (3), we repeat the experiments , and for (1) and (2), we perform Monte-Carlo cross-validation with the same number of runs (e.g., if 5 folds are run , we execute Monte-Carlo runs for B and IB). We report per-class, average (AA), and overall accuracy (OA), averaged across all runs.
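For reference, the reported measures can be computed as in the sketch below: overall accuracy (OA) is the fraction of correctly classified test pixels, and average accuracy (AA) is the mean of the per-class accuracies, which makes it more sensitive to under-represented classes. The function name is an assumption made for this illustration.

```python
import numpy as np

def overall_and_average_accuracy(y_true, y_pred):
    """Return (OA, AA, per-class accuracy) for integer label arrays."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    oa = float(np.mean(y_true == y_pred))
    # Accuracy within each ground-truth class (i.e., per-class recall).
    per_class = {int(c): float(np.mean(y_pred[y_true == c] == c))
                 for c in np.unique(y_true)}
    aa = float(np.mean(list(per_class.values())))
    return oa, aa, per_class

oa, aa, per_class = overall_and_average_accuracy([0, 0, 0, 0, 1, 1],
                                                 [0, 0, 0, 0, 1, 0])
# OA = 5/6, while AA = (1.0 + 0.5) / 2 = 0.75 because the minority class is half wrong.
```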
We focused on three HSI benchmarks (see their class-distribution characteristics at: http://www.ehu.eus/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes): Salinas Valley, USA ( pixels, NASA AVIRIS sensor; , , ) presents different sorts of vegetation (16 classes, 224 bands, 3.7 m spatial resolution). Indian Pines (, AVIRIS; , , ) covers the North-Western Indiana, USA (agriculture and forest, 16 classes, 200 channels, 20 m). Pavia University (, ROSIS; , , ) was captured over Lombardy, Italy (urban scenery, 9 classes, 103 channels, 1.3 m).
The results (over the test sets) obtained for the Salinas Valley, Indian Pines, and Pavia University datasets are gathered in Table I. Introducing augmented samples (in both online and offline settings) helped boost generalization abilities of the deep models in the majority of cases (even up to more than 8% of OA for GAN, PCA, and PCA/PCA-ON in Salinas, B; only the Noise offline augmentation deteriorated both OA and AA for P). Interestingly, exploring the local neighborhood randomly (Noise and Noise-ON) can notably deteriorate OA and AA. It usually occurs for under-represented classes (e.g., C7 in Pavia) since their examples lie close to other-class examples in the discovered feature space (therefore, they can be easily “confused” with each other). This problem is addressed by the data-distribution analysis in our PCA-based augmentations. Coupling offline and online augmentation (PCA/PCA-ON and GAN/PCA-ON) gave consistently high-quality results over all sets and all training-test splits, and dealt well with the HSI imbalance (in P, we did not ensure that examples of all classes are included in the original training set, thus P is very challenging).
Augmentation variants: (a) Without augmentation, (b) Noise, (c) GAN, (d) PCA, (e) Noise-ON, (f) PCA-ON, (g) PCA/PCA-ON, (h) GAN/PCA-ON.
To verify the statistical significance of the results (and see if the differences in the average per-class accuracy are important in the statistical sense), we executed two-tailed Wilcoxon’s tests for each dataset split (B, IB, and P) over per-class AA for all HSI. The results reported in Table II show that applying HSI data augmentation is beneficial in most cases and delivers significant improvements in accuracy. GAN performed on par with, e.g., PCA, PCA-ON, and our combined PCA/PCA-ON and GAN/PCA-ON for B, with Noise-ON, PCA-ON, and GAN/PCA-ON for IB, and with Noise-ON and PCA-ON for P. This indicates that employing time-consuming and complex deep-learning engines for data augmentation does not necessarily bring larger improvements in the performance of the deep models.
Our combined approaches (PCA/PCA-ON and GAN/PCA-ON) were stable and consistently ensured high-quality generalization (as shown in Table I) of the deep models over all splits. This stability is also manifested in Table III, where we summarize the results across all sets (although PCA gave the best accuracy for B, the differences between PCA and PCA/PCA-ON and GAN/PCA-ON are not statistically significant). We can appreciate that our PCA-based augmentation (offline, online, or combined) allowed us to obtain the best generalization: the very intuitive PCA-based data-distribution analysis for synthesizing samples outperformed or worked on par with GAN in the case of difficult (small and imbalanced) sets. Finally, our CNN surpassed the accuracy elaborated using a significantly larger 3D-CNN from the literature for P (note that the results obtained using 3D-CNN for B and IB are over-optimistic due to the intrinsic training-test information leak problem, hence they cannot be considered reliable).
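Such a comparison can be run with SciPy, as sketched below. The sketch assumes that the paired signed-rank variant of the Wilcoxon test is meant; the accuracy values are made up for illustration and are not results from this study.

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical per-class accuracies of two augmentation variants (not from the paper).
acc_a = np.array([0.81, 0.77, 0.92, 0.64, 0.88, 0.70, 0.90, 0.59])
acc_b = np.array([0.84, 0.79, 0.91, 0.71, 0.92, 0.76, 0.95, 0.67])

# Two-tailed test of the null hypothesis that the paired differences
# are symmetrically distributed around zero.
stat, p_value = wilcoxon(acc_a, acc_b, alternative="two-sided")
significant = p_value < 0.05
```

A small `p_value` suggests that the difference between the two variants is unlikely to be due to chance; the 0.05 threshold used here is a common convention rather than a value stated in the text.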
*Over-optimistic due to training-test information leak; see .
To gain better insights into the augmentation performance (and its potential overhead imposed on the deep models in terms of training and/or test times), we collected the average execution times of the most important steps of the investigated methods in Table IV. It can be observed that the training of GANs is very time-consuming (it was executed using NVIDIA GeForce GTX 1060), taking orders of magnitude longer than the pre-processing in other offline techniques (PCA and Noise). Although all offline augmentations affect the training time of deep networks, these differences are not dramatic. Finally, the online augmentation allowed us to classify test pixels in real-time (note that we report the inference time in ms).
Sa: Salinas Valley, IP: Indian Pines, PU: Pavia University.
In this letter, we introduced a new online HSI data augmentation approach which synthesizes examples at test time. This is in contrast to other state-of-the-art hyperspectral data augmentation techniques that work offline (i.e., before the deep-network training, to increase the training set cardinality and representativeness). Our experimental study, performed over three HSI benchmark sets (with different training-test data splits) and coupled with statistical tests, revealed that our online augmentation is very flexible (different augmentations can be applied here), improves the generalization abilities of deep neural networks, and works in real-time. Also, we showed that combining online and offline augmentation leads to consistently well-performing models. Finally, we proposed a principal component analysis based augmentation which operates extremely fast, synthesizes high-quality data, outperforms other augmentations for small and imbalanced sets, and is applicable in online and offline settings.
- [1] T. Dundar and T. Ince, “Sparse representation-based hyperspectral image classification using multiscale superpixels and guided filter,” IEEE GRSL, pp. 1–5, 2018.
- [2] G. Bilgin, S. Erturk, and T. Yildirim, “Segmentation of hyperspectral images via subtractive clustering and cluster validation using one-class SVMs,” IEEE TGRS, vol. 49, no. 8, pp. 2936–2944, 2011.
- [3] F. Li, D. Clausi, L. Xu et al., “ST-IRGS: A region-based self-training algorithm applied to hyperspectral image classification and segmentation,” IEEE TGRS, vol. 56, no. 1, pp. 3–16, 2018.
- [4] Y. Chen, X. Zhao, and X. Jia, “Spectral–spatial classification of hyperspectral data based on deep belief network,” IEEE J-STARS, vol. 8, no. 6, pp. 2381–2392, 2015.
- [5] W. Zhao and S. Du, “Spectral-spatial feature extraction for hyperspectral image classification,” IEEE TGRS, vol. 54, no. 8, pp. 4544–4554, 2016.
- [6] Y. Chen, H. Jiang, C. Li et al., “Deep feature extraction and classification of hyperspectral images based on convolutional neural networks,” IEEE TGRS, vol. 54, no. 10, pp. 6232–6251, 2016.
- [7] P. Zhong, Z. Gong, S. Li et al., “Learning to diversify deep belief networks for hyperspectral image classification,” IEEE TGRS, vol. 55, no. 6, pp. 3516–3530, 2017.
- [8] L. Mou, P. Ghamisi, and X. X. Zhu, “Deep recurrent nets for hyperspectral classification,” IEEE TGRS, vol. 55, no. 7, pp. 3639–3655, 2017.
- [9] A. Santara, K. Mani, P. Hatwar et al., “BASS Net: Band-adaptive spectral-spatial feature learning neural network for hyperspectral image classification,” IEEE TGRS, vol. 55, no. 9, pp. 5293–5301, 2017.
- [10] H. Lee and H. Kwon, “Going deeper with contextual CNN for hyperspectral classification,” IEEE TIP, vol. 26, no. 10, pp. 4843–4855, 2017.
- [11] Q. Gao, S. Lim, and X. Jia, “Hyperspectral image classification using convolutional neural networks and multiple feature learning,” Rem. Sens., vol. 10, no. 2, p. 299, 2018.
- [12] A. Krizhevsky, I. Sutskever, and G. Hinton, “Imagenet classification with deep convolutional neural networks,” Proc. NIPS, pp. 1097–1105, 2012.
- [13] V. Slavkovikj, S. Verstockt, W. De Neve et al., “Hyperspectral image classification with CNNs,” in Proc. ICM. ACM, 2015, pp. 1159–1162.
- [14] W. Li, C. Chen, M. Zhang et al., “Data augmentation for hyperspectral classification with deep CNN,” IEEE GRSL, pp. 1–5, 2018.
- [15] J. Acquarelli, E. Marchiori, L. M. Buydens et al., “Spectral-spatial classification of hyperspectral images,” Rem. Sens., vol. 10, no. 7, 2018.
- [16] Y.-Y. Sun, Y. Zhang, and Z.-H. Zhou, “Multi-label learning with weak label,” in Proc. AAAI. AAAI Press, 2010, pp. 593–598.
- [17] N. Audebert, B. L. Saux, and S. Lefèvre, “Generative adversarial networks for realistic synthesis of hyperspectral samples,” CoRR, vol. abs/1806.02583, pp. 1–4, 2018.
- [18] G. Wang, W. Li, S. Ourselin et al., “Automatic brain tumor segmentation using convolutional neural networks with test-time augmentation,” CoRR, vol. abs/1810.07884, pp. 1–12, 2018.
- [19] J. Nalepa, M. Myller, and M. Kawulok, “Validating hyperspectral image segmentation,” IEEE GRSL, pp. 1–5, 2019, in press, DOI:10.1109/LGRS.2019.2895697 (pre-print: https://arxiv.org/abs/1811.03707).