Sparse-GAN: Sparsity-constrained Generative Adversarial Network for Anomaly Detection in Retinal OCT Image
Abstract
With the development of convolutional neural networks, deep learning has shown its success in retinal disease detection from optical coherence tomography (OCT) images. However, deep learning often relies on large-scale labelled data for training, which is challenging to obtain, especially for diseases with low occurrence. Moreover, a deep learning system trained on a dataset with one or a few diseases is unable to detect other unseen diseases, which limits the practical usage of the system in disease screening. To address this limitation, we propose a novel anomaly detection framework termed Sparsity-constrained Generative Adversarial Network (Sparse-GAN) for disease screening where only healthy data are available in the training set. The contributions of Sparse-GAN are two-fold: 1) The proposed Sparse-GAN predicts anomalies in latent space rather than at image level; 2) Sparse-GAN is constrained by a novel Sparsity Regularization Net. Furthermore, in light of the role of lesions in disease screening, we propose an anomaly activation map to show the heatmap of lesions. We evaluate the proposed Sparse-GAN on a publicly available dataset, and the results show that it outperforms state-of-the-art methods.
Kang Zhou
School of Information Science and Technology, ShanghaiTech University
Cixi Institute of Biomedical Engineering, Chinese Academy of Sciences
UBTech Research
Southern University of Science and Technology
Inception Institute of Artificial Intelligence
Keywords: Anomaly Detection, Sparsity-constrained Network, Latent Feature, Adversarial Learning
1 Introduction
Over 300 million people worldwide are affected by various ocular diseases [2], such as diabetic retinopathy (DR) [17], age-related macular degeneration (AMD) and glaucoma. Among the many diagnostic methods, optical coherence tomography (OCT) is a non-invasive imaging modality that provides micrometer-resolution volumetric scans of the retina [5]. With the development of convolutional neural networks (CNNs) in computer vision [8, 10], many deep learning based approaches have been proposed to detect lesions in retinal OCT images [9] and fundus images [20, 4]. However, these deep learning based methods rely heavily on big data for training, which limits the application of deep learning to medical image analysis.

Unlike in general computer vision, it is often challenging to obtain sufficient data for medical images, for several reasons. First, most medical data is not publicly available due to privacy concerns. Second, labeling medical images is time-consuming, and experienced clinicians have little time for such tedious demarcation tasks. Third, the occurrence of some lesions is usually low, and the presence of specific lesions is not known before diagnosis. Therefore, obtaining large-scale medical data with particular types of lesions is often expensive and time-consuming.
Although it is difficult to get a large amount of data with different lesions, it is often much easier to get data from healthy subjects. In OCT imaging, one 3D scan from a healthy subject can provide hundreds of B-scan images without lesions. Considering lesions as anomalies added to the images of healthy subjects, it is possible to train an anomaly detection system using only OCT B-scans without lesions.

Previous work has shown the effectiveness of anomaly detection for disease diagnosis [16] and lesion localization [15]. Recently, CNN-based methods have been proposed to detect anomalies in medical images. Schlegl et al. [13] first introduced a deep convolutional Generative Adversarial Network (GAN) [3], referred to as AnoGAN, to detect anomalies in OCT B-scans. Later, they proposed f-AnoGAN [14], which is faster than AnoGAN. However, these networks are not trained in an end-to-end fashion, so they may get stuck in local optima. It is desirable to customize a network that learns the optimal features for anomaly detection.
In this paper, inspired by Image-to-Image GAN [6], whose generator is end-to-end optimized, we propose to employ Image-to-Image GAN for medical image anomaly detection. Then, to alleviate the effect of image noise (e.g. speckle noise in OCT images), we propose to map the reconstructed image into latent space with an additional encoder. Furthermore, motivated by the capability of interpretable sparse coding for anomaly detection, we propose to regularize the sparsity of latent features. By taking these factors into consideration, we present a novel framework: Sparsity-constrained Generative Adversarial Network (Sparse-GAN) for image anomaly detection with merely normal training data. The rationale behind the work is that the normal patterns from healthy subjects can be reconstructed with small errors while the patterns with lesions from diseased subjects are often reconstructed with large errors, as shown in Fig. 1.
The main contributions of this work are summarized as follows: (1) We propose to map the images into a latent space and regularize the latent feature with a novel sparsity regularizer; (2) We introduce a novel Sparse-GAN for anomaly detection, and our method is designed for the scenario where only data corresponding to healthy subjects are available in the training set. Thus our solution may ease the difficulty in data collection and annotation; (3) Our method also predicts anomaly activation maps to show lesions for clinical diagnosis.
2 Method
In this work, we mainly focus on regularizing the sparsity of the latent feature and utilizing the latent feature to predict anomalies in a GAN-based anomaly detection framework. As shown in Fig. 2, the proposed Sparsity-constrained Generative Adversarial Network consists of three modules: 1) An Image-to-Image GAN [6] for medical anomaly detection, whose generator is end-to-end optimized. 2) Anomaly computation in latent space [1], to alleviate the effect of image noise (e.g. speckle noise in OCT images). 3) The novel Sparsity Regularization Net, which regularizes the sparsity of latent features.
2.1 Image-to-Image GAN for Anomaly Detection
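As discussed earlier, we adopt the image-to-image [6] generator as the generator $G$ in the GAN, which consists of an encoder $G_E$ and a decoder $G_D$, while $D$ denotes the discriminator. Let $I_{in}$ be the input images. Their latent features $H_{in} = G_E(I_{in})$ are computed from the input images, and the latent features are then transformed into reconstructed images $I_{re} = G_D(H_{in})$. Image-to-Image GAN [6] is optimized with a reconstruction loss combined with an adversarial loss,
$$\mathcal{L}_{GAN} = \lambda_{re}\,\mathcal{L}_{re} + \lambda_{adv}\,\mathcal{L}_{adv}, \tag{1}$$
where $\lambda_{re}$ and $\lambda_{adv}$ are regularization parameters. The adversarial loss and reconstruction loss are defined as
$$\mathcal{L}_{adv} = \frac{1}{N}\sum_{i=1}^{N}\left[\log D\left(I_{in}^{i}\right) + \log\left(1 - D\left(I_{re}^{i}\right)\right)\right], \tag{2}$$
$$\mathcal{L}_{re} = \frac{1}{N}\sum_{i=1}^{N}\left\lVert I_{in}^{i} - I_{re}^{i}\right\rVert, \tag{3}$$
where $N$ is the batch size.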
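The weighted objective of Eqs. (1)-(3) can be illustrated with a minimal NumPy sketch. The function names, the default weights of 1.0, and the choice of an ℓ2 norm for the reconstruction term are assumptions made for illustration only; the inputs `d_real` and `d_fake` stand for discriminator outputs on input and reconstructed images.

```python
import numpy as np

def adversarial_loss(d_real, d_fake, eps=1e-12):
    """Batch mean of log D(I_in) + log(1 - D(I_re)).

    d_real, d_fake: discriminator probabilities in (0, 1), shape (N,).
    eps guards against log(0); it is not part of the formula.
    """
    return float(np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps)))

def reconstruction_loss(i_in, i_re):
    """Batch mean of the per-image distance between input and reconstruction.

    The l2 norm used here is an assumption; another norm could be swapped in.
    """
    n = i_in.shape[0]
    diff = (i_in - i_re).reshape(n, -1)
    return float(np.mean(np.linalg.norm(diff, axis=1)))

def gan_objective(i_in, i_re, d_real, d_fake, lam_re=1.0, lam_adv=1.0):
    """Weighted sum of reconstruction and adversarial terms (weights are hypothetical)."""
    return (lam_re * reconstruction_loss(i_in, i_re)
            + lam_adv * adversarial_loss(d_real, d_fake))

# Toy batch: two 1x4x4 images, reconstructions off by one everywhere.
i_in = np.zeros((2, 1, 4, 4))
i_re = np.ones((2, 1, 4, 4))
d_real = np.array([0.5, 0.5])
d_fake = np.array([0.5, 0.5])
total = gan_objective(i_in, i_re, d_real, d_fake)
```

In an actual training loop the discriminator would maximize the adversarial term while the generator minimizes the full objective; the sketch only evaluates the values.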
2.2 Predict Anomaly Score in Latent Space
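One challenge in reconstructing OCT images is speckle noise. To reduce its influence, we propose to map the reconstructed image $I_{re}$ back into latent space with an encoder $\hat{G}_E$, i.e. $H_{re} = \hat{G}_E(I_{re})$. To cut down the computational cost, the encoder $\hat{G}_E$ shares its weights with the generator's encoder $G_E$. In latent space, with $H_{in}$ the latent feature of the input image $I_{in}$, the model predicts the anomaly score and the diagnosis result as follows:
$$A(I_{in}) = \left\lVert H_{in} - H_{re}\right\rVert, \tag{4}$$
$$C(I_{in}) = \begin{cases} 1 \text{ (abnormal)}, & \text{if } A(I_{in}) \geq \phi, \\ 0 \text{ (normal)}, & \text{otherwise}, \end{cases} \tag{5}$$
where $\phi$ is the anomaly score threshold determined on the validation set.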
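As a rough illustration of Eqs. (4) and (5), the sketch below scores a batch of latent features and thresholds the scores. The ℓ2 norm, the `>=` comparison, and the names `anomaly_score` and `diagnose` are assumptions for this example; the latent arrays stand in for encoder outputs.

```python
import numpy as np

def anomaly_score(h_in, h_re):
    """Per-image distance between latent features of input and reconstruction.

    h_in, h_re: arrays of shape (N, K, H, W). The l2 norm is an assumption.
    """
    n = h_in.shape[0]
    return np.linalg.norm((h_in - h_re).reshape(n, -1), axis=1)

def diagnose(scores, threshold):
    """Return 1 (abnormal) where the score reaches the threshold, else 0 (normal)."""
    return (np.asarray(scores) >= threshold).astype(int)

# Toy latent features: the first image is reconstructed perfectly in latent
# space, the second deviates by 0.5 in every entry.
h_in = np.zeros((2, 3, 2, 2))
h_re = np.zeros((2, 3, 2, 2))
h_re[1] = 0.5
scores = anomaly_score(h_in, h_re)
labels = diagnose(scores, threshold=1.0)  # threshold value is hypothetical
```

Comparing latent features instead of pixels means that pixel-level speckle noise only affects the score to the extent that the encoder preserves it.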
2.3 Sparse Regularization on Latent Feature
On the one hand, without additional regularization, the generator may learn an approximation of the identity function, which cannot distinguish disease images from normal images. On the other hand, sparse coding is interpretable and has been shown to be effective for anomaly detection [12, 11].
Based on this observation, we propose a novel Sparsity Regularization Net, which recasts the solution of sparse coding as a novel convolutional long short-term memory (LSTM) unit. We regularize the sparsity of the latent feature with the proposed Sparsity Regularization Net, as shown in Fig. 2. We refer to the Image-to-Image GAN equipped with this sparsity regularization as the Sparsity-constrained GAN (Sparse-GAN).
The proposed Sparsity Regularization Net is inspired by Sparse LSTM [19]. However, it differs from Sparse LSTM in two aspects. First, we replace the element-wise multiplication in Sparse LSTM with a convolutional operation, since the convolutional operation accelerates the computation. Second, the input of the Sparsity Regularization Net is the latent feature rather than the original image.
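The loss used to train the Sparsity Regularization Net is defined as
$$\mathcal{L}_{sp} = \left\lVert H_{in} - \mathbf{D}\alpha\right\rVert^{2}, \tag{6}$$
where $\alpha$ is the sparse code w.r.t. the latent feature $H_{in}$ and $\mathbf{D}$ is the dictionary (not to be confused with the discriminator $D$).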
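To make the sparse coding problem behind this loss concrete, the sketch below solves the classical objective 0.5·‖h − Dα‖² + λ‖α‖₁ with the iterative shrinkage-thresholding algorithm (ISTA). This is a standard solver swapped in for illustration, not the convolutional LSTM used by the Sparsity Regularization Net; the ℓ1 weight `lam`, the iteration count, and the toy dictionary are assumptions.

```python
import numpy as np

def soft_threshold(x, t):
    """Element-wise shrinkage operator: the proximal map of t * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(h, dictionary, lam=0.1, n_iter=200):
    """Approximate argmin_a 0.5 * ||h - D a||^2 + lam * ||a||_1 with ISTA.

    h: flattened latent feature, shape (m,).
    dictionary: matrix D of shape (m, k).
    """
    # Step size 1/L, where L is the squared spectral norm of D.
    step = 1.0 / np.linalg.norm(dictionary, 2) ** 2
    alpha = np.zeros(dictionary.shape[1])
    for _ in range(n_iter):
        grad = dictionary.T @ (dictionary @ alpha - h)
        alpha = soft_threshold(alpha - step * grad, step * lam)
    return alpha

def sparse_residual(h, dictionary, alpha):
    """Squared reconstruction residual ||h - D alpha||^2."""
    return float(np.sum((h - dictionary @ alpha) ** 2))

# Toy example with an identity dictionary, where the solution is simply the
# soft-thresholded input.
h = np.array([1.0, 0.05, -2.0, 0.0])
D = np.eye(4)
alpha = ista(h, D, lam=0.1)
residual = sparse_residual(h, D, alpha)
```

Each ISTA iteration is a linear map followed by a shrinkage nonlinearity, which is the kind of recurrence that unrolled networks such as sparse LSTMs are designed to mimic.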
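Overall, the final loss of Sparse-GAN is given by
$$\mathcal{L} = \lambda_{re}\,\mathcal{L}_{re} + \lambda_{adv}\,\mathcal{L}_{adv} + \lambda_{sp}\,\mathcal{L}_{sp}, \tag{7}$$
where $\lambda_{re}$, $\lambda_{adv}$ and $\lambda_{sp}$ are regularization parameters weighting the reconstruction, adversarial and sparsity losses, respectively.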
2.4 Anomaly Activation Map for Visualization
Since anomaly detection is significantly different from supervised classification, the Class Activation Map (CAM) [18] is not suitable in our framework to show the role of lesions in diagnosis. To address this weakness of CAM, we propose the Anomaly Activation Map (AAM) to visualize lesions in an anomaly detection framework. We first perform Global Average Pooling (GAP) on the latent features $H_{in}$ and $H_{re}$ of the input and reconstructed images. Then we obtain the anomaly vector $W$ as
$$W = \left\lvert \mathrm{GAP}(H_{in}) - \mathrm{GAP}(H_{re}) \right\rvert, \tag{8}$$
where $W \in \mathbb{R}^{K \times 1}$ and $K$ is the number of channels of the latent feature. Finally, we multiply the feature map by the anomaly vector channel-wise to obtain the anomaly activation map.
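The AAM computation described above can be sketched in a few lines of NumPy. Several details are assumptions for illustration: the anomaly vector is taken as the absolute difference of the pooled features, the weighted feature map is the input latent feature, and the weighted channels are summed into a single heatmap as in CAM.

```python
import numpy as np

def anomaly_activation_map(h_in, h_re):
    """Compute an anomaly vector and a heatmap from two latent features.

    h_in, h_re: arrays of shape (K, H, W) for one image.
    Returns (w, aam) with w of shape (K,) and aam of shape (H, W).
    """
    gap_in = h_in.mean(axis=(1, 2))   # global average pooling per channel
    gap_re = h_re.mean(axis=(1, 2))
    w = np.abs(gap_in - gap_re)       # anomaly vector (absolute difference assumed)
    # Channel-wise weighting of the input feature map, summed over channels
    # (the summation is an assumption borrowed from CAM).
    aam = (w[:, None, None] * h_in).sum(axis=0)
    return w, aam

# Toy latent features with K = 2 channels: channel 0 is reconstructed
# faithfully, channel 1 is lost in the reconstruction.
h_in = np.stack([np.ones((2, 2)), np.array([[0.0, 2.0], [0.0, 2.0]])])
h_re = np.stack([np.ones((2, 2)), np.zeros((2, 2))])
w, aam = anomaly_activation_map(h_in, h_re)
```

For display, the low-resolution map would typically be upsampled to the input image size and overlaid as a heatmap.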
3 Experiments
3.1 Datasets and Evaluation Metrics
Datasets
We employ a publicly available dataset [7] to evaluate the performance of our Sparse-GAN. The whole dataset was acquired with Spectralis OCT (Heidelberg Engineering, Germany) and contains data with three different lesions: drusen, DME (diabetic macular edema), and CNV (choroidal neovascularization). A detailed description of this dataset can be found in [7]. To train the proposed Sparse-GAN and determine the anomaly score threshold, we divide the original training set into two parts: a new training set with 50,140 normal images and a validation set consisting of 3,000 disease images and 1,000 normal images. The testing set is the same as in the original dataset.
Evaluation Metrics
For a given test image, we use Eq. (4) to compute the anomaly score and Eq. (5) for diagnosis. Based on the anomaly score, we mainly use AUC (Area under the ROC Curve) to evaluate our method. To compute accuracy (Acc), we determine the anomaly score threshold on the validation set, which includes 75% disease images and 25% normal images; this threshold is then used for testing. We adopt sensitivity (Sen) as the third evaluation metric.
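The evaluation protocol can be sketched as follows. The text does not state how the threshold is selected on the validation set, so choosing the value that maximizes validation accuracy is an assumption here, as are the function names and the toy scores.

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))

def pick_threshold(scores, labels):
    """Pick the observed score that maximizes accuracy (selection rule is assumed)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    best_t, best_acc = None, -1.0
    for t in np.unique(scores):
        acc = np.mean((scores >= t).astype(int) == labels)
        if acc > best_acc:
            best_t, best_acc = float(t), float(acc)
    return best_t

def acc_and_sen(scores, labels, threshold):
    """Accuracy and sensitivity (recall on disease images) at a fixed threshold."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pred = (scores >= threshold).astype(int)
    acc = float(np.mean(pred == labels))
    sen = float(np.mean(pred[labels == 1] == 1))
    return acc, sen

# Toy data: label 1 = disease, 0 = normal.
val_scores, val_labels = [0.1, 0.2, 0.8, 0.9, 0.7, 0.3], [0, 0, 1, 1, 1, 0]
test_scores, test_labels = [0.75, 0.2, 0.6, 0.4], [1, 0, 1, 0]
threshold = pick_threshold(val_scores, val_labels)
test_acc, test_sen = acc_and_sen(test_scores, test_labels, threshold)
```

AUC is threshold-free, whereas accuracy and sensitivity depend on the threshold carried over from the validation set, which is why the validation class balance matters.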
3.2 Training Details
The proposed Sparse-GAN is implemented in PyTorch with NVIDIA graphics processing units (GeForce TITAN V). The input image size is , while the batch size is 32. The optimizer is Adam and the learning rate is 0.001. Empirically, we let , and .
3.3 Quantitative Experimental Results
Method | Val-set AUC | Test-set AUC | Test-set Acc | Test-set Sen |
---|---|---|---|---|
Auto-Encoder | 0.729 | 0.783 | 0.751 | 0.834 |
AnoGAN [13] | 0.815 | 0.846 | 0.789 | 0.917 |
f-AnoGAN [14] | 0.849 | 0.882 | 0.808 | 0.871 |
pix2pix [6] #1 | 0.805 | 0.861 | 0.818 | 0.879 |
pix2pix [6] #2 | 0.837 | 0.874 | 0.815 | 0.900 |
Sparse-GAN | 0.885 | 0.925 | 0.841 | 0.951 |
#1: anomaly score computed at image level.
#2: anomaly score computed in latent space.
Ablation Study.
To justify the benefits of the anomaly score in latent space and of the Sparsity Regularization Net, we conduct the following ablation studies: #1 denotes Image-to-Image GAN [6] predicting the anomaly score at image level, and #2 denotes Image-to-Image GAN [6] predicting the anomaly score in latent space.
By adding the adversarial loss to the Auto-Encoder (#1), we improve the AUC from 0.729 to 0.805 on the validation set. That is to say, adversarial learning is helpful. By transforming the reconstructed image into latent space (#2), the AUC is improved from 0.805 to 0.837 on the validation set, since the noise in images is harmful to diagnosis. Finally, by regularizing the latent features with our proposed Sparsity Regularization Net, the AUC is improved from 0.837 to 0.885, which shows that the sparsity regularization is effective. On the test set, the ablation studies also validate the effectiveness of the different modules. Table 1 summarizes the results.
Performance Comparison.
We further compare the proposed method with state-of-the-art networks, including Auto-Encoder, AnoGAN [13] and f-AnoGAN [14].
Comparing our adopted Image-to-Image GAN (i.e. #1) with the original AnoGAN [13], the AUC improves from 0.846 to 0.861 on the test set. That is to say, the end-to-end optimized generator is better than the generator trained in two stages. Among all compared methods, ours achieves the highest AUC on both the validation set and the test set. The accuracy of our method on the test set is comparable to that of supervised deep learning methods, and its high sensitivity indicates that the rate of missed diagnoses is very low, which is more meaningful for clinicians. The results are also summarized in Table 1.
3.4 Qualitative Analysis with Anomaly Activation Map
To further understand the role of lesions in clinical diagnosis, some example images are shown in Fig. 3. When Sparse-GAN classifies a given image as abnormal, the AAM is computed. In addition to the anomaly heatmap, we also show the output images and the difference between the input image and the output image. Since Sparse-GAN is trained only on the normal set, the model cannot reconstruct abnormal patterns. The difference images show that noise in images is harmful to reconstruction. The heatmap can generally localize the lesion, which validates the effectiveness of the proposed AAM for the anomaly detection framework.

4 Conclusion
In this work, we propose a novel Sparse-GAN for anomaly detection, which detects anomalies in latent space, with the latent features constrained by a novel Sparsity Regularization Net. The quantitative experimental results on a public dataset validate the feasibility of anomaly detection for OCT images and the effectiveness of our method. Further, we show the anomaly activation maps of the lesions to make our results more explainable.
5 Acknowledgements
The project is partially supported by ShanghaiTech-Megavii Joint Lab, in part by the National Natural Science Foundation of China (NSFC) under Grants No. 61932020, and supported by the ShanghaiTech-UnitedImaging Joint Lab, Ningbo “2025 S&T Megaprojects” and Ningbo 3315 Innovation team grant. We also acknowledge the contribution of Weixin Luo and Wen Liu for their insightful comments with regard to the reconstruction-based anomaly detection method.
Footnotes
- {zhoukang, gaoshh}@shanghaitech.edu.cn, chengjun@nimte.ac.cn
- Corresponding authors
References
- [1] (2018) GANomaly: semi-supervised anomaly detection via adversarial training. In Asian Conference on Computer Vision, pp. 622–637.
- [2] (2017) Pathological OCT retinal layer segmentation using branch residual U-shape networks. In MICCAI, pp. 294–301.
- [3] (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2672–2680.
- [4] (2019) CE-Net: context encoder network for 2D medical image segmentation. IEEE Transactions on Medical Imaging.
- [5] (1991) Optical coherence tomography. Science 254 (5035), pp. 1178–1181.
- [6] (2017) Image-to-image translation with conditional adversarial networks. In CVPR, pp. 1125–1134.
- [7] (2018) Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell 172 (5), pp. 1122–1131.
- [8] (2012) ImageNet classification with deep convolutional neural networks. In NeurIPS, pp. 1097–1105.
- [9] (2017) Deep-learning based, automated segmentation of macular edema in optical coherence tomography. Biomedical Optics Express 8 (7), pp. 3440–3448.
- [10] (2018) Multiview multitask gaze estimation with deep convolutional neural networks. IEEE Transactions on Neural Networks and Learning Systems.
- [11] (2019) Video anomaly detection with sparse coding inspired deep neural networks. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- [12] (2017) A revisit of sparse coding based anomaly detection in stacked RNN framework. In ICCV, pp. 341–349.
- [13] (2017) Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In IPMI, pp. 146–157.
- [14] (2019) f-AnoGAN: fast unsupervised anomaly detection with generative adversarial networks. Medical Image Analysis.
- [15] (2018) Unsupervised identification of disease marker candidates in retinal OCT imaging data. IEEE TMI.
- [16] (2017) An anomaly detection approach for the identification of DME patients using spectral domain optical coherence tomography images. Computer Methods and Programs in Biomedicine 139, pp. 109–117.
- [17] (2018) Uniqueness-driven saliency analysis for automated lesion detection with applications to retinal diseases. In MICCAI, pp. 109–118.
- [18] (2016) Learning deep features for discriminative localization. In CVPR, pp. 2921–2929.
- [19] (2018) SC2Net: sparse LSTMs for sparse coding. In AAAI.
- [20] (2018) Multi-cell multi-task convolutional neural networks for diabetic retinopathy grading. In 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 2724–2727.
