Sparse-GAN: Sparsity-constrained Generative Adversarial Network for Anomaly Detection in Retinal OCT Image



With the development of convolutional neural networks, deep learning has shown its success for retinal disease detection from optical coherence tomography (OCT) images. However, deep learning often relies on large-scale labelled data for training, which is challenging to obtain, especially for diseases with low occurrence. Moreover, a deep learning system trained on a dataset with one or a few diseases is unable to detect other unseen diseases, which limits the practical usage of the system in disease screening. To address this limitation, we propose a novel anomaly detection framework termed Sparsity-constrained Generative Adversarial Network (Sparse-GAN) for disease screening where only healthy data are available in the training set. The contributions of Sparse-GAN are twofold: 1) the proposed Sparse-GAN predicts anomalies in latent space rather than at the image level; 2) Sparse-GAN is constrained by a novel Sparsity Regularization Net. Furthermore, in light of the role of lesions in disease screening, we propose to leverage an anomaly activation map to show the heatmap of lesions. We evaluate the proposed Sparse-GAN on a publicly available dataset, and the results show that it outperforms the state-of-the-art methods.


Kang Zhou1, Shenghua Gao, Jun Cheng2, Zaiwang Gu, Huazhu Fu, Zhi Tu, Jianlong Yang, Yitian Zhao, Jiang Liu
School of Information Science and Technology, ShanghaiTech University
Cixi Institute of Biomedical Engineering, Chinese Academy of Sciences
UBTech Research
Southern University of Science and Technology
Inception Institute of Artificial Intelligence


Anomaly Detection, Sparsity-constrained Network, Latent Feature, Adversarial Learning

1 Introduction

Over 300 million people worldwide are affected by various ocular diseases [2], such as diabetic retinopathy (DR) [17], age-related macular degeneration (AMD) and glaucoma. Among the many diagnostic methods, optical coherence tomography (OCT) is a non-invasive imaging modality that provides micrometer-resolution volumetric scans of the retina [5]. With the development of convolutional neural networks (CNNs) in computer vision [8, 10], many deep learning based approaches have been proposed to detect lesions in retinal OCT images [9] and fundus images [20, 4]. However, these deep learning based methods rely heavily on big data for training, which limits the application of deep learning to medical image analysis.

Figure 1: The input image and its reconstructed image. (a) Normal input. (b) Reconstruction of the normal input. (c) Disease input. (d) Reconstruction of the disease input with our proposed method. Since lesions cannot be reconstructed, the reconstruction error is high enough for the image to be recognized as abnormal.

Unlike in general computer vision, it is often challenging to get sufficient medical image data, for several reasons. The first is that most medical data are not publicly available due to privacy concerns. The second is that labeling medical images is time-consuming, while experienced clinicians have little time for such tedious demarcation tasks. The third is that the occurrence of some lesions is usually low, and the presence of specific lesions is not known before the diagnosis. Therefore, obtaining large-scale medical data with particular types of lesions is often expensive and time-consuming.

Although it is difficult to get a large amount of data with different lesions, it is often much easier to get data from healthy subjects. In OCT imaging, one 3D scan from a healthy subject can provide hundreds of B-scan images without lesions. If the lesions are considered as anomalies added to the images from healthy subjects, it is possible to train an anomaly detection system using only OCT B-scans without lesions.

Figure 2: The overall architecture of our Sparse-GAN. Boxes with solid lines are networks, while the other boxes are features. In the testing stage, a test image is first converted into a latent feature by the encoder of the generator, and this latent feature is converted into a reconstructed image by the decoder of the generator. The reconstructed image is then transformed into a latent feature by another encoder, and finally the framework predicts the anomaly score in latent space. In the training stage, in addition to the same pipeline as in testing, the framework is trained with an image reconstruction loss, an adversarial loss and a sparsity regularization. (Best viewed in color.)

Previous work has shown the effectiveness of anomaly detection for disease diagnosis [16] and lesion localization [15]. Recently, CNN-based methods have been proposed to detect anomalies in medical images. Schlegl et al. [13] first introduced a deep convolutional Generative Adversarial Network (GAN) [3], referred to as AnoGAN, to detect anomalies in OCT B-scans. Later, they proposed f-AnoGAN [14], which is faster than AnoGAN. However, these networks are not trained in an end-to-end fashion, so they may get stuck in local optima. It is desirable to customize a network that learns the optimal features for anomaly detection.

In this paper, inspired by Image-to-Image GAN [6], whose generator is end-to-end optimized, we propose to employ Image-to-Image GAN for medical image anomaly detection. Then, to alleviate the effect of image noise (e.g. speckle noise in OCT images), we propose to map the reconstructed image into latent space with an additional encoder. Furthermore, motivated by the capability of interpretable sparse coding for anomaly detection, we propose to regularize the sparsity of latent features. By taking these factors into consideration, we present a novel framework: Sparsity-constrained Generative Adversarial Network (Sparse-GAN) for image anomaly detection with merely normal training data. The rationale behind the work is that the normal patterns from healthy subjects can be reconstructed with small errors while the patterns with lesions from diseased subjects are often reconstructed with large errors, as shown in Fig. 1.

The main contributions of this work are summarized as follows: (1) We propose to map the images into a latent space and regularize the latent feature with a novel sparsity regularizer; (2) We introduce a novel Sparse-GAN for anomaly detection, and our method is designed for the scenario where only data corresponding to healthy subjects are available in the training set. Thus our solution may ease the difficulty in data collection and annotation; (3) Our method also predicts anomaly activation maps to show lesions for clinical diagnosis.

2 Method

In this work, we mainly focus on regularizing the sparsity of the latent feature and on using the latent feature to predict anomalies in a GAN-based anomaly detection framework. As shown in Fig. 2, the proposed Sparsity-constrained Generative Adversarial Network consists of three modules: 1) an Image-to-Image GAN [6] for medical anomaly detection, whose generator is end-to-end optimized; 2) anomaly computation in latent space [1], to alleviate the effect of image noise (e.g. speckle noise in OCT images); 3) the novel Sparsity Regularization Net, to regularize the sparsity of latent features.

2.1 Image-to-Image GAN for Anomaly Detection

As discussed earlier, we adopt the image-to-image generator [6] as the generator of the GAN. It consists of an encoder and a decoder, and it is trained together with a discriminator. The encoder converts the input images into latent features, and the decoder transforms the latent features into reconstructed images. Image-to-Image GAN [6] is optimized with an objective that combines a reconstruction loss and an adversarial loss,


where the two loss terms are weighted by regularization parameters. The adversarial loss and reconstruction loss are defined as,


where is the batch-size.
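As a rough illustration of such a weighted objective, the following Python sketch combines a reconstruction term and an adversarial term for the generator. The L1 reconstruction error, the non-saturating cross-entropy adversarial term, and the names `reconstruction_loss`, `generator_adversarial_loss`, `generator_objective`, `w_re` and `w_adv` are assumptions made for illustration and are not taken from the text above; NumPy stands in for PyTorch to keep the example small.

```python
import numpy as np

def reconstruction_loss(inputs, recons):
    """Mean absolute (L1) error per image, averaged over the batch (assumed form)."""
    inputs = np.asarray(inputs, dtype=float)
    recons = np.asarray(recons, dtype=float)
    per_image = np.abs(inputs - recons).reshape(len(inputs), -1).mean(axis=1)
    return float(per_image.mean())

def generator_adversarial_loss(disc_on_recons, eps=1e-12):
    """Non-saturating generator loss, -mean(log D(reconstruction)) (assumed form)."""
    p = np.clip(np.asarray(disc_on_recons, dtype=float), eps, 1.0)
    return float(-np.log(p).mean())

def generator_objective(inputs, recons, disc_on_recons, w_re=1.0, w_adv=1.0):
    """Weighted sum of the two terms; w_re and w_adv are hypothetical weights."""
    return (w_re * reconstruction_loss(inputs, recons)
            + w_adv * generator_adversarial_loss(disc_on_recons))

# Toy batch of four single-channel 8x8 "images".
rng = np.random.default_rng(0)
batch = rng.random((4, 1, 8, 8))

# Perfect reconstruction that fully fools the discriminator: both terms vanish.
perfect = generator_objective(batch, batch, np.ones(4))

# Imperfect reconstruction with an undecided discriminator: both terms are positive.
noisy = generator_objective(batch, batch + 0.1, np.full(4, 0.5))
```

In this toy setting the objective is zero only when the reconstruction is exact and the discriminator outputs 1 for every reconstructed image; any reconstruction error or discriminator doubt raises it.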

2.2 Predict Anomaly Score in Latent Space

One challenge in reconstructing OCT images is speckle noise. To reduce its influence, we propose to transform the reconstructed image into latent space with an additional encoder. To cut down the computational cost, this encoder shares its weights with the encoder of the generator. In latent space, the model predicts the anomaly score and the diagnosis result as follows:


where the anomaly score threshold is determined on the validation set.
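A minimal sketch of scoring and thresholding in latent space is given below. It assumes the anomaly score is the L2 distance between the latent feature of the input and that of the reconstruction, and that an image is flagged as abnormal when its score exceeds the threshold; both choices, as well as the names `anomaly_score` and `diagnose`, are assumptions for illustration rather than the definitions used in this work.

```python
import numpy as np

def anomaly_score(h_in, h_re):
    """One score per image: L2 distance between the two latent features (assumed metric)."""
    h_in = np.asarray(h_in, dtype=float)
    h_re = np.asarray(h_re, dtype=float)
    diff = (h_in - h_re).reshape(len(h_in), -1)
    return np.linalg.norm(diff, axis=1)

def diagnose(scores, threshold):
    """Return 1 (abnormal) where the score exceeds the threshold, else 0 (normal)."""
    return (np.asarray(scores) > threshold).astype(int)

# Two toy latent features: the first is reproduced exactly, the second is not.
h_in = np.array([[1.0, 2.0], [3.0, 4.0]])
h_re = np.array([[1.0, 2.0], [0.0, 0.0]])

scores = anomaly_score(h_in, h_re)   # distances 0 and 5
labels = diagnose(scores, threshold=2.5)
```

The threshold value 2.5 is arbitrary here; in practice it would come from the validation set.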

2.3 Sparse Regularization on Latent Feature

On the one hand, without additional regularization, the generator may learn an approximation to the identity function, which cannot distinguish disease images from normal images. On the other hand, sparse coding is interpretable and has the capability for anomaly detection [12, 11].

Based on this observation, we propose a novel Sparsity Regularization Net, which recasts the solution of sparse coding as a novel convolutional long short-term memory (LSTM) unit. We regularize the sparsity of the latent feature with the proposed Sparsity Regularization Net, as shown in Fig. 2. Adding this sparsity regularization to the framework yields the proposed Sparsity-constrained GAN (Sparse-GAN).

The proposed Sparsity Regularization Net is inspired by Sparse LSTM [19], but differs from it in two aspects. First, we replace the element-wise multiplication in Sparse LSTM with a convolutional operation, since the convolutional operation accelerates the computation. Second, the input of the Sparsity Regularization Net is the latent feature rather than the original image.

The loss to train Sparsity Regularization Net is defined as follows,


where the loss involves the sparse code with respect to the latent feature and the dictionary.
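To make the roles of the sparse code and the dictionary concrete, the sketch below assumes the standard sparse coding objective, 0.5 * ||h - D z||^2 + lam * ||z||_1, and minimizes it with the iterative shrinkage-thresholding algorithm (ISTA), a classical solver for this problem. It is a stand-in on plain vectors, not the convolutional LSTM described above; the objective form, the weight `lam`, and the names `soft_threshold`, `sparse_objective` and `ista` are assumptions for illustration.

```python
import numpy as np

def soft_threshold(x, t):
    """Shrink every entry of x towards zero by t (proximal operator of the L1 norm)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_objective(h, D, z, lam):
    """Assumed objective: 0.5 * ||h - D z||^2 + lam * ||z||_1."""
    residual = h - D @ z
    return 0.5 * float(residual @ residual) + lam * float(np.abs(z).sum())

def ista(h, D, lam=0.1, n_iter=100):
    """Plain ISTA: a gradient step on the quadratic term, then soft-thresholding."""
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)   # 1 / Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - h)
        z = soft_threshold(z - step * grad, step * lam)
    return z

# Toy example: an 8-dimensional feature built from two atoms of a random dictionary.
rng = np.random.default_rng(0)
D = rng.normal(size=(8, 16))
z_true = np.zeros(16)
z_true[[2, 11]] = 1.0
h = D @ z_true

z_hat = ista(h, D, lam=0.1, n_iter=200)
start_value = sparse_objective(h, D, np.zeros(16), 0.1)
final_value = sparse_objective(h, D, z_hat, 0.1)
```

With the step size set to the inverse Lipschitz constant, each ISTA iteration does not increase the objective, so `final_value` ends up below `start_value`.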

Overall, the final loss of Sparse-GAN is given as follows:


where each loss term is weighted by a regularization parameter.

2.4 Anomaly Activation Map for Visualization

Since anomaly detection is significantly different from supervised classification, the Class Activation Map (CAM) [18] is not suitable in our framework for showing the role of lesions in diagnosis. To address this weakness of CAM, we propose the Anomaly Activation Map (AAM) to visualize lesions in the anomaly detection framework. We first perform Global Average Pooling (GAP) on the latent feature of the input image and on that of the reconstructed image. Then we obtain the anomaly vector as follows,


where the anomaly vector has one entry per channel of the latent feature. Finally, we multiply the feature map by the anomaly vector channel-wise to obtain the anomaly activation map.
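The steps above can be sketched as follows. The sketch assumes that the anomaly vector is the absolute difference between the two pooled vectors, that the weighted feature map is the latent feature of the input image, and that the weighted channels are summed into a single map in the manner of CAM; these three choices and the name `anomaly_activation_map` are assumptions for illustration.

```python
import numpy as np

def anomaly_activation_map(h_in, h_re):
    """h_in, h_re: latent feature maps of shape (C, H, W). Returns (vector, map)."""
    h_in = np.asarray(h_in, dtype=float)
    h_re = np.asarray(h_re, dtype=float)
    gap_in = h_in.mean(axis=(1, 2))          # global average pooling, shape (C,)
    gap_re = h_re.mean(axis=(1, 2))
    w = np.abs(gap_in - gap_re)              # anomaly vector (assumed form)
    aam = (w[:, None, None] * h_in).sum(axis=0)   # channel-wise weighting, then sum
    return w, aam

# Toy latent features with two channels of size 2x2.
h_in = np.array([[[1.0, 1.0], [1.0, 1.0]],
                 [[0.0, 2.0], [0.0, 2.0]]])
h_re = np.array([[[1.0, 1.0], [1.0, 1.0]],
                 [[0.0, 0.0], [0.0, 0.0]]])

w, aam = anomaly_activation_map(h_in, h_re)
```

In this toy case only the second channel differs between the two features, so it alone contributes to the map. To overlay the map on the input image it would still need to be upsampled to the image size, which is omitted here.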

3 Experiments

3.1 Datasets and Evaluation Metrics


We employ a publicly available dataset [7] to evaluate the performance of our Sparse-GAN. The whole dataset was acquired with Spectralis OCT (Heidelberg Engineering, Germany) and contains data with three different lesions: drusen, DME (diabetic macular edema), and CNV (choroidal neovascularization). A detailed description of this dataset can be found in [7]. To train the proposed Sparse-GAN and determine the threshold of the anomaly score, we divide the original training set into two parts: a new training set with 50,140 normal images, and a validation set consisting of 3,000 disease images and 1,000 normal images. The testing set is the same as in the original dataset.

Evaluation Metrics

For a given test image, we compute the anomaly score with Eq. (4) and make the diagnosis with Eq. (5). Based on the anomaly score, we mainly use the AUC (area under the ROC curve) to evaluate our method. To compute the accuracy (Acc), we need to determine the threshold of the anomaly score on the validation set, which includes 75% disease images and 25% normal images; this threshold is then used for testing. We adopt sensitivity (Sen) as the third evaluation metric.
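The evaluation protocol can be sketched in plain Python as below. The criterion used to pick the threshold is not stated above, so the sketch assumes the threshold that maximizes validation accuracy; the function names `auc`, `accuracy_and_sensitivity` and `pick_threshold`, and the toy scores, are hypothetical.

```python
def auc(scores, labels):
    """AUC as the probability that an abnormal image outscores a normal one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy_and_sensitivity(scores, labels, threshold):
    """Predict abnormal (1) when the score exceeds the threshold, then compute Acc and Sen."""
    preds = [int(s > threshold) for s in scores]
    correct = sum(p == y for p, y in zip(preds, labels))
    true_pos = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    return correct / len(labels), true_pos / sum(labels)

def pick_threshold(scores, labels):
    """Assumed criterion: the observed score that gives the highest validation accuracy."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: accuracy_and_sensitivity(scores, labels, t)[0])

# Toy validation scores; label 1 marks a disease image, 0 a normal image.
val_scores = [0.1, 0.2, 0.4, 0.6, 0.7, 0.9]
val_labels = [0, 0, 1, 0, 1, 1]

val_auc = auc(val_scores, val_labels)
threshold = pick_threshold(val_scores, val_labels)
val_acc, val_sen = accuracy_and_sensitivity(val_scores, val_labels, threshold)
```

The AUC needs no threshold, which is why it can serve as the main metric, whereas accuracy and sensitivity both depend on the threshold fixed on the validation set.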

3.2 Training Details

The proposed Sparse-GAN is implemented in PyTorch with NVIDIA graphics processing units (GeForce TITAN V). The input image size is , while the batch size is 32. The optimizer is Adam and the learning rate is 0.001. Empirically, we let , and .

3.3 Quantitative Experimental Results

Method            Val-set AUC   Test-set AUC   Test-set Acc   Test-set Sen
Auto-Encoder      0.729         0.783          0.751          0.834
AnoGAN [13]       0.815         0.846          0.789          0.917
f-AnoGAN [14]     0.849         0.882          0.808          0.871
pix2pix [6] #1    0.805         0.861          0.818          0.879
pix2pix [6] #2    0.837         0.874          0.815          0.900
Sparse-GAN        0.885         0.925          0.841          0.951

#1: anomaly score predicted at the image level.

#2: anomaly score predicted in latent space.

Table 1: Quantitative results for ablation studies and comparison with state-of-the-art methods.

Ablation Study.

To justify the benefits of the anomaly score in latent space and of the Sparsity Regularization Net, we conduct the following ablation studies: #1 denotes Image-to-Image GAN [6] predicting the anomaly score at the image level, and #2 denotes Image-to-Image GAN [6] predicting the anomaly score in latent space.

By adding the adversarial loss to the Auto-Encoder, we improve the AUC from 0.729 to 0.805 on the validation set; that is to say, adversarial learning is helpful. By transforming the reconstructed image into latent space, the AUC improves from 0.805 to 0.837 on the validation set, since the noise in images is harmful to diagnosis. Finally, by regularizing the latent features with our proposed Sparsity Regularization Net, the AUC improves from 0.837 to 0.885, which means the sparsity regularization is effective. On the test set, the ablation studies also validate the effectiveness of the different modules. Table 1 summarizes the results.
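The validation-set gains quoted above can be tallied step by step. The short Python snippet below uses only the AUC values reported in this section; the variable names are illustrative.

```python
# Validation-set AUC after each added component, as reported in the ablation study.
val_auc = [
    ("Auto-Encoder", 0.729),
    ("Image-to-Image GAN, image level (#1)", 0.805),
    ("Image-to-Image GAN, latent space (#2)", 0.837),
    ("Sparse-GAN", 0.885),
]

# Gain contributed by each step relative to the previous configuration.
gains = [
    (curr[0], round(curr[1] - prev[1], 3))
    for prev, curr in zip(val_auc, val_auc[1:])
]

# Overall gain from the plain Auto-Encoder to the full Sparse-GAN.
total_gain = round(val_auc[-1][1] - val_auc[0][1], 3)

for name, gain in gains:
    print(f"{name}: +{gain:.3f}")
print(f"total: +{total_gain:.3f}")
```

The three steps contribute 0.076, 0.032 and 0.048 AUC respectively, for a total of 0.156 on the validation set.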

Performance Comparison.

We further compare the proposed method with state-of-the-art networks, including Auto-Encoder, AnoGAN [13] and f-AnoGAN [14].

Comparing our adopted Image-to-Image GAN (i.e. #1) with the original AnoGAN [13], the AUC improves from 0.846 to 0.861 on the test set. That is to say, the end-to-end optimized generator is better than the generator trained in two stages. Our method achieves the highest AUC among the compared methods on both the validation set and the test set. The accuracy of our method on the test set is comparable to that of supervised deep learning methods, and its high sensitivity indicates that the rate of missed diagnoses is very low, which is more meaningful for clinicians. The results are also summarized in Table 1.

3.4 Qualitative Analysis with Anomaly Activation Map

To further understand the role of lesions in clinical diagnosis, some example images are shown in Fig. 3. When Sparse-GAN classifies a given image as abnormal, the AAM is computed. In addition to the anomaly heatmap, we also show the output image and the difference between the input image and the output image. Since Sparse-GAN is trained only on the normal set, the model cannot reconstruct abnormal patterns. The Diff images show that noise in images is harmful to reconstruction. The heatmap can generally localize the lesion, which validates the effectiveness of our proposed AAM for the anomaly detection framework.

Figure 3: Anomaly heatmaps on abnormal images. The Diff images show that noise in images is harmful to reconstruction, and the AAM images show that lesions play an important role in diagnosis with Sparse-GAN. (Best viewed in color.)

4 Conclusion

In this work, we propose a novel Sparse-GAN for anomaly detection, which detects anomalies in latent space, where the latent feature is constrained by a novel Sparsity Regularization Net. The quantitative experimental results on a public dataset validate the feasibility of anomaly detection for OCT images and the effectiveness of our method. Further, we show the anomaly activation maps of the lesions to make our results more explainable.

5 Acknowledgements

The project is supported in part by the ShanghaiTech-Megavii Joint Lab, in part by the National Natural Science Foundation of China (NSFC) under Grant No. 61932020, and in part by the ShanghaiTech-UnitedImaging Joint Lab, the Ningbo “2025 S&T Megaprojects” and the Ningbo 3315 Innovation team grant. We also thank Weixin Luo and Wen Liu for their insightful comments on the reconstruction-based anomaly detection method.


  1. thanks: {zhoukang, gaoshh},
  2. thanks: corresponding authors


  1. S. Akcay and A. Atapour-Abarghouei (2018) GANomaly: semi-supervised anomaly detection via adversarial training. In Asian Conference on Computer Vision, pp. 622–637.
  2. S. Apostolopoulos and S. De Zanet (2017) Pathological OCT retinal layer segmentation using branch residual U-shape networks. In MICCAI, pp. 294–301.
  3. I. Goodfellow and J. Pouget-Abadie (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2672–2680.
  4. Z. Gu and J. Cheng (2019) CE-Net: context encoder network for 2D medical image segmentation. IEEE Transactions on Medical Imaging.
  5. D. Huang and E. A. Swanson (1991) Optical coherence tomography. Science 254 (5035), pp. 1178–1181.
  6. P. Isola and J. Zhu (2017) Image-to-image translation with conditional adversarial networks. In CVPR, pp. 1125–1134.
  7. D. S. Kermany and M. Goldbaum (2018) Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell 172 (5), pp. 1122–1131.
  8. A. Krizhevsky and I. Sutskever (2012) ImageNet classification with deep convolutional neural networks. In NeurIPS, pp. 1097–1105.
  9. C. S. Lee, A. J. Tyring, N. P. Deruyter, Y. Wu, A. Rokem and A. Y. Lee (2017) Deep-learning based, automated segmentation of macular edema in optical coherence tomography. Biomedical Optics Express 8 (7), pp. 3440–3448.
  10. D. Lian and L. Hu (2018) Multiview multitask gaze estimation with deep convolutional neural networks. IEEE Transactions on Neural Networks and Learning Systems.
  11. W. Luo and W. Liu (2019) Video anomaly detection with sparse coding inspired deep neural networks. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  12. W. Luo (2017) A revisit of sparse coding based anomaly detection in stacked RNN framework. In ICCV, pp. 341–349.
  13. T. Schlegl and P. Seeböck (2017) Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In IPMI, pp. 146–157.
  14. T. Schlegl and P. Seeböck (2019) f-AnoGAN: fast unsupervised anomaly detection with generative adversarial networks. Medical Image Analysis.
  15. P. Seeböck and S. M. Waldstein (2018) Unsupervised identification of disease marker candidates in retinal OCT imaging data. IEEE TMI.
  16. D. Sidibe and S. Sankar (2017) An anomaly detection approach for the identification of DME patients using spectral domain optical coherence tomography images. Computer Methods and Programs in Biomedicine 139, pp. 109–117.
  17. Y. Zhao and Y. Zheng (2018) Uniqueness-driven saliency analysis for automated lesion detection with applications to retinal diseases. In MICCAI, pp. 109–118.
  18. B. Zhou and A. Khosla (2016) Learning deep features for discriminative localization. In CVPR, pp. 2921–2929.
  19. J. T. Zhou and K. Di (2018) SC2Net: sparse LSTMs for sparse coding. In AAAI.
  20. K. Zhou and Z. Gu (2018) Multi-cell multi-task convolutional neural networks for diabetic retinopathy grading. In 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 2724–2727.