# Semi-supervised Anomaly Detection Using GANs for Visual Inspection in Noisy Training Data

Masanari Kimura
Ridge-i Inc.
mkimura@ridge-i.com
Takashi Yanagihara
Ridge-i Inc.
tyanagihara@ridge-i.com
###### Abstract

The detection and quantification of anomalies in image data are critical tasks in industrial settings, such as detecting micro scratches on products. In recent years, because anomalies are difficult to define and labels for them are hard to obtain, research on unsupervised anomaly detection using generative models has attracted attention. In those studies, generally only normal images are used for training, so that the model learns the distribution of normal images. The model measures the anomalies in a target image by reproducing the most similar image and scoring image patches according to how well they fit the learned distribution. This approach rests on a strong presumption: the trained model should not be able to generate abnormal images. In reality, however, the model can generate abnormal images, mainly because of noisy normal data that include small abnormal pixels, and such noise severely affects the accuracy of the model. We therefore propose a novel semi-supervised method that distorts the distribution of the model using existing abnormal images. The proposed method detects pixel-level micro anomalies with high accuracy in high-resolution images that are actually used in an industrial setting. Because those data are confidential, this paper reports experimental results on open datasets.

## 1 Introduction

The detection and quantification of anomalies in image data are critical tasks in many industries, such as detecting micro scratches on product surfaces or finding diseases in medical images. Many studies deal with such tasks [11, 6, 1], each addressing task-specific problems in detecting anomalies from images.

For such tasks, applying supervised learning is generally hard, because anomalies are difficult to define and it is difficult to collect a sufficient number of abnormal samples. In recent years, research on unsupervised anomaly detection using generative models has attracted attention.

One of the most successful families of generative models is Generative Adversarial Networks (GANs) [3]. A GAN mimics a given target distribution by simultaneously training, typically, two networks: a generator $G$ and a discriminator $D$. $G$ produces the model distribution, and $D$ distinguishes the model distribution from the target. This learning framework has been successful in various application fields such as image generation [14], semantic segmentation [10], image translation [23, 20], and super-resolution [7], among others. In this research, we apply GANs to anomaly detection.

There are several studies on anomaly detection using GANs [17, 22, 19, 15]. In those studies, only normal images are used to train the GAN, so that it models the distribution of normal images. After training has converged and a target image is queried, $G$ generates the image most similar to the target. When the target image contains anomalies, there should be some distance between the target and the generated image, since the model knows only the distribution of normal images. Whether the image is classified as normal or not is decided by thresholding this distance. These approaches share one strong assumption: a model trained only with normal images should generate only normal images. In other words, $G$ should not be able to generate abnormal images.

However, in reality, the model can also generate abnormal images, mainly because a small number of abnormal pixels are included in some of the normal images used for training. Trained with such noisy data, the generator treats these pixels as part of the normal features. In real-world data, immaculate normal data are quite rare, and it is virtually impossible to completely remove a few pixels of abnormal or distorted features from normal images (see Figure 1).

Therefore, we propose a semi-supervised learning method effectively utilizing given abnormal images to resolve the issue. Our main contributions are the following:

• We solve the issues in the failure cases of the earlier studies on anomaly detection using GANs.

• We propose a semi-supervised method for anomaly detection using GANs. Our method achieves accurate anomaly detection by utilizing both normal and abnormal images.

• Our method successfully detects pixel-level micro anomalies in high resolution images from an actual industrial scene.

## 2 Related Works

In this section, we outline the two lines of work that our research builds on: Generative Adversarial Networks (GANs) and anomaly detection using GANs.

### 2.1 Generative Adversarial Networks

In recent years, GANs [3] have achieved great success in image generation tasks. A GAN consists of two networks, a generator $G$ and a discriminator $D$. The generator learns the distribution $p_g$ by mapping noise $z$, which is sampled from the uniform distribution, to the image space $\mathcal{X}$. The discriminator learns to distinguish between generated images and genuine images. The discriminator and the generator are simultaneously optimized through the two-player minimax game as follows:

$$
\min_G \max_D V(D,G) = \mathbb{E}_{x \sim p_d(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \tag{1}
$$

Here, $p_d$ is the distribution of real data and $p_z$ is the distribution of the noise $z$. As adversarial training continues, the generator becomes able to generate samples that look similar to the real images, and the discriminator becomes able to identify whether an image is genuine or generated.

Despite their success in many fields, GANs' instability during training has always been a critical issue, particularly with complicated images such as photo-realistic high-resolution ones. Many studies have addressed this problem and improved the training stability of GANs [12, 2, 4, 16, 5]. Among them, Karras et al. proposed progressive growing, which gradually increases the resolution, starting from low-resolution images, throughout the learning phase [5]. This method generates much clearer high-resolution images than earlier methods. We apply the progressive growing framework in our research so that we can accurately detect small anomalies in high-resolution images.

### 2.2 Anomaly Detection using GANs

Recently, GANs have been used in anomaly detection research, and AnoGAN [17] brought great progress to the field with a simple algorithm in which only normal images are used to train a generator $G$ to model the distribution of normal images. With the trained $G$, if a given new query image $x$ is from the normal data distribution, a noise $\hat{z}$ must exist in the latent space such that $G(\hat{z})$ becomes identical to $x$. However, if $x$ is abnormal, $\hat{z}$ will not exist, even though $G$ tries to generate the image most similar to $x$. The algorithm is heavily based on this hypothesis.

To find the noise $\hat{z}$, AnoGAN uses two loss functions: residual loss and discrimination loss.

Residual Loss. The residual loss measures the visual distance between the input image $x$ and the generated image $G(z)$.

$$
L_R(z) = \sum |x - G(z)| \tag{2}
$$

If $G$ has perfectly learned the distribution of the normal data, $L_R$ should behave as follows:

• Input image is normal: $L_R \simeq 0$

• Input image is abnormal: $L_R \gg 0$

From the above, we can formulate visual differences.

Discrimination Loss based on feature matching. In addition to $L_R$, the discrimination loss is based on feature matching, which uses an intermediate feature representation of the discriminator:

$$
L_D(z) = \sum |f(x) - f(G(z))| \tag{3}
$$

where $f(\cdot)$ is the output of the discriminator's intermediate layer, used to extract the features of the input image and the generated image. To map the query image to the latent space, the overall loss is defined as the weighted sum of both components:

$$
L_{Ano}(z) = (1 - \lambda) \cdot L_R(z) + \lambda \cdot L_D(z) \tag{4}
$$

With this differentiable loss function, the noise $\hat{z}$ that makes the image $G(\hat{z})$ most similar to the input image can be searched for using backpropagation. The trained parameters of the generator and the discriminator are kept fixed during the search. This is the fundamental mechanism of using GANs for anomaly detection.
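The latent search described above can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration: `generator` and `features` are hand-written stand-ins for a trained generator and the discriminator's intermediate layer, the latent is a single scalar, and a finite-difference gradient replaces backpropagation so that the example needs only the standard library.

```python
def generator(z):
    """Toy stand-in for a trained generator: scalar latent -> 3-pixel image."""
    return [z, 2.0 * z, 3.0 * z]

def features(image):
    """Toy stand-in for the discriminator's intermediate layer f(.)."""
    return [sum(image), image[0] - image[2]]

def l1(u, v):
    """Sum of absolute differences between two equally long vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

def anogan_loss(z, x, lam=0.1):
    """Equation 4: (1 - lambda) * L_R(z) + lambda * L_D(z)."""
    g = generator(z)
    residual = l1(x, g)                            # Equation 2
    discrimination = l1(features(x), features(g))  # Equation 3
    return (1.0 - lam) * residual + lam * discrimination

def search_latent(x, z0=0.0, lr=1e-3, steps=2000, eps=1e-6):
    """Gradient descent on z only; generator and features stay fixed."""
    z, best_z, best_loss = z0, z0, anogan_loss(z0, x)
    for _ in range(steps):
        # Central finite difference stands in for backpropagation here.
        grad = (anogan_loss(z + eps, x) - anogan_loss(z - eps, x)) / (2 * eps)
        z -= lr * grad
        loss = anogan_loss(z, x)
        if loss < best_loss:
            best_z, best_loss = z, loss
    return best_z, best_loss

normal_query = generator(0.5)      # lies on the toy generator's manifold
abnormal_query = [0.5, 1.0, 3.0]   # third pixel cannot be matched by any G(z)

_, normal_score = search_latent(normal_query)
_, abnormal_score = search_latent(abnormal_query)
print(normal_score, abnormal_score)
```

In this toy setting the normal query reaches a loss close to zero, while the abnormal query keeps a clearly positive loss because no latent value reproduces it.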

## 3 Proposed Method

Using GANs for anomaly detection is based on the strong assumption that $G$ trained with normal images cannot generate abnormal images; $G$ should generate normal images only. However, in reality, there are circumstances in which $G$ generates abnormal images due to the following factors:

• The normal images used for training actually include some anomalies, and the generator learns them as normal features. Figure 1 shows noisy samples included in the MNIST handwritten digits dataset.

• The generator does not have enough representation power or enough training data to learn a perfect mapping from $z$ to the image space.

These are natural behaviors of the GAN architecture, and we focus on dealing with the unavoidable anomalies occurring in the dataset. In real-world data, immaculate normal samples are very rare, and it is practically infeasible to completely remove a few pixels of abnormal or distorted features from the normal dataset.

To solve this problem, we propose a method that helps the generator learn a distribution close to that of the truly normal data by making use of abnormal images that are already available.

The proposed method reconstructs the learning framework of GANs. The objective function of GANs can be transformed from Equation 1 to the following equation:

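$$
V(D,G) = \int_x p_d(x)\log D(x) + p_g(x)\log(1 - D(x))\,dx \tag{5}
$$

Here, let $y = D(x)$, $a = p_d(x)$, $b = p_g(x)$,

$$
\begin{align}
h(y) &= a\log y + b\log(1-y) \tag{6}\\
\frac{d}{dy}h(y) &= \frac{a}{y} + \frac{b}{y-1} \tag{7}\\
&= \frac{a(y-1) + by}{y(y-1)} \tag{8}\\
&= \frac{y(a+b) - a}{y(y-1)} \tag{9}
\end{align}
$$

Since $\frac{d}{dy}h(y) = 0$ when $y = \frac{a}{a+b}$ in Equation 9, the optimum discriminator $D^*$ is derived as below:

$$
D^*(x) = \frac{p_d(x)}{p_d(x) + p_g(x)} \tag{10}
$$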
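As a quick numerical sanity check of Equation 10, the sketch below compares the closed-form maximiser with a brute-force grid search over $h(y)$ from Equation 6. It assumes that $a$ and $b$ stand for the values of $p_d(x)$ and $p_g(x)$ at a single point; the values 0.6 and 0.2 are hypothetical.

```python
import math

def h(y, a, b):
    """Pointwise integrand of Equation 5: a*log(y) + b*log(1 - y)."""
    return a * math.log(y) + b * math.log(1.0 - y)

def optimal_discriminator(a, b):
    """Closed-form maximiser from Equation 10, with a = p_d(x), b = p_g(x)."""
    return a / (a + b)

# Hypothetical density values at a single point x.
a, b = 0.6, 0.2
y_star = optimal_discriminator(a, b)

# Brute-force search over a fine grid in (0, 1) for comparison.
grid = [i / 10000 for i in range(1, 10000)]
y_grid = max(grid, key=lambda y: h(y, a, b))

print(y_star, y_grid)
```

The two values agree up to the grid resolution, which is what Equation 9 predicts for a concave $h$.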

Now, we consider Jensen-Shannon divergence (JSD), which is defined as follows:

$$
\begin{align}
JSD(p_d \| p_g) &= \frac{1}{2}\left(KL(p_d \| p_A) + KL(p_g \| p_A)\right) \tag{11}\\
KL(p_d \| p_g) &= \int_x p_d(x)\log\frac{p_d(x)}{p_g(x)}\,dx \tag{12}\\
p_A &= \frac{p_d + p_g}{2} \tag{13}
\end{align}
$$

By combining Equations 5, 10, 11 and 12, we obtain the optimum objective function below:

$$
V(D^*,G) = 2\,JSD(p_d \| p_g) - 2\log 2 \tag{14}
$$

From Equation 14, we can see that the generator aims at minimizing the JSD between the real distribution and the generator distribution.
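The identity in Equation 14 can be checked numerically on small discrete distributions, where the integrals become sums. The sketch below is only an illustration under that discretisation; the two three-bin distributions are hypothetical.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions (Equation 12 with a sum)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence (Equations 11 and 13)."""
    avg = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * (kl(p, avg) + kl(q, avg))

def value_at_optimal_d(p_d, p_g):
    """Equation 5 evaluated at D*(x) = p_d(x) / (p_d(x) + p_g(x))."""
    total = 0.0
    for a, b in zip(p_d, p_g):
        d_star = a / (a + b)
        if a > 0:
            total += a * math.log(d_star)
        if b > 0:
            total += b * math.log(1.0 - d_star)
    return total

# Hypothetical discrete data and generator distributions.
p_d = [0.5, 0.3, 0.2]
p_g = [0.2, 0.3, 0.5]

lhs = value_at_optimal_d(p_d, p_g)
rhs = 2 * jsd(p_d, p_g) - 2 * math.log(2)
print(lhs, rhs)
```

When the two distributions coincide, the JSD term vanishes and the value reduces to $-2\log 2$.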

In contrast to the typical objective function in Equation 1, which takes only normal images into consideration, we define an additional loss term that also considers abnormal images. Our proposed method treats abnormal images as another type of generated image and adds a penalty loss with penalty weight $\gamma$. This can be regarded as distorting the data distribution $p_d$. The objective loss function (Equation 15) is defined in terms of the following two components:

$$
\begin{align}
l_{Adv}(D,G) &= V(D,G) \tag{16}\\
l_{An}(D) &= \mathbb{E}_{x \sim p_{an}(x)}[\log(1 - D(x))]. \tag{17}
\end{align}
$$

Here, $p_{an}$ is the distribution of abnormal images and $\gamma$ is a parameter in $(0, 1]$. The parameter $\gamma$ controls the percentage of abnormal images generated. A smaller $\gamma$ excludes abnormal images from the training data more strongly, but at the same time it might also penalize normal features included in abnormal data. Combining Equations 14 and 15 with our proposed definition, we obtain the objective function of the generator as:

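$$
\begin{align}
V'(D^*,G) &= 2\,JSD(p_d \| p_N) - 2\log 2 \tag{18}\\
p_N &= \gamma p_g + (1-\gamma)p_{an}. \tag{19}
\end{align}
$$

Therefore, we can regard the objective of $G$ as minimizing the JSD between the real image distribution $p_d$ and the mixed distribution $p_N$ of generated images and abnormal images. Since the JSD attains its minimum when $p_N = p_d$, the optimum $p_g^*$ for $G$ can be derived as follows:

$$
\begin{align}
p_N^* &= p_d \tag{20}\\
\gamma p_g^* + (1-\gamma)p_{an} &= p_d \tag{21}\\
\gamma p_g^* &= p_d - (1-\gamma)p_{an} \tag{22}\\
p_g^* &= \frac{1}{\gamma}p_d - \frac{1-\gamma}{\gamma}p_{an} \tag{23}
\end{align}
$$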

Equation 23 is well defined only when $\gamma \neq 0$, and $p_g^*$ is non-negative only when $p_d \geq (1-\gamma)\,p_{an}$. From the equations above, we may consider that the proposed method distorts the distribution of real images so as to remove abnormal images and approach the ideal distribution of normal images. Figure 3 represents the change in the distribution caused by the proposed method.
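To make the effect of Equation 23 concrete, the following sketch evaluates it on a hypothetical four-bin discrete example in which the noisy "normal" data place 10% of their mass on an anomaly mode. The bin layout, the value $\gamma = 0.9$, and the validity checks are assumptions for illustration, not settings taken from the experiments.

```python
def optimal_generator_distribution(p_d, p_an, gamma):
    """Equation 23: p_g* = p_d / gamma - (1 - gamma) / gamma * p_an."""
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0, 1]")
    p_g = [d / gamma - (1.0 - gamma) / gamma * a for d, a in zip(p_d, p_an)]
    if min(p_g) < -1e-12:
        raise ValueError("p_d must dominate (1 - gamma) * p_an in every bin")
    return p_g

# Hypothetical example: bins 0-2 are normal modes, bin 3 is an anomaly mode
# that leaked into the noisy "normal" training data.
p_d = [0.40, 0.30, 0.20, 0.10]
p_an = [0.00, 0.00, 0.00, 1.00]

p_g_star = optimal_generator_distribution(p_d, p_an, gamma=0.9)
print(p_g_star)
```

With these numbers the anomaly bin is cancelled and its mass is redistributed proportionally over the normal bins, so the result still sums to one.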

During inference, our method uses the following function to search for the noise $z$ that makes the generated image most similar to the input.

$$
\begin{align}
L'(z) &= \gamma L_{Ano}(z) + (1-\gamma)L'_{An}(z) \tag{24}\\
L'_{An}(z) &= \sum |1 - D(G(z))| \tag{25}
\end{align}
$$

The noise $z$ is updated based on the value of $L'(z)$. Algorithm 1 shows the inference algorithm.

After finding the noise $z$, we can use $L'(z)$ to classify images into normal and abnormal. Furthermore, we can identify the abnormal pixels in the image by calculating the difference between the generated image and the input.
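The scoring and localisation step described above might look like the following sketch. It assumes the latent search has already produced a reconstruction; the 3x3 images, the discriminator output `0.8`, both thresholds, and the use of the summed pixel difference as a stand-in for $L_{Ano}$ are all hypothetical values chosen for illustration.

```python
def proposed_loss(l_ano, d_of_gz, gamma):
    """Equations 24-25: gamma * L_Ano(z) + (1 - gamma) * sum |1 - D(G(z))|."""
    l_an = sum(abs(1.0 - d) for d in d_of_gz)
    return gamma * l_ano + (1.0 - gamma) * l_an

def anomaly_map(query, reconstruction):
    """Per-pixel absolute difference between the query and G(z)."""
    return [[abs(q - r) for q, r in zip(q_row, r_row)]
            for q_row, r_row in zip(query, reconstruction)]

def abnormal_pixels(diff, pixel_threshold):
    """Coordinates (row, column) whose difference exceeds the threshold."""
    return [(i, j) for i, row in enumerate(diff)
            for j, value in enumerate(row) if value > pixel_threshold]

def is_abnormal(score, score_threshold):
    """Image-level decision by thresholding the final loss value."""
    return score > score_threshold

# Hypothetical 3x3 query with one scratch-like pixel, and its reconstruction.
query = [[0.2, 0.2, 0.2], [0.2, 0.2, 0.9], [0.2, 0.2, 0.2]]
reconstruction = [[0.2, 0.2, 0.2], [0.2, 0.2, 0.2], [0.2, 0.2, 0.2]]

diff = anomaly_map(query, reconstruction)
flagged = abnormal_pixels(diff, pixel_threshold=0.3)

# Summed residual used as a stand-in for L_Ano; 0.8 is a made-up D(G(z)).
l_ano = sum(sum(row) for row in diff)
score = proposed_loss(l_ano, d_of_gz=[0.8], gamma=0.9)
print(flagged, score, is_abnormal(score, score_threshold=0.5))
```

In practice both thresholds would need to be tuned on validation data, which the text does not specify.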

## 4 Experiments

Our method detects pixel-level micro anomalies with high accuracy in high-resolution images from real industrial data. However, because those images are confidential and cannot be disclosed, we report experiments on open datasets. We use , and as each parameter of the proposed method.

### 4.1 Datasets

We use the following two datasets for experiments. Table 1 shows the list of datasets.

MNIST: This dataset includes handwritten digits from 0 to 9. It has a training set of 60,000 examples and a test set of 10,000 examples. The images in this dataset have a uniform size of 28 × 28 pixels. We regard one of the ten classes as normal data and consider images of the other classes to be abnormal. We select a class as a normal class and allocate of it to the training set. In addition to that, from all the classes other than the normal class is added to the training data so that the training data can have both normal and abnormal class. After training, we test with data that were not used for learning and that contain both classes. This experiment is repeated for all of the classes.

Caltech-256: This dataset (http://www.vision.caltech.edu/Image_Datasets/Caltech256/) includes 30,607 images of 256 object categories. Each category has at least 80 images. We follow the experimental setup of Sabokrou et al. [15]. In addition, we sample images from all outlying category images, add them to the training data and train the model with the proposed method.
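The MNIST one-versus-rest protocol described above can be sketched as follows. The split fractions are not recoverable from the text, so `normal_fraction=0.7` and `abnormal_fraction=0.1` are purely hypothetical placeholders, as are the function name and the synthetic label list.

```python
import random

def one_vs_rest_split(labels, normal_class, normal_fraction,
                      abnormal_fraction, seed=0):
    """Return (train_idx, test_idx); the fractions are hypothetical knobs."""
    rng = random.Random(seed)
    normal = [i for i, y in enumerate(labels) if y == normal_class]
    abnormal = [i for i, y in enumerate(labels) if y != normal_class]
    rng.shuffle(normal)
    rng.shuffle(abnormal)
    n_cut = round(len(normal) * normal_fraction)
    a_cut = round(len(abnormal) * abnormal_fraction)
    # Training data mix normal samples with a small share of abnormal ones;
    # everything left over is held out for testing.
    train_idx = normal[:n_cut] + abnormal[:a_cut]
    test_idx = normal[n_cut:] + abnormal[a_cut:]
    return train_idx, test_idx

# Synthetic labels standing in for a ten-class dataset.
labels = [i % 10 for i in range(1000)]
train_idx, test_idx = one_vs_rest_split(
    labels, normal_class=0, normal_fraction=0.7, abnormal_fraction=0.1)
print(len(train_idx), len(test_idx))
```

Repeating the call with `normal_class` ranging over all ten digits reproduces the "repeated for all of the classes" part of the protocol.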

### 4.2 Network Architecture

We use progressive growing [5] as the learning framework. Further details about the network architecture and the hyperparameters can be found in Table 2 and Appendix A of the corresponding paper.

### 4.3 Results

We present experimental results on the benchmark datasets.

MNIST Results: Figure 4 shows the experimental results for the MNIST dataset. We regarded 0 as a normal class, and 1 as the abnormal class. We could almost perfectly generate images belonging to the normal class at the bottom, but images of the abnormal class and of the noisy normal class were not properly generated. Figure 5 shows the distribution of the pixel difference between the input image and the generated image for label 0 and label 6 using the same model. We show the results of sampling and inferring 100 images for each class and averaging them.

Caltech-256 Results: Table 2 shows the experimental results on the Caltech-256 dataset. For the comparison methods other than AnoGAN [17] and the proposed method, we used the results reported by Sabokrou et al. [15]. In this experiment, we evaluate all the methods based on the $F_1$-score with different numbers of normal-classified categories. In all three cases, our proposed method outperformed all other methods and maintained its high performance even when the number of normal-classified categories increased, whereas AnoGAN's score dropped under such circumstances.
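For reference, a score of this kind can be computed as in the sketch below, assuming the metric is the standard $F_1$-score (the harmonic mean of precision and recall). Which class counts as positive is also an assumption here, and the label vectors are made up.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up ground truth and predictions (1 = positive class).
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
score = f1_score(y_true, y_pred)
print(score)
```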

## 5 Conclusion

In this paper, we proposed a method of detecting anomalies with GANs using both normal images and given abnormal images. By distorting the data distribution and excluding the distribution of the abnormal images, the network can learn a distribution closer to the ideal normal data distribution. This method allows us to build more robust and accurate models for anomaly detection; our model detected abnormal regions covering less than 1% of the pixels in high-resolution images. Due to the confidential nature of the data, we share only the results for open datasets, and further validation on various datasets is desirable.

## References

• [1] J. An and S. Cho. Variational autoencoder based anomaly detection using reconstruction probability. SNU Data Mining Center, Tech. Rep., 2015.
• [2] M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein gan. arXiv preprint arXiv:1701.07875, 2017.
• [3] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.
• [4] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville. Improved training of wasserstein gans. In Advances in Neural Information Processing Systems, pages 5769–5779, 2017.
• [5] T. Karras, T. Aila, S. Laine, and J. Lehtinen. Progressive growing of gans for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196, 2017.
• [6] S. S. Kim and A. N. Reddy. Image-based anomaly detection technique: algorithm, implementation and effectiveness. IEEE Journal on Selected Areas in Communications, 24(10):1942–1954, 2006.
• [7] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. arXiv preprint, 2016.
• [8] G. Lerman, M. B. McCoy, J. A. Tropp, and T. Zhang. Robust computation of linear models by convex relaxation. Foundations of Computational Mathematics, 15(2):363–410, 2015.
• [9] G. Liu, Z. Lin, and Y. Yu. Robust subspace segmentation by low-rank representation. In Proceedings of the 27th international conference on machine learning (ICML-10), pages 663–670, 2010.
• [10] P. Luc, C. Couprie, S. Chintala, and J. Verbeek. Semantic segmentation using adversarial networks. arXiv preprint arXiv:1611.08408, 2016.
• [11] L. H. Quam. Road tracking and anomaly detection in aerial imagery. Technical report, SRI INTERNATIONAL MENLO PARK CA ARTIFICIAL INTELLIGENCE CENTER, 1978.
• [12] A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
• [13] M. Rahmani and G. K. Atia. Coherence pursuit: Fast, simple, and robust principal component analysis. IEEE Transactions on Signal Processing, 65(23):6260–6275, 2017.
• [14] S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee. Generative adversarial text to image synthesis. arXiv preprint arXiv:1605.05396, 2016.
• [15] M. Sabokrou, M. Khalooei, M. Fathy, and E. Adeli. Adversarially learned one-class classifier for novelty detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3379–3388, 2018.
• [16] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen. Improved techniques for training gans. In Advances in Neural Information Processing Systems, pages 2234–2242, 2016.
• [17] T. Schlegl, P. Seeböck, S. M. Waldstein, U. Schmidt-Erfurth, and G. Langs. Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In International Conference on Information Processing in Medical Imaging, pages 146–157. Springer, 2017.
• [18] M. C. Tsakiris and R. Vidal. Dual principal component pursuit. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pages 10–18, 2015.
• [19] H.-g. Wang, X. Li, and T. Zhang. Generative adversarial network based novelty detection using minimized reconstruction error. Frontiers of Information Technology & Electronic Engineering, 19(1):116–125, 2018.
• [20] Z. Yi, H. Zhang, P. Tan, and M. Gong. Dualgan: Unsupervised dual learning for image-to-image translation. arXiv preprint, 2017.
• [21] C. You, D. P. Robinson, and R. Vidal. Provable self-representation based outlier detection in a union of subspaces. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–10, 2017.
• [22] H. Zenati, C. S. Foo, B. Lecouat, G. Manek, and V. R. Chandrasekhar. Efficient gan-based anomaly detection. arXiv preprint arXiv:1802.06222, 2018.
• [23] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv preprint arXiv:1703.10593, 2017.