Versatile Auxiliary Classifier with Generative Adversarial Network (VAC+GAN), Multi Class Scenarios

Training Conditional Generators

Shabab Bazrafkan · Peter Corcoran
Cognitive, Connected & Computational Imaging Research
National University of Ireland Galway
Tel.: +353-83-466-7835
E-mail: s.bazrafkan1@nuigalway.ie

Received: date / Accepted: date
Abstract

Conditional generators learn the data distribution for each class in a multi-class scenario and generate samples for a specific class given the right input from the latent space. In this work, a method known as "Versatile Auxiliary Classifier with Generative Adversarial Network" (VAC+GAN) for multi-class scenarios is presented. In this technique, the Generative Adversarial Network (GAN)'s generator is turned into a conditional generator by placing a multi-class classifier in parallel with the discriminator network and back-propagating the classification error through the generator. This technique is versatile enough to be applied to any GAN implementation. Results on two databases and comparisons with other methods are provided as well.

Keywords:
Conditional deep generators · Generative Adversarial Networks · Machine learning

1 Introduction

With the emergence of affordable parallel processing hardware, it has become almost impossible to find any aspect of Artificial Intelligence (AI) to which Deep Learning (DL) has not been applied CEmag1 (). DL provides superior outcomes on classification and regression problems compared to classical machine learning methods. The impact of DL is not limited to such problems: generative models also take advantage of these techniques in learning the data distribution for big-data scenarios where classical methods fail to provide a solution. Generative Adversarial Networks (GAN) GAN () utilise Deep Neural Network capabilities and are able to estimate the data distribution for large-scale problems. These models comprise two networks, a generator and a discriminator. The generator makes random samples from a latent space, and the discriminator determines whether a sample is adversarial, made by the generator, or is a genuine image coming from the dataset. GANs are successful implementations of deep generative models, and there are multiple variations such as WGAN WGAN (), EBGAN EBGAN (), BEGAN BEGAN (), ACGAN ACGAN (), and DCGAN DCGAN (), which have evolved from the original GAN by altering the loss function and/or the network architecture. Variational Autoencoders (VAE) VAE () are the other successful implementation of deep generative models. In these models the bottleneck of a conventional autoencoder is considered as the latent space of the generator, i.e., the samples are fed to an autoencoder, and besides the conventional autoencoder's loss function, the Kullback-Leibler (KL) divergence between the distribution of the data at the bottleneck and a Gaussian distribution is minimized. In practice, this is achieved by adding the KL divergence term to the mean square error of the autoencoder network. The biggest downside to VAE models is their blurry outputs due to the mean square error loss VAE2 ().
PixelRNN and PixelCNN RNN () are other famous implementations of the deep neural generative models. PixelRNN is made of 2-dimensional LSTM units, and in PixelCNN, a Deep Convolutional Neural Network is utilized to estimate the distribution of the data.
Training conditional generators is one of the most appealing applications of GANs. Conditional GAN (CGAN) CGAN () and Auxiliary Classifier GAN (ACGAN) ACGAN () are among the most utilized schemes for this purpose. The CGAN approach uses the auxiliary class information alongside partitioning the latent space, while ACGAN improves on the CGAN idea by introducing a classification loss which back-propagates through the discriminator and generator networks. The CGAN method is versatile enough to apply to every variation of GAN, but ACGAN is restricted to a specific loss function, which decreases its adaptability to other GAN varieties.
In VACGAN (), the ACGAN technique was extended to be applicable to any GAN implementation for binary problems (2-class scenarios). The technique is known as Versatile Auxiliary Classifier with Generative Adversarial Network (VAC+GAN) and is implemented by placing a classifier in parallel with the discriminator and back-propagating the classification error through the generator alongside the GAN's loss.
This work expands the original VAC+GAN VACGAN () idea to multi-class scenarios. In this approach, the classifier is trained independently from the discriminator, which gives the opportunity of applying it to any variation of GAN. The main contribution of VAC+GAN is its versatility, and proofs are provided to show the applicability of the method regardless of the GAN structure or loss function.
In the next section, VAC+GAN for multi-class scenarios is explained. In the third section, the implementations of ACGAN and VAC+GAN are presented alongside comparisons with other methods. Discussions and future work are given in the last section.

2 Versatile Auxiliary Classifier + Generative Adversarial Network (VAC+GAN)

The concept proposed in this research is to place a classifier network in parallel with the Discriminator. The classifier accepts the samples from the generator, and the classification error is back-propagated through the classifier and the generator. The model structure is shown in figure 1.

Figure 1: The presented model for training conditional deep generators.
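The wiring described above can be sketched in a few lines of NumPy. This is a minimal, illustrative forward pass, not the paper's actual implementation: the three networks are stood in for by single linear layers, the latent partitioning is realised as a one-hot class code concatenated to the latent vector, and the weight names `gamma` and `lam` are placeholders for the mixing coefficients used later in the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

n_classes, latent_dim, data_dim = 10, 100, 784

# Toy linear stand-ins for the three networks (illustrative only).
W_g = rng.normal(0, 0.01, (latent_dim + n_classes, data_dim))   # generator
W_d = rng.normal(0, 0.01, (data_dim, 1))                        # discriminator
W_c = rng.normal(0, 0.01, (data_dim, n_classes))                # parallel classifier

def generator(z, labels):
    # Latent partitioning: concatenate a one-hot class code to z.
    one_hot = np.eye(n_classes)[labels]
    return np.tanh(np.concatenate([z, one_hot], axis=1) @ W_g)

def discriminator(x):
    return 1.0 / (1.0 + np.exp(-(x @ W_d)))        # sigmoid -> P(real)

def classifier(x):
    logits = x @ W_c
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)        # softmax over classes

batch = 32
labels = rng.integers(0, n_classes, batch)
z = rng.normal(size=(batch, latent_dim))
fake = generator(z, labels)

# Generator objective: GAN loss plus the classification loss that is
# back-propagated through the generator (the core of VAC+GAN).
l_gan = -np.log(discriminator(fake) + 1e-8).mean()
l_cls = -np.log(classifier(fake)[np.arange(batch), labels] + 1e-8).mean()

gamma, lam = 0.2, 0.8          # illustrative mixing weights
generator_loss = gamma * l_gan + lam * l_cls
```

In a real implementation the gradient of `generator_loss` would flow through both the discriminator path and the classifier path back into the generator's parameters; any GAN variant's own generator loss can be substituted for `l_gan`.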

In this section it is shown that by placing a classifier at the output of the generator and minimizing the categorical cross-entropy as the classifier's loss, the Jensen-Shannon Divergence between all the classes is increased. The terms used in the mathematical proofs are as follows:

  1. $n$ is the number of the classes.

  2. The latent space $\mathcal{Z}$ is partitioned into $n$ subsets $\mathcal{Z}_1, \dots, \mathcal{Z}_n$. This means that $\mathcal{Z}_1, \dots, \mathcal{Z}_n$ are disjoint and their union is equal to the $\mathcal{Z}$-space.

  3. $C$ is the classifier function, with $C_i$ denoting its $i$'th output.

  4. $\mathcal{L}_{bce}$ is the binary cross-entropy loss function.

  5. $\mathcal{L}_{cce}$ is the categorical cross-entropy loss function.

Throughout, $P_i$ denotes the distribution of the samples generated from the latent subset $\mathcal{Z}_i$, i.e., the generator's distribution for class $i$.

Proposition 1.

In the multiple classes case, the classifier has $n$ outputs, where $n$ is the number of the classes. In this approach, each output of the classifier corresponds to one class. For a fixed Generator and Discriminator, the optimal output for class $i$ (the $i$'th output) is:

$$C_i^*(x) = \frac{P_i(x)}{\sum_{j=1}^{n} P_j(x)} \tag{1}$$

Proof.

Considering just one of the outputs of the classifier, the categorical cross-entropy can be reduced to the binary cross-entropy given by

$$\mathcal{L}_{bce} = -\mathbb{E}_{x \sim P_i}\big[\log C_i(x)\big] - \mathbb{E}_{x \sim P_{\neg i}}\big[\log\big(1 - C_i(x)\big)\big] \tag{2}$$

where $P_{\neg i} = \sum_{j \neq i} P_j$, which is equal to

$$\mathcal{L}_{bce} = -\int_x \Big( P_i(x) \log C_i(x) + \sum_{j \neq i} P_j(x) \log\big(1 - C_i(x)\big) \Big)\, dx \tag{3}$$

By considering $a = P_i(x)$ and $b = \sum_{j \neq i} P_j(x)$ we have

$$\mathcal{L}_{bce} = -\int_x \Big( a \log C_i(x) + b \log\big(1 - C_i(x)\big) \Big)\, dx \tag{4}$$

The function $y \mapsto a \log y + b \log(1 - y)$ gets its maximum at $y = \frac{a}{a+b}$ for any $(a, b) \in \mathbb{R}^2 \setminus \{(0, 0)\}$, which gives $C_i^*(x) = P_i(x) \big/ \sum_{j=1}^{n} P_j(x)$, concluding the proof. ∎
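The final step of the proof can be sanity-checked numerically: for arbitrary positive coefficients $a$ and $b$, a grid search over $y$ locates the maximum of $a \log y + b \log(1-y)$ at $y = a/(a+b)$. The values of `a` and `b` below are arbitrary choices for illustration, not from the paper.

```python
import numpy as np

def f(y, a, b):
    # The pointwise objective whose maximiser determines the optimal classifier.
    return a * np.log(y) + b * np.log(1.0 - y)

a, b = 0.7, 2.3                              # arbitrary positive coefficients
ys = np.linspace(1e-4, 1 - 1e-4, 100_000)    # dense grid on (0, 1)
y_star = ys[np.argmax(f(ys, a, b))]

# The analytic maximiser from the proof is a / (a + b).
assert abs(y_star - a / (a + b)) < 1e-3
```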

Theorem 2.1.

The maximum value for $\mathcal{L}_{cce}(C^*)$ is $n \log n$ and is achieved if and only if $P_1 = P_2 = \dots = P_n$.

Proof.

The categorical cross-entropy is given by

$$\mathcal{L}_{cce}(C) = -\sum_{i=1}^{n} \mathbb{E}_{x \sim P_i}\big[\log C_i(x)\big] \tag{5}$$

From equation 1 we have

$$\mathcal{L}_{cce}(C^*) = n \log n - \sum_{i=1}^{n} KL\Big(P_i \,\Big\|\, \frac{1}{n}\sum_{j=1}^{n} P_j\Big) \tag{6}$$

Where $KL$ is the Kullback-Leibler divergence, which is always positive or equal to zero, and vanishes if and only if its two arguments are equal. Now consider $P_1 = P_2 = \dots = P_n$. From 6 we have

$$\mathcal{L}_{cce}(C^*) = n \log n \tag{7}$$

concluding the proof. ∎
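Theorem 2.1 can be verified on discrete distributions: evaluating the optimal-classifier loss of equation 6 for randomly drawn $P_i$ stays strictly below $n \log n$, and hits the bound exactly when all the $P_i$ coincide. The distributions below are random illustrations, and `cce_at_optimum` is a helper name introduced here, not from the paper.

```python
import numpy as np

def cce_at_optimum(P):
    """L_cce(C*) = -sum_i E_{x~P_i}[log(P_i / sum_j P_j)] for discrete P (rows)."""
    total = P.sum(axis=0)                       # sum_j P_j(x) at each point x
    return -sum((p * np.log(p / total)).sum() for p in P)

n, k = 4, 6                                     # 4 classes, 6-point sample space
rng = np.random.default_rng(1)
P = rng.random((n, k))
P /= P.sum(axis=1, keepdims=True)               # each row is a distribution

bound = n * np.log(n)
assert cce_at_optimum(P) < bound                # strict for distinct P_i

P_equal = np.tile(P[0], (n, 1))                 # all classes identical
assert abs(cce_at_optimum(P_equal) - bound) < 1e-9
```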

Theorem 2.2.

Minimizing $\mathcal{L}_{cce}(C^*)$ increases the
Jensen-Shannon Divergence between $P_1, \dots, P_n$.

Proof.

From equation 6 we have

$$\mathcal{L}_{cce}(C^*) = n \log n - \sum_{i=1}^{n} KL\big(P_i \,\big\|\, \bar{P}\big), \qquad \bar{P} = \frac{1}{n}\sum_{j=1}^{n} P_j \tag{8}$$

Which can be rewritten as

$$\mathcal{L}_{cce}(C^*) = n \log n - \sum_{i=1}^{n} \int_x P_i(x) \log \frac{P_i(x)}{\bar{P}(x)}\, dx \tag{9}$$

Which is equal to

$$\mathcal{L}_{cce}(C^*) = n \log n - \sum_{i=1}^{n} \int_x P_i(x) \log P_i(x)\, dx + \sum_{i=1}^{n} \int_x P_i(x) \log \bar{P}(x)\, dx \tag{10}$$

This equation can be rewritten as

$$\mathcal{L}_{cce}(C^*) = n \log n + \sum_{i=1}^{n} H(P_i) - n H(\bar{P}) \tag{11}$$

wherein $H(P)$ is the Shannon entropy of the distribution $P$.
The Jensen-Shannon divergence between distributions $P_1, \dots, P_n$ (with uniform weights) is defined as

$$JSD(P_1, \dots, P_n) = H\Big(\frac{1}{n}\sum_{i=1}^{n} P_i\Big) - \frac{1}{n}\sum_{i=1}^{n} H(P_i) \tag{12}$$

From equations 11 and 12 we have

$$\mathcal{L}_{cce}(C^*) = n \log n - n \, JSD(P_1, \dots, P_n) \tag{13}$$

Minimizing $\mathcal{L}_{cce}(C^*)$ is increasing the JSD term, concluding the proof. ∎
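The identity in equation 13 can also be confirmed numerically on discrete distributions by computing the two sides independently: the entropy-based JSD of equation 12 on one side and the optimal-classifier cross-entropy on the other. The distributions below are random illustrations.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 3, 8
P = rng.random((n, k))
P /= P.sum(axis=1, keepdims=True)               # n discrete class distributions

def H(p):
    # Shannon entropy of a discrete distribution.
    return -(p * np.log(p)).sum()

p_bar = P.mean(axis=0)                          # mixture (1/n) * sum_i P_i
jsd = H(p_bar) - np.mean([H(p) for p in P])     # equation 12

total = P.sum(axis=0)
l_cce = -sum((p * np.log(p / total)).sum() for p in P)   # equation 5 at C*

# Equation 13: L_cce(C*) = n log n - n * JSD(P_1, ..., P_n)
assert abs(l_cce - (n * np.log(n) - n * jsd)) < 1e-9
```

Since the JSD term enters with a negative sign, driving `l_cce` down through the generator necessarily drives the JSD between the class distributions up, which is exactly the claim of the theorem.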

In this section it has been shown that by placing a classifier at the output of the generator and back-propagating the classification error through the generator, one can increase the dissimilarity between the classes for the generator and therefore train a deep generator that produces class-specific samples. In the next section the proposed idea is implemented for multi-class cases and compared with state-of-the-art methods.

3 Experimental Results

In this section, two main experiments are described to show the effectiveness of VAC+GAN. The first is on the MNIST database, and visual comparisons with CGAN, CDCGAN and ACGAN are presented. The second experiment is on the CIFAR-10 dataset, and the classification error is compared against the ACGAN method. All the networks are trained in Lasagne LASAGNE () on top of the Theano THEANO () library in Python, unless stated otherwise.

3.1 MNIST

In this experiment, the performance of the proposed method is investigated on the MNIST database. MNIST ("Modified National Institute of Standards and Technology") is known as the "hello world" dataset of computer vision. It is a historically significant image classification benchmark introduced in 1999, and a considerable amount of research has been published on MNIST image classification. MNIST contains 60,000 training images and 10,000 test images, both drawn from the same distribution. It consists of 28×28 pixel images of handwritten digits. Each image is assigned a single ground-truth label from the digits 0 to 9.
The proposed method has been applied to the DCGAN scheme. The Generator, Discriminator and the Classifier used in this experiment are given in tables 1, 2 and 3 respectively.

Layer Type kernel Activation
Input Input
Hidden 1 Dense ReLU
BatchNorm 1
Hidden 2 Dense ReLU
BatchNorm 2
Hidden 3 Deconv (64ch) ReLU
BatchNorm 3
Output Deconv (1ch) Sigmoid
Table 1: The generator structure for the MNIST+DCGAN experiment. All deconvolution layers use (2,2) padding with stride (2,2).
Layer Type kernel Activation
Input Input
Hidden 1 Conv (64 ch) LeakyR(0.2)
BatchNorm 1
Hidden 2 Conv (128 ch) LeakyR(0.2)
BatchNorm 2
Hidden 3 Dense 1024 LeakyR(0.2)
Output Dense 1 Sigmoid
Table 2: The discriminator structure for the MNIST+DCGAN experiment. All convolution layers use (2,2) padding with stride (2,2).
Layer Type Kernel Activation
Input Input
Hidden 1 Conv (16 ch) ReLU
Pool 1 Max pooling
Hidden 2 Conv (8 ch) ReLU
Pool 2 Max pooling
Hidden 3 Dense 1024 ReLU
Output Dense 10 Softmax
Table 3: The classifier structure for the MNIST+DCGAN experiment.

And the loss function for the proposed method (VAC+GAN) is given by:

$$\mathcal{L} = \gamma \, \mathcal{L}_G\big(D(G(z))\big) + \lambda \, \mathcal{L}_C\big(C(G(z))\big) \tag{14}$$

where $\mathcal{L}_G$ and $\mathcal{L}_D$ are the generator and discriminator losses respectively, $G$ is the generator function, $\mathcal{L}_D$ is the binary cross-entropy loss for the discriminator, and $\mathcal{L}_C$ is the categorical cross-entropy loss for the classifier. In this experiment, $\gamma$ and $\lambda$ are equal to 0.2 and 0.8 respectively.
The optimizer used for training the generator and discriminator is Adam with learning rate, $\beta_1$ and $\beta_2$ equal to 0.0002, 0.5 and 0.999 respectively, and the classifier is optimized using Nesterov momentum gradient descent with learning rate and momentum equal to 0.01 and 0.9 respectively. The results of the conditional generators trained using Conditional GAN (CGAN) [https://github.com/znxlwm/tensorflow-MNIST-cGAN-cDCGAN], Conditional DCGAN (CDCGAN) [https://github.com/znxlwm/tensorflow-MNIST-cGAN-cDCGAN], ACGAN [https://github.com/buriburisuri/ac-gan], and the proposed method (VAC+GAN) on the MNIST dataset are shown in figures 2, 3, 4 [https://github.com/buriburisuri/ac-gan/blob/master/png/sample.png], and 5 respectively.
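The two update rules above can be made concrete with a small hand-rolled sketch using the stated hyperparameters; here they are applied to a toy scalar quadratic rather than the actual networks, purely to illustrate the Adam and Nesterov-momentum updates.

```python
import numpy as np

def adam_step(theta, g, m, v, t, lr=0.0002, b1=0.5, b2=0.999, eps=1e-8):
    # Adam update with the generator/discriminator hyperparameters above.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)                  # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)                  # bias-corrected second moment
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

def nesterov_step(theta, grad_fn, vel, lr=0.01, mu=0.9):
    # Nesterov momentum: gradient evaluated at the look-ahead point.
    g = grad_fn(theta + mu * vel)
    vel = mu * vel - lr * g
    return theta + vel, vel

grad = lambda x: 2.0 * x                       # gradient of f(x) = x^2

xa, m, v = 5.0, 0.0, 0.0                       # parameter trained with Adam
xn, vel = 5.0, 0.0                             # parameter trained with Nesterov
for t in range(1, 2001):
    xa, m, v = adam_step(xa, grad(xa), m, v, t)
    xn, vel = nesterov_step(xn, grad, vel)

# Both parameters should have moved toward the minimum at 0.
assert abs(xn) < 1e-3
assert abs(xa) < 5.0
```

Note the small Adam learning rate (0.0002) makes slow, bounded steps, while the Nesterov classifier optimizer with lr 0.01 converges quickly on this toy problem; in the actual experiments these rates apply to the network weights.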

Figure 2: Samples drawn from the conditional generator trained using the CGAN scheme on the MNIST dataset. Each row corresponds to one class.
Figure 3: Samples drawn from the conditional generator trained using the CDCGAN scheme on the MNIST dataset. Each row corresponds to one class.
Figure 4: Samples drawn from the conditional generator trained using the ACGAN scheme on the MNIST dataset. Each row corresponds to one class.
Figure 5: Samples drawn from the conditional generator trained using the proposed scheme (VAC+GAN) on the MNIST dataset. Each row corresponds to one class.

As shown in these figures, the presented method gives superior results compared to CGAN and CDCGAN while using the exact same generator structure as in CDCGAN. The results are comparable with ACGAN; the difference here is that this method is more versatile and can be applied to any GAN model regardless of model architecture and loss function.

3.2 CIFAR-10

The CIFAR-10 database CFAR10 () consists of 60,000 images in 10 classes, wherein 50,000 of these images are for training and 10,000 for testing purposes. The next experiment compares ACGAN [https://github.com/King-Of-Knights/Keras-ACGAN-CIFAR10] to the VAC+GAN method on generating images, and the classification accuracies of these methods are also compared. The networks utilized in this experiment are shown in tables 4, 5 and 6, corresponding to the generator, discriminator [https://github.com/King-Of-Knights/Keras-ACGAN-CIFAR10/blob/master/cifar10.py] and classifier respectively. The same generator and discriminator architectures have been used in both implementations to obtain fair comparisons.

Layer Type kernel Activation
Input Input
Hidden 1 Dense ReLU
Reshape Reshape 384ch
Hidden 2 DeConv (192 ch) ReLU
BatchNorm 2
Hidden 3 DeConv (96 ch) ReLU
BatchNorm 3
Output DeConv (3 ch) tanh
Table 4: The generator structure for the CIFAR-10 experiment. All deconvolution layers use 'SAME' padding with stride (2,2).
Layer Type kernel Activation
Input Input
Gaussian Noise
Hidden 1 Conv 16ch LeakyR(0.2)
DropOut 1 DropOut
Hidden 2 Conv 32ch LeakyR(0.2)
BatchNorm 1
DropOut 2 DropOut
Hidden 3 Conv 64ch LeakyR(0.2)
BatchNorm 2
DropOut 3 DropOut
Hidden 4 Conv 128ch LeakyR(0.2)
BatchNorm 3
DropOut 4 DropOut
Hidden 5 Conv 256ch LeakyR(0.2)
BatchNorm 4
DropOut 5 DropOut
Hidden 6 Conv 512ch LeakyR(0.2)
BatchNorm 5
DropOut 6 DropOut
MBDisc MBDISC ()
Output Dense 1 Sigmoid
Table 5: The discriminator structure for the CIFAR-10 experiment. All convolution layers use 'SAME' padding; MBDisc is the Mini Batch Discrimination layer explained in MBDISC ().
Layer Type kernel Activation
Input Input
Hidden 1 Conv (128ch) ReLU
BatchNorm 1
MaxPool 1 MaxPool (2,2)
Hidden 2 Conv (256ch) ReLU
BatchNorm 2
MaxPool 2 MaxPool (2,2)
Hidden 3 Conv (512ch) ReLU
BatchNorm 3
MaxPool 3 MaxPool (2,2)
Hidden 4 Dense 512 ReLU
Output Dense 10 softmax
Table 6: The classifier structure for the CIFAR-10 experiment.

The loss function used to train the VAC+GAN is given by

$$\mathcal{L} = \gamma \, \mathcal{L}_G\big(D(G(z))\big) + \lambda \, \mathcal{L}_C\big(C(G(z))\big) \tag{15}$$

where $\mathcal{L}_G$ and $\mathcal{L}_D$ are the generator and discriminator losses respectively, $G$ is the generator function, $\mathcal{L}_D$ is the binary cross-entropy loss for the discriminator, and $\mathcal{L}_C$ is the categorical cross-entropy loss for the classifier. In this experiment, $\gamma$ and $\lambda$ are both equal to 0.5.
The optimizer used for training the generator and discriminator is Adam with learning rate, $\beta_1$ and $\beta_2$ equal to 0.0002, 0.5 and 0.999 respectively, and the classifier is optimized using Nesterov momentum gradient descent with learning rate and momentum equal to 0.01 and 0.9 respectively. The results for ACGAN and the proposed method are shown in figures 6 [https://github.com/King-Of-Knights/Keras-ACGAN-CIFAR10/blob/master/plot_epoch_220_generated.png] and 7 respectively.

Figure 6: Generated samples using ACGAN.
Figure 7: Generated samples using VAC+GAN.

The CIFAR-10 database is extremely unconstrained, and there are just 6,000 samples (5,000 for training) in each class. Therefore the outputs of both implementations are vague, and in order to compare these methods the classification errors are compared. The confusion matrices for ACGAN and VAC+GAN are shown in figures 8 [https://github.com/King-Of-Knights/Keras-ACGAN-CIFAR10/blob/master/Confusion_Matrix.png] and 9 respectively.

Figure 8: Confusion matrix for the ACGAN method on CIFAR-10.
Figure 9: Confusion matrix for the VAC+GAN method on CIFAR-10.

The confusion matrices show the better classification performed by VAC+GAN compared to ACGAN. After 200 epochs, the proposed method gives higher classification accuracy on CIFAR-10 than ACGAN. The main advantage of the proposed method is the versatility in choosing a proper classifier network, while in the ACGAN method the classification task is restricted to the discriminator because the discriminator performs as the classifier as well. The VAC+GAN method is also versatile in choosing the GAN scheme: it can be applied to any GAN implementation just by placing a classifier in parallel with the discriminator.

4 Discussion and Conclusion

In this work, a new approach is introduced to train conditional deep generators. It has also been proven that VAC+GAN is applicable to any GAN framework regardless of the model structure and/or loss function (see Sec. 2) for multi-class problems. The idea is to place a classifier in parallel with the discriminator network and back-propagate the classification loss through the generator network in the training stage.
It has also been shown that the presented framework increases the Jensen-Shannon Divergence (JSD) between classes generated by the deep generator, i.e., the generator can produce more distinct samples for different classes, which is desirable.
The results have been compared to implementations of CGAN, CDCGAN and ACGAN on the MNIST dataset, and comparisons are also given on the CIFAR-10 dataset with respect to the ACGAN method. ACGAN gives comparable results, but the main advantage of the proposed method is its versatility in choosing the GAN scheme and also the classifier architecture.
Future work includes applying the method to datasets with a larger number of classes and extending the implementation to larger images. Another idea is to apply this method to regression problems.

Acknowledgements.
This research is funded under the SFI Strategic Partnership Program by Science Foundation Ireland (SFI) and FotoNation Ltd. Project ID: 13/SPP/I2868 on Next Generation Imaging for Smartphone and Embedded Platforms.

References

  • (1) Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein generative adversarial networks. In: International Conference on Machine Learning, pp. 214–223 (2017)
  • (2) Bergstra, J., Bastien, F., Breuleux, O., Lamblin, P., Pascanu, R., Delalleau, O., Desjardins, G., Warde-Farley, D., Goodfellow, I., Bergeron, A., et al.: Theano: Deep learning on gpus with python. In: NIPS 2011, BigLearning Workshop, Granada, Spain, vol. 3. Citeseer (2011)
  • (3) Berthelot, D., Schumm, T., Metz, L.: Began: Boundary equilibrium generative adversarial networks. arXiv preprint arXiv:1703.10717 (2017)
  • (4) Dieleman, S., Schlüter, J., Raffel, C., Olson, E., Sønderby, S.K., Nouri, D., Maturana, D., Thoma, M., Battenberg, E., Kelly, J., Fauw, J.D., Heilman, M., de Almeida, D.M., McFee, B., Weideman, H., Takács, G., de Rivaz, P., Crall, J., Sanders, G., Rasul, K., Liu, C., French, G., Degrave, J.: Lasagne: First release. (2015). DOI 10.5281/zenodo.27878. URL http://dx.doi.org/10.5281/zenodo.27878
  • (5) Frans, K.: Variational autoencoders explained (2016). URL http://kvfrans.com/variational-autoencoders-explained/
  • (6) Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in neural information processing systems, pp. 2672–2680 (2014)
  • (7) Kingma, D.P., Welling, M.: Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 (2013)
  • (8) Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images (2009)
  • (9) Lemley, J., Bazrafkan, S., Corcoran, P.: Deep learning for consumer devices and services: Pushing the limits for machine learning, artificial intelligence, and computer vision. IEEE Consumer Electronics Magazine 6(2), 48–56 (2017)
  • (10) Mirza, M., Osindero, S.: Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014)
  • (11) Odena, A., Olah, C., Shlens, J.: Conditional image synthesis with auxiliary classifier gans. arXiv preprint arXiv:1610.09585 (2016)
  • (12) Oord, A.v.d., Kalchbrenner, N., Kavukcuoglu, K.: Pixel recurrent neural networks. arXiv preprint arXiv:1601.06759 (2016)
  • (13) Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434 (2015)
  • (14) Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X.: Improved techniques for training gans. In: Advances in Neural Information Processing Systems, pp. 2234–2242 (2016)
  • (15) Zhao, J., Mathieu, M., LeCun, Y.: Energy-based generative adversarial network. arXiv preprint arXiv:1609.03126 (2016)