Versatile Auxiliary Classifier with Generative Adversarial Network (VAC+GAN), Multi Class Scenarios
Conditional generators learn the data distribution of each class in a multi-class scenario and generate samples for a specific class given the right input from the latent space. In this work, a method known as “Versatile Auxiliary Classifier with Generative Adversarial Network” (VAC+GAN) for multi-class scenarios is presented. In this technique, the generator of a Generative Adversarial Network (GAN) is turned into a conditional generator by placing a multi-class classifier in parallel with the discriminator network and back-propagating the classification error through the generator. The technique is versatile enough to be applied to any GAN implementation. Results on two databases and comparisons with other methods are provided as well.
Keywords: Conditional deep generators · Generative Adversarial Networks · Machine learning
With the emergence of affordable parallel processing hardware, it has become almost impossible to find any aspect of Artificial Intelligence (AI) to which Deep Learning (DL) has not been applied [9]. DL provides superior results on classification and regression problems compared to classical machine learning methods. The impact of DL is not limited to such problems: generative models also take advantage of these techniques to learn data distributions in big-data scenarios where classical methods fail to provide a solution. Generative Adversarial Networks (GAN) [6] utilise deep neural network capabilities and are able to estimate the data distribution for large-scale problems. These models comprise two networks, a generator and a discriminator. The generator produces samples from a latent space, and the discriminator determines whether a sample is adversarial, i.e., made by the generator, or a genuine image coming from the dataset. GANs are successful implementations of deep generative models, and there are multiple variations such as WGAN [1], EBGAN [15], BEGAN [3], ACGAN [11], and DCGAN [13], which have evolved from the original GAN by altering the loss function and/or the network architecture.
Variational Autoencoders (VAE) [7] are the other successful implementation of deep generative models. In these models the bottleneck of a conventional autoencoder is considered as the latent space of the generator: the samples are fed to an autoencoder and, besides the conventional autoencoder loss, the Kullback-Leibler (KL) divergence between the distribution of the data at the bottleneck and a Gaussian distribution is minimized. In practice, this is achieved by adding the KL divergence term to the mean squared error of the autoencoder network. The biggest downside of VAE models is their blurry outputs, caused by the mean squared error loss [5].
PixelRNN and PixelCNN [12] are other well-known implementations of deep neural generative models. PixelRNN is made of 2-dimensional LSTM units, while PixelCNN utilizes a deep convolutional neural network to estimate the distribution of the data.
Training conditional generators is one of the most appealing applications of GANs. Conditional GAN (CGAN) [10] and Auxiliary Classifier GAN (ACGAN) [11] are among the most widely used schemes for this purpose. The CGAN approach uses auxiliary class information alongside a partitioning of the latent space, while ACGAN improves on the CGAN idea by introducing a classification loss which is back-propagated through the discriminator and generator networks. The CGAN method is versatile enough to apply to every variation of GAN, but ACGAN is restricted to a specific loss function, which limits its adaptability to other GAN varieties.
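As an illustration of the CGAN-style conditioning mentioned above, the following framework-free sketch (the function name and defaults are ours, not from any of the cited implementations) appends a one-hot class code to the latent vector, so the generator input itself carries the class information:

```python
import random

# Sketch of CGAN-style conditioning (hypothetical helper): the auxiliary
# class information is appended to the latent vector as a one-hot code.

def conditional_latent(k, K, dim=100):
    """Latent vector for class k: dim Gaussian coordinates + K-dim one-hot."""
    z = [random.gauss(0.0, 1.0) for _ in range(dim)]
    onehot = [1.0 if i == k else 0.0 for i in range(K)]
    return z + onehot
```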
In the original VAC+GAN work, the ACGAN technique is extended to be applicable to any GAN implementation for binary problems (2-class scenarios). The technique, known as Versatile Auxiliary Classifier with Generative Adversarial Network (VAC+GAN), is implemented by placing a classifier in parallel with the discriminator and back-propagating the classification error through the generator alongside the GAN’s loss.
This work expands the original VAC+GAN idea to multi-class scenarios. In this approach, the classifier is trained independently from the discriminator, which makes it possible to apply the method to any variation of GAN. The main contribution of VAC+GAN is its versatility, and proofs are provided to show the applicability of the method regardless of the GAN structure or loss function.
In the next section, VAC+GAN for multi-class scenarios is explained. In the third section, the implementations of ACGAN and VAC+GAN are presented alongside comparisons with other methods. Discussion and future work are given in the last section.
2 Versatile Auxiliary Classifier + Generative Adversarial Network (VAC+GAN)
The concept proposed in this research is to place a classifier network in parallel with the Discriminator. The classifier accepts the samples from the generator, and the classification error is back-propagated through the classifier and the generator. The model structure is shown in figure 1.
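The flow just described can be sketched on a scalar toy problem. Everything below is an illustrative reconstruction (scalar stand-ins for the three networks, a finite-difference gradient, and weights chosen to match the experiments later in the paper), not the authors' implementation:

```python
import math

# Toy sketch of the VAC+GAN training signal for the generator: the generator
# receives gradients from BOTH the GAN loss (through the discriminator) and
# the classification loss (through the parallel classifier).

def discriminator(x, w):
    """Toy discriminator: probability that x is a genuine sample."""
    return 1.0 / (1.0 + math.exp(-w * x))

def classifier(x, v):
    """Toy 2-class classifier: probability that x belongs to class 1."""
    return 1.0 / (1.0 + math.exp(-v * x))

def generator(z, theta):
    """Toy generator: scale the latent sample."""
    return theta * z

def generator_loss(z, label, theta, w, v, eta=0.8, gamma=0.2):
    """Weighted sum of the GAN generator loss and the classification loss."""
    x = generator(z, theta)
    gan_term = -math.log(discriminator(x, w) + 1e-12)        # fool D
    p1 = classifier(x, v)
    cls_term = -math.log((p1 if label == 1 else 1.0 - p1) + 1e-12)
    return eta * gan_term + gamma * cls_term

# One numeric-gradient update of the generator parameter theta:
theta, w, v, eps, lr = 0.5, 1.0, 1.0, 1e-5, 0.1
z, label = 1.0, 1
grad = (generator_loss(z, label, theta + eps, w, v)
        - generator_loss(z, label, theta - eps, w, v)) / (2 * eps)
theta -= lr * grad   # the classification error shapes the generator update
```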
In this section it is shown that placing a classifier at the output of the generator and minimizing the categorical cross-entropy as the classifier's loss increases the Jensen-Shannon Divergence between the class distributions. The terms used in the mathematical proofs are as follows:
$K$ is the number of classes.
The latent space is partitioned into $K$ subsets $Z_1, \dots, Z_K$. This means that $Z_1, \dots, Z_K$ are disjoint and their union is equal to the $Z$-space.
$C$ is the classifier function.
$\mathcal{L}_{bce}$ is the binary cross-entropy loss function.
$\mathcal{L}_{cce}$ is the categorical cross-entropy loss function.
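A minimal way to realize such a partition of the latent space (a hypothetical sketch, not the paper's scheme) is to let the first latent coordinate select a disjoint class interval:

```python
import random

# Hypothetical sketch of a partition of the latent space into K disjoint
# subsets Z_1, ..., Z_K: the first coordinate selects the class interval
# [k/K, (k+1)/K), the remaining coordinates are free Gaussian samples.

def sample_latent(k, K, dim=100):
    """Draw z from the subset Z_k."""
    z = [random.gauss(0.0, 1.0) for _ in range(dim)]
    z[0] = (k + random.random()) / K
    return z

def class_of(z, K):
    """Recover the class index from the first coordinate."""
    return min(int(z[0] * K), K - 1)
```

Because the intervals are disjoint and cover [0, 1), every latent sample belongs to exactly one class subset.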
In the multi-class case, the classifier has $K$ outputs, where $K$ is the number of classes. In this approach, each output of the classifier corresponds to one class. For a fixed generator and discriminator, the optimal output for class $k$ (the $k$'th output) is:
$$C_k^*(x) = \frac{p_k(x)}{\sum_{j=1}^{K} p_j(x)}$$
wherein $p_k$ denotes the distribution of the samples of class $k$.
Considering just one of the outputs of the classifier, the categorical cross-entropy can be reduced to a binary cross-entropy given by
$$\mathcal{L}_{bce} = -\mathbb{E}_{x \sim p_k}\big[\log C_k(x)\big] - \mathbb{E}_{x \sim \sum_{j \neq k} p_j}\big[\log\big(1 - C_k(x)\big)\big]$$
which is equal to
$$\mathcal{L}_{bce} = -\int_x \Big( p_k(x) \log C_k(x) + \Big(\sum_{j \neq k} p_j(x)\Big) \log\big(1 - C_k(x)\big) \Big)\, dx$$
By considering $y = C_k(x)$, $a = p_k(x)$ and $b = \sum_{j \neq k} p_j(x)$, we have
$$\mathcal{L}_{bce} = -\int_x \big( a \log(y) + b \log(1 - y) \big)\, dx$$
The function $y \mapsto a \log(y) + b \log(1 - y)$ attains its maximum in $[0, 1]$ at $y = \frac{a}{a+b}$ for any $(a, b) \in \mathbb{R}^2 \setminus \{(0, 0)\}$, concluding the proof. ∎
The maximum value for $\mathcal{L}^*_{cce}$ is $K \log K$ and is achieved if and only if $p_1 = p_2 = \dots = p_K$.
Minimizing $\mathcal{L}^*_{cce}$ increases the Jensen-Shannon Divergence between $p_1, \dots, p_K$.
Substituting the optimal classifier output $C_k^*(x) = p_k(x) / \sum_{j=1}^{K} p_j(x)$ into the categorical cross-entropy we have
$$\mathcal{L}^*_{cce} = -\sum_{k=1}^{K} \int_x p_k(x) \log \frac{p_k(x)}{\sum_{j=1}^{K} p_j(x)}\, dx$$
which can be rewritten, with $\bar{p}(x) = \frac{1}{K}\sum_{j=1}^{K} p_j(x)$, as
$$\mathcal{L}^*_{cce} = -\sum_{k=1}^{K} \int_x p_k(x) \log p_k(x)\, dx + \sum_{k=1}^{K} \int_x p_k(x) \log\big(K\,\bar{p}(x)\big)\, dx$$
which is equal to
$$\mathcal{L}^*_{cce} = K \log K + \sum_{k=1}^{K} H(p_k) - K\, H(\bar{p})$$
wherein $H(p)$ is the Shannon entropy of the distribution $p$. The Jensen-Shannon divergence between the distributions $p_1, \dots, p_K$ (with uniform weights) is defined as
$$\mathrm{JSD}(p_1, \dots, p_K) = H\Big(\frac{1}{K}\sum_{k=1}^{K} p_k\Big) - \frac{1}{K}\sum_{k=1}^{K} H(p_k)$$
so the equation above can be rewritten as
$$\mathcal{L}^*_{cce} = K \log K - K \cdot \mathrm{JSD}(p_1, \dots, p_K)$$
Minimizing $\mathcal{L}^*_{cce}$ therefore increases the JSD term, concluding the proof. ∎
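The relation behind this proof (with the optimal classifier, the categorical cross-entropy equals $K \log K - K \cdot \mathrm{JSD}(p_1, \dots, p_K)$; the symbols follow our reconstruction of the argument) can be checked numerically for discrete class distributions:

```python
import math

# Numerical check of L* = K log K - K * JSD(p_1, ..., p_K), where L* is the
# categorical cross-entropy evaluated at the optimal classifier
# C_k*(x) = p_k(x) / sum_j p_j(x). Plain Python, names are ours.

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def jsd(dists):
    """Generalized Jensen-Shannon divergence with uniform weights."""
    K, n = len(dists), len(dists[0])
    mix = [sum(d[i] for d in dists) / K for i in range(n)]
    return entropy(mix) - sum(entropy(d) for d in dists) / K

def optimal_classifier_loss(dists):
    """Categorical cross-entropy at the optimum C_k* = p_k / sum_j p_j."""
    total = 0.0
    for p in dists:
        for i, pi in enumerate(p):
            if pi > 0:
                total -= pi * math.log(pi / sum(d[i] for d in dists))
    return total

# Three arbitrary discrete class distributions over three outcomes:
dists = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.2, 0.2, 0.6]]
K = len(dists)
lhs = optimal_classifier_loss(dists)
rhs = K * math.log(K) - K * jsd(dists)
```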
In this section it has been shown that by placing a classifier at the output of the generator and back-propagating the classification error through the generator, one can increase the dissimilarity between the class distributions of the generator, and therefore train a deep generator that produces class-specific samples. In the next section the proposed idea is implemented for multi-class cases and compared with state-of-the-art methods.
3 Experimental Results
In this section, two main experiments are presented to show the effectiveness of VAC+GAN. The first is on the MNIST database, with visual comparisons against CGAN, CDCGAN and ACGAN. The second experiment is on the CIFAR-10 dataset, where the classification error is compared against the ACGAN method. All networks are trained in Lasagne [4] on top of the Theano [2] library in Python, unless stated otherwise.
In this experiment, the performance of the proposed method is investigated on the MNIST database. MNIST ("Modified National Institute of Standards and Technology") is known as the "hello world" dataset of computer vision. It is a historically significant image classification benchmark introduced in 1999, and a considerable amount of research has been published on MNIST image classification. MNIST contains 60,000 training images and 10,000 test images, both drawn from the same distribution. It consists of 28×28 pixel grayscale images of handwritten digits. Each image is assigned a single ground-truth label from 0 to 9.
The proposed method has been applied to the DCGAN scheme. The Generator, Discriminator and the Classifier used in this experiment are given in tables 1, 2 and 3 respectively.
| Layer | Type | Size | Activation |
|---|---|---|---|
| Hidden 1 | Conv | (64 ch) | LeakyReLU(0.2) |
| Hidden 2 | Conv | (128 ch) | LeakyReLU(0.2) |
| Layer | Type | Size | Activation |
|---|---|---|---|
| Hidden 1 | Conv | (16 ch) | ReLU |
| Pool 1 | Max pooling | – | – |
| Hidden 2 | Conv | (8 ch) | ReLU |
| Pool 2 | Max pooling | – | – |
The loss function for the proposed method (VAC+GAN) is given by:
$$L'_G = \gamma\, \mathcal{L}_{cce}\big(C(G(z)), c\big) + \eta\, L_G$$
where $L_G$ and $L_D$ are the generator and discriminator losses respectively, $G$ is the generator function, $c$ is the target class of the latent sample $z$, $\mathcal{L}_{bce}$ is the binary cross-entropy loss for the discriminator and $\mathcal{L}_{cce}$ is the categorical cross-entropy loss for the classifier. In this experiment, $\gamma$ and $\eta$ are equal to 0.2 and 0.8 respectively.
The optimizer used for training the generator and discriminator is ADAM with learning rate, $\beta_1$ and $\beta_2$ equal to 0.0002, 0.5 and 0.999 respectively, and the classifier is optimized using Nesterov momentum gradient descent with learning rate and momentum equal to 0.01 and 0.9 respectively. The results of the conditional generators trained using Conditional GAN (CGAN) and Conditional DCGAN (CDCGAN) (both from https://github.com/znxlwm/tensorflow-MNIST-cGAN-cDCGAN), ACGAN (https://github.com/buriburisuri/ac-gan), and the proposed method (VAC+GAN) on the MNIST dataset are shown in figures 2, 3, 4 (https://github.com/buriburisuri/ac-gan/blob/master/png/sample.png), and 5 respectively.
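Single-parameter versions of the two optimizers, with the hyper-parameters stated above (ADAM: lr 0.0002, $\beta_1$ 0.5, $\beta_2$ 0.999; Nesterov momentum: lr 0.01, momentum 0.9), can be sketched as follows; this is an illustration, not the Lasagne/Theano code used in the experiments:

```python
import math

# One ADAM update for a single scalar parameter; t is the 1-based step index.
def adam_step(theta, grad, m, v, t, lr=2e-4, b1=0.5, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad * grad
    m_hat = m / (1 - b1 ** t)          # bias correction
    v_hat = v / (1 - b2 ** t)
    theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# One Nesterov momentum update: the gradient is taken at the lookahead point.
def nesterov_step(theta, grad_fn, velocity, lr=0.01, mu=0.9):
    velocity = mu * velocity - lr * grad_fn(theta + mu * velocity)
    return theta + velocity, velocity
```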
As shown in these figures, the presented method gives superior results compared to CGAN and CDCGAN while using the exact same generator structure as CDCGAN. The results are comparable with ACGAN; the difference is that the proposed method is more versatile and can be applied to any GAN model regardless of architecture and loss function.
The CIFAR-10 database [8] consists of 60,000 images in 10 classes, of which 50,000 are for training and 10,000 for testing. The next experiment compares ACGAN (https://github.com/King-Of-Knights/Keras-ACGAN-CIFAR10) to the VAC+GAN method on generating images, and the classification accuracies of the two methods are also compared. The networks utilized in this experiment are shown in tables 4, 5 and 6, corresponding to the generator, discriminator (https://github.com/King-Of-Knights/Keras-ACGAN-CIFAR10/blob/master/cifar10.py) and classifier respectively. The same generator and discriminator architectures have been used in both implementations to obtain fair comparisons.
| Layer | Type | Size | Activation |
|---|---|---|---|
| Hidden 2 | DeConv | (192 ch) | ReLU |
| Hidden 3 | DeConv | (96 ch) | ReLU |
| MBDisc [14] | – | – | – |
The loss function used to train VAC+GAN is given by
$$L'_G = \gamma\, \mathcal{L}_{cce}\big(C(G(z)), c\big) + \eta\, L_G$$
where $L_G$ and $L_D$ are the generator and discriminator losses respectively, $G$ is the generator function, $c$ is the target class of the latent sample $z$, $\mathcal{L}_{bce}$ is the binary cross-entropy loss for the discriminator and $\mathcal{L}_{cce}$ is the categorical cross-entropy loss for the classifier. In this experiment, $\gamma$ and $\eta$ are both equal to 0.5.
The optimizer used for training the generator and discriminator is ADAM with learning rate, $\beta_1$ and $\beta_2$ equal to 0.0002, 0.5 and 0.999 respectively, and the classifier is optimized using Nesterov momentum gradient descent with learning rate and momentum equal to 0.01 and 0.9 respectively. The results for ACGAN (https://github.com/King-Of-Knights/Keras-ACGAN-CIFAR10/blob/master/plot_epoch_220_generated.png) and the proposed method are shown in figures 6 and 7 respectively.
The CIFAR-10 database is extremely unconstrained, and there are only 5,000 training samples in each class. Therefore the outputs of both implementations are visually vague, and in order to compare the methods, the classification errors are compared. The confusion matrices for ACGAN (https://github.com/King-Of-Knights/Keras-ACGAN-CIFAR10/blob/master/Confusion_Matrix.png) and VAC+GAN are shown in figures 8 and 9 respectively.
The confusion matrices show better classification by VAC+GAN compared to ACGAN, and after 200 epochs the proposed method gives higher classification accuracy on CIFAR-10. The main advantage of the proposed method is the versatility in choosing a proper classifier network, while in the ACGAN method the classification task is constrained to the discriminator, because the discriminator performs as the classifier as well. The VAC+GAN method is also versatile in the choice of GAN scheme: it can be applied to any GAN implementation simply by placing a classifier in parallel with the discriminator.
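The accuracy figures are obtained from the confusion matrices in the usual way; a minimal sketch (the matrix values here are illustrative, not the paper's results):

```python
# Overall accuracy from a confusion matrix (rows = true class,
# columns = predicted class): the trace divided by the total count.
def accuracy_from_confusion(cm):
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

# Illustrative 3-class confusion matrix:
cm = [[50, 3, 2],
      [4, 45, 6],
      [1, 5, 44]]
```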
4 Discussion and Conclusion
In this work, a new approach is introduced to train conditional deep generators. It has also been proven that VAC+GAN is applicable to any GAN framework, regardless of the model structure and/or loss function (see Sec. 2), for multi-class problems. The idea is to place a classifier in parallel with the discriminator network and back-propagate the classification loss through the generator network in the training stage.
It has also been shown that the presented framework increases the Jensen-Shannon Divergence (JSD) between the classes generated by the deep generator, i.e., the generator can produce more distinct samples for different classes, which is desirable.
The results have been compared to implementations of CGAN, CDCGAN and ACGAN on the MNIST dataset, and comparisons with the ACGAN method are also given on the CIFAR-10 dataset. ACGAN gives comparable results, but the main advantage of the proposed method is its versatility in choosing the GAN scheme and the classifier architecture.
Future work includes applying the method to datasets with a larger number of classes and extending the implementation to larger images. Another idea is to apply the method to regression problems.
Acknowledgements. This research is funded under the SFI Strategic Partnership Program by Science Foundation Ireland (SFI) and FotoNation Ltd. Project ID: 13/SPP/I2868 on Next Generation Imaging for Smartphone and Embedded Platforms.
- (1) Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein generative adversarial networks. In: International Conference on Machine Learning, pp. 214–223 (2017)
- (2) Bergstra, J., Bastien, F., Breuleux, O., Lamblin, P., Pascanu, R., Delalleau, O., Desjardins, G., Warde-Farley, D., Goodfellow, I., Bergeron, A., et al.: Theano: Deep learning on gpus with python. In: NIPS 2011, BigLearning Workshop, Granada, Spain, vol. 3. Citeseer (2011)
- (3) Berthelot, D., Schumm, T., Metz, L.: Began: Boundary equilibrium generative adversarial networks. arXiv preprint arXiv:1703.10717 (2017)
- (4) Dieleman, S., Schlüter, J., Raffel, C., Olson, E., Sønderby, S.K., Nouri, D., Maturana, D., Thoma, M., Battenberg, E., Kelly, J., Fauw, J.D., Heilman, M., de Almeida, D.M., McFee, B., Weideman, H., Takács, G., de Rivaz, P., Crall, J., Sanders, G., Rasul, K., Liu, C., French, G., Degrave, J.: Lasagne: First release. (2015). DOI 10.5281/zenodo.27878. URL http://dx.doi.org/10.5281/zenodo.27878
- (5) Frans, K.: Variational autoencoders explained (2016). URL http://kvfrans.com/variational-autoencoders-explained/
- (6) Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in neural information processing systems, pp. 2672–2680 (2014)
- (7) Kingma, D.P., Welling, M.: Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 (2013)
- (8) Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images (2009)
- (9) Lemley, J., Bazrafkan, S., Corcoran, P.: Deep learning for consumer devices and services: Pushing the limits for machine learning, artificial intelligence, and computer vision. IEEE Consumer Electronics Magazine 6(2), 48–56 (2017)
- (10) Mirza, M., Osindero, S.: Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014)
- (11) Odena, A., Olah, C., Shlens, J.: Conditional image synthesis with auxiliary classifier gans. arXiv preprint arXiv:1610.09585 (2016)
- (12) Oord, A.v.d., Kalchbrenner, N., Kavukcuoglu, K.: Pixel recurrent neural networks. arXiv preprint arXiv:1601.06759 (2016)
- (13) Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434 (2015)
- (14) Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X.: Improved techniques for training gans. In: Advances in Neural Information Processing Systems, pp. 2234–2242 (2016)
- (15) Zhao, J., Mathieu, M., LeCun, Y.: Energy-based generative adversarial network. arXiv preprint arXiv:1609.03126 (2016)