Versatile Auxiliary Classifier with Generative Adversarial Network (VAC+GAN), Multi Class Scenarios
Abstract
Conditional generators learn the data distribution for each class in a multiclass scenario and generate samples for a specific class given the right input from the latent space. In this work, a method known as "Versatile Auxiliary Classifier with Generative Adversarial Network" for multiclass scenarios is presented. In this technique, the Generative Adversarial Network (GAN)'s generator is turned into a conditional generator by placing a multiclass classifier in parallel with the discriminator network and backpropagating the classification error through the generator. The technique is versatile enough to be applied to any GAN implementation. Results on two databases and comparisons with other methods are provided as well.
Keywords:
Conditional deep generators · Generative Adversarial Networks · Machine learning
1 Introduction
With the emergence of affordable parallel processing hardware, it has become almost impossible to find an aspect of Artificial Intelligence (AI) to which Deep Learning (DL) has not been applied [CEmag1]. DL provides superior outcomes on classification and regression problems compared to classical machine learning methods, and its impact is not limited to such problems: generative models also take advantage of these techniques to learn data distributions in big-data scenarios where classical methods fail to provide a solution. Generative Adversarial Networks (GAN) [GAN] utilise deep neural network capabilities and are able to estimate the data distribution for large-scale problems. These models comprise two networks, a generator and a discriminator. The generator produces random samples from a latent space, and the discriminator determines whether a sample is adversarial, i.e., made by the generator, or is a genuine image coming from the dataset. GANs are successful implementations of deep generative models, and multiple variations such as WGAN [WGAN], EBGAN [EBGAN], BEGAN [BEGAN], ACGAN [ACGAN], and DCGAN [DCGAN] have evolved from the original GAN by altering the loss function and/or the network architecture.
Variational Autoencoders (VAE) [VAE] are another successful implementation of deep generative models. In these models the bottleneck of a conventional autoencoder is treated as the latent space of the generator: samples are fed to an autoencoder and, besides the conventional autoencoder loss, the Kullback-Leibler (KL) divergence between the distribution of the data at the bottleneck and a Gaussian distribution is minimized. In practice, this is achieved by adding the KL divergence term to the mean square error of the autoencoder network. The biggest downside of VAE models is their blurry output, caused by the mean square error loss [VAE2].
PixelRNN and PixelCNN [RNN] are other well-known implementations of deep neural generative models. PixelRNN is built from 2-dimensional LSTM units, while PixelCNN utilizes a deep convolutional neural network to estimate the distribution of the data.
Training conditional generators is one of the most appealing applications of GANs. Conditional GAN (CGAN) [CGAN] and Auxiliary Classifier GAN (ACGAN) [ACGAN] are among the most widely used schemes for this purpose. The CGAN approach uses auxiliary class information alongside a partitioning of the latent space, while ACGAN improves on the CGAN idea by introducing a classification loss which backpropagates through the discriminator and generator networks. The CGAN method is versatile enough to apply to every variation of GAN, but ACGAN is restricted to a specific loss function, which limits its adaptability to other GAN varieties.
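For intuition, one simple way to realize a latent-space partitioning is to reserve a disjoint interval of a single latent coordinate per class, so that each class draws its noise from its own subset of the space. The scheme below is purely an illustration of the partitioning idea, not the encoding used by any particular CGAN implementation:

```python
import numpy as np

def sample_latent(cls, n, dim=100, n_classes=10, seed=0):
    """Draw n latent vectors from the subset Z_cls of a partitioned latent space.

    The subsets are made disjoint by confining the first coordinate of Z_cls
    to the interval [cls/n_classes, (cls+1)/n_classes); the remaining
    coordinates are uniform on [0, 1).
    """
    rng = np.random.default_rng(seed)
    z = rng.uniform(0.0, 1.0, size=(n, dim))
    z[:, 0] = (cls + z[:, 0]) / n_classes  # maps [0,1) into the class interval
    return z
```

Any invertible encoding of the class into the latent input (disjoint intervals, concatenated one-hot labels, class-specific Gaussian means) serves the same purpose; CGAN implementations typically concatenate a one-hot label vector instead.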
In [VACGAN], the ACGAN technique was extended to be applicable to any GAN implementation for binary problems (two-class scenarios). The technique, known as Versatile Auxiliary Classifier with Generative Adversarial Network (VAC+GAN), is implemented by placing a classifier in parallel with the discriminator and backpropagating the classification error through the generator alongside the GAN's loss.
This work expands the original VAC+GAN idea [VACGAN] to multiclass scenarios. In this approach, the classifier is trained independently from the discriminator, which makes it possible to apply the method to any variation of GAN. The main contribution of VAC+GAN is its versatility, and proofs are provided to show the applicability of the method regardless of the GAN structure or loss function.
In the next section, VAC+GAN for multiclass scenarios is explained. In the third section, the implementations of ACGAN and VAC+GAN are presented alongside comparisons with other methods. Discussions and future work are given in the last section.
2 Versatile Auxiliary Classifier + Generative Adversarial Network (VAC+GAN)
The concept proposed in this research is to place a classifier network in parallel with the Discriminator. The classifier accepts the samples from the generator, and the classification error is backpropagated through the classifier and the generator. The model structure is shown in figure 1.
In this section it is shown that by placing a classifier at the output of the generator and minimizing the categorical cross-entropy as the classifier's loss, the Jensen-Shannon Divergence between all the classes is increased. The terms used in the mathematical proofs are as follows:

- c is the number of classes.
- The latent space Z is partitioned into c subsets Z_1, …, Z_c. This means that the Z_i are disjoint and their union is equal to the whole space Z.
- C is the classifier function, and C_i denotes its i'th output.
- L_bce is the binary cross-entropy loss function.
- L_cce is the categorical cross-entropy loss function.
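The two loss functions named in the list above can be written directly in numpy. These are the standard definitions; the clipping constant is an implementation detail added here to avoid log(0):

```python
import numpy as np

def binary_cross_entropy(y_pred, y_true, eps=1e-12):
    """L_bce: mean of -[t*log(p) + (1-t)*log(1-p)] over the batch."""
    p = np.clip(y_pred, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

def categorical_cross_entropy(probs, labels, eps=1e-12):
    """L_cce: mean of -log p_label over the batch; probs has shape (n, c)."""
    p = np.clip(probs, eps, 1.0)
    n = len(labels)
    return float(-np.mean(np.log(p[np.arange(n), labels])))
```

For a uniform c-way prediction the categorical cross-entropy equals log(c), which is the quantity that appears in the theorems below.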
Proposition 1.
In the multiclass case, the classifier has c outputs, where c is the number of classes and each output corresponds to one class. Let p_i denote the distribution of the samples generated from the latent subset Z_i. For a fixed generator and discriminator, the optimal classifier output for class i (the i'th output) is

(1)   C*_i(x) = p_i(x) / ∑_{j=1}^{c} p_j(x)

Proof.
Considering just one output of the classifier, the categorical cross-entropy reduces to the binary cross-entropy given by

(2)   L(C_i) = −E_{x∼p_i}[log C_i(x)] − E_{x∼p_ī}[log(1 − C_i(x))]

which is equal to

(3)   L(C_i) = −∫_x ( p_i(x) log C_i(x) + p_ī(x) log(1 − C_i(x)) ) dx

By considering p_ī = ∑_{j≠i} p_j we have

(4)   L(C_i) = −∫_x ( p_i(x) log C_i(x) + ∑_{j≠i} p_j(x) log(1 − C_i(x)) ) dx

The function y ↦ a log(y) + b log(1 − y) attains its maximum at y = a/(a + b) for any (a, b) ∈ ℝ² ∖ {(0, 0)}, so L(C_i) is minimized by C*_i(x) = p_i(x)/∑_j p_j(x), concluding the proof. ∎
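The final step of the proof rests on the fact, familiar from the original GAN analysis, that y ↦ a·log(y) + b·log(1 − y) is maximized at y = a/(a + b). A quick numerical check of that claim, with a and b standing in for the densities at a fixed point x:

```python
import numpy as np

a, b = 0.7, 0.3  # stand-ins for p_i(x) and the sum of the other class densities
ys = np.linspace(0.001, 0.999, 9999)
vals = a * np.log(ys) + b * np.log(1.0 - ys)
y_star = ys[np.argmax(vals)]  # numerical maximizer; analytically a/(a+b) = 0.7
```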
Theorem 2.1.
Let L(G) denote the categorical cross-entropy loss evaluated at the optimal classifier C*. The maximum value of L(G) is c log(c), and it is achieved if and only if p_1 = p_2 = … = p_c.

Proof.
Substituting the optimal output from equation 1 into the categorical cross-entropy loss gives

(5)   L(G) = −∑_{i=1}^{c} E_{x∼p_i}[log C*_i(x)]

which can be written as

(6)   L(G) = −∑_{i=1}^{c} ∫_x p_i(x) log( p_i(x) / ∑_{j=1}^{c} p_j(x) ) dx

If p_1 = … = p_c then C*_i(x) = 1/c and L(G) = c log(c). To see that this value is the maximum, equation 6 can be rearranged as

(7)   L(G) = c log(c) − ∑_{i=1}^{c} KL( p_i ∥ (1/c) ∑_{j=1}^{c} p_j )

Each Kullback-Leibler term is non-negative and vanishes if and only if p_i equals the mixture (1/c)∑_j p_j, so the maximum c log(c) is attained if and only if p_1 = … = p_c. ∎
Theorem 2.2.
Minimizing L(G) increases the Jensen-Shannon Divergence between the class distributions p_1, …, p_c.

Proof.
From equation 6 we have

(8)   L(G) = −∑_{i=1}^{c} ∫_x p_i(x) [ log( p_i(x) / ((1/c) ∑_{j=1}^{c} p_j(x)) ) − log(c) ] dx

which can be rewritten as

(9)   L(G) = c log(c) − ∑_{i=1}^{c} ∫_x p_i(x) log( p_i(x) / ((1/c) ∑_{j=1}^{c} p_j(x)) ) dx

which is equal to

(10)   L(G) = c log(c) − ∑_{i=1}^{c} KL( p_i ∥ (1/c) ∑_{j=1}^{c} p_j )

This equation can be rewritten as

(11)   L(G) = c log(c) + ∑_{i=1}^{c} H(p_i) − c · H( (1/c) ∑_{j=1}^{c} p_j )

wherein H(p) is the Shannon entropy of the distribution p. The Jensen-Shannon divergence between the distributions p_1, …, p_c, taken with equal weights 1/c, is defined as

(12)   JSD(p_1, …, p_c) = H( (1/c) ∑_{i=1}^{c} p_i ) − (1/c) ∑_{i=1}^{c} H(p_i)

From equations 11 and 12 we have

(13)   L(G) = c log(c) − c · JSD(p_1, …, p_c)

Minimizing L(G) therefore increases the JSD term, concluding the proof. ∎
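The conclusion of the proof, in the equal-weight form L(G) = c·log(c) − c·JSD used here, can be checked numerically on discrete toy distributions, where both sides are exact finite sums:

```python
import numpy as np

def H(p):
    """Shannon entropy of a discrete distribution (natural log)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

# three toy class-conditional distributions over a 4-point support
P = np.array([[0.7, 0.1, 0.1, 0.1],
              [0.1, 0.6, 0.2, 0.1],
              [0.1, 0.1, 0.2, 0.6]])
c = P.shape[0]
mix = P.mean(axis=0)  # equal-weight mixture (1/c) * sum_i p_i

# categorical cross-entropy at the optimal classifier C*_i = p_i / sum_j p_j
L = float(-np.sum(P * np.log(P / P.sum(axis=0, keepdims=True))))

# equal-weight multi-distribution Jensen-Shannon divergence
jsd = H(mix) - np.mean([H(p) for p in P])
```

Since the identity is exact, `L` and `c*log(c) - c*jsd` agree to floating-point precision for any choice of strictly positive class distributions.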
In this section it has been shown that by placing a classifier at the output of the generator and backpropagating the classification error through the generator, one can increase the dissimilarity between the classes produced by the generator, and therefore train a deep generator that produces class-specific samples. In the next section the proposed idea is implemented for multiclass cases and compared with state-of-the-art methods.
3 Experimental Results
In this section, two main experiments are presented to show the effectiveness of VAC+GAN. The first is on the MNIST database, with visual comparisons against CGAN, CDCGAN and ACGAN. The second experiment is on the CIFAR-10 dataset, where the classification error is compared against the ACGAN method. All networks are trained in Lasagne [LASAGNE] on top of the Theano [THEANO] library in Python, unless stated otherwise.
3.1 MNIST
In this experiment, the performance of the proposed method is investigated on the MNIST database. MNIST ("Modified National Institute of Standards and Technology") is known as the "hello world" dataset of computer vision. It is a historically significant image classification benchmark introduced in 1999, and a considerable amount of research has been published on MNIST image classification. MNIST contains 60,000 training images and 10,000 test images, both drawn from the same distribution. It consists of 28×28 pixel images of handwritten digits, and each image is assigned a single ground-truth label from the digits 0-9.
The proposed method has been applied to the DCGAN scheme. The Generator, Discriminator and the Classifier used in this experiment are given in tables 1, 2 and 3 respectively.
Layer       | Type   | Kernel  | Activation
Input       | Input  | –       | –
Hidden 1    | Dense  |         | ReLU
BatchNorm 1 | –      | –       | –
Hidden 2    | Dense  |         | ReLU
BatchNorm 2 | –      | –       | –
Hidden 3    | Deconv | (64 ch) | ReLU
BatchNorm 3 | –      | –       | –
Output      | Deconv | (1 ch)  | Sigmoid
Layer       | Type  | Kernel   | Activation
Input       | Input | –        | –
Hidden 1    | Conv  | (64 ch)  | LeakyReLU(0.2)
BatchNorm 1 | –     | –        | –
Hidden 2    | Conv  | (128 ch) | LeakyReLU(0.2)
BatchNorm 2 | –     | –        | –
Hidden 3    | Dense | 1024     | LeakyReLU(0.2)
Output      | Dense | 1        | Sigmoid
Layer    | Type        | Kernel  | Activation
Input    | Input       | –       | –
Hidden 1 | Conv        | (16 ch) | ReLU
Pool 1   | Max pooling |         | –
Hidden 2 | Conv        | (8 ch)  | ReLU
Pool 2   | Max pooling |         | –
Hidden 3 | Dense       | 1024    | ReLU
Output   | Dense       | 10      | Softmax
The loss function for the proposed method (VAC+GAN) is given by:

(14)   L_G = α · L_bce(D(G(z)), 1) + β · L_cce(C(G(z)), l),    L_D = L_bce(D(x), 1) + L_bce(D(G(z)), 0)

where L_G and L_D are the generator and discriminator losses respectively, G is the generator function, L_bce is the binary cross-entropy loss for the discriminator, L_cce is the categorical cross-entropy loss for the classifier, and l is the target class label. In this experiment, α and β are equal to 0.2 and 0.8 respectively.
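As an illustrative sketch, the weighted generator objective with α = 0.2 and β = 0.8 can be written in a few lines of numpy. This is not the paper's Lasagne/Theano code; the helper names, and the assumption that the adversarial term pushes D(G(z)) toward 1, are illustrative:

```python
import numpy as np

def bce(p, t, eps=1e-12):
    """Binary cross-entropy, clipped to avoid log(0)."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(t * np.log(p) + (1 - t) * np.log(1 - p)))

def cce(probs, labels, eps=1e-12):
    """Categorical cross-entropy of predicted class probabilities."""
    probs = np.clip(probs, eps, 1.0)
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels])))

def generator_loss(d_fake, class_probs, labels, alpha=0.2, beta=0.8):
    """Weighted VAC+GAN-style generator objective:
    alpha * adversarial BCE (generator wants D(G(z)) -> 1)
    + beta * classification CCE from the parallel classifier."""
    adversarial = bce(d_fake, np.ones_like(d_fake))
    classification = cce(class_probs, labels)
    return alpha * adversarial + beta * classification
```

With a perfect discriminator fooling score and a uniform classifier, the loss reduces to β·log(c), making the relative weight of the classification term easy to reason about.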
The optimizer used for training the generator and discriminator is ADAM with learning rate, β1 and β2 equal to 0.0002, 0.5 and 0.999 respectively, and the classifier is optimized using Nesterov momentum gradient descent with learning rate and momentum equal to 0.01 and 0.9 respectively.
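The two update rules can be sketched in numpy with the hyperparameters quoted above; the quadratic objective is a made-up toy problem used only to exercise the updates:

```python
import numpy as np

def adam_step(w, m, v, g, t, lr=0.0002, b1=0.5, b2=0.999, eps=1e-8):
    """One ADAM update (generator/discriminator optimizer), t starts at 1."""
    m = b1 * m + (1 - b1) * g              # first-moment estimate
    v = b2 * v + (1 - b2) * g * g          # second-moment estimate
    m_hat = m / (1 - b1 ** t)              # bias correction
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

def nesterov_step(w, vel, grad_fn, lr=0.01, mu=0.9):
    """One Nesterov-momentum update (classifier optimizer): the gradient
    is evaluated at the look-ahead point w + mu*vel."""
    g = grad_fn(w + mu * vel)
    vel = mu * vel - lr * g
    return w + vel, vel

# toy objective f(w) = 0.5*(w - 3)^2 with gradient w - 3 (illustrative only)
grad = lambda w: w - 3.0

w, vel = 0.0, 0.0
for _ in range(1000):
    w, vel = nesterov_step(w, vel, grad)   # converges near the minimum at 3

wa, m, v = 0.0, 0.0, 0.0
for t in range(1, 1001):
    wa, m, v = adam_step(wa, m, v, grad(wa), t)  # moves slowly toward 3 at lr=2e-4
```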
The results of the conditional generators trained using Conditional GAN (CGAN) and Conditional DCGAN (CDCGAN) (https://github.com/znxlwm/tensorflowMNISTcGANcDCGAN), ACGAN (https://github.com/buriburisuri/acgan), and the proposed method (VAC+GAN) on the MNIST dataset are shown in figures 2, 3, 4 (https://github.com/buriburisuri/acgan/blob/master/png/sample.png) and 5 respectively.
As shown in these figures, the presented method gives superior results compared to CGAN and CDCGAN while using the exact same generator structure as CDCGAN. The results are comparable with ACGAN; the difference is that the proposed method is more versatile and can be applied to any GAN model regardless of architecture and loss function.
3.2 CIFAR-10
The CIFAR-10 database [CFAR10] consists of 60000 images in 10 classes, of which 50000 are for training and 10000 for testing purposes. This experiment compares ACGAN (https://github.com/KingOfKnights/KerasACGANCIFAR10) with the VAC+GAN method on generating images, and also compares the classification accuracy of the two methods. The networks used in this experiment are shown in tables 4, 5 and 6, corresponding to the generator, the discriminator (https://github.com/KingOfKnights/KerasACGANCIFAR10/blob/master/cifar10.py) and the classifier respectively. The same generator and discriminator architectures have been used in both implementations to obtain a fair comparison.
Layer       | Type    | Kernel   | Activation
Input       | Input   | –        | –
Hidden 1    | Dense   |          | ReLU
Reshape     | Reshape | 384 ch   | –
Hidden 2    | Deconv  | (192 ch) | ReLU
BatchNorm 2 | –       | –        | –
Hidden 3    | Deconv  | (96 ch)  | ReLU
BatchNorm 3 | –       | –        | –
Output      | Deconv  | (3 ch)   | tanh
Layer           | Type    | Kernel | Activation
Input           | Input   | –      | –
Gaussian        | Noise   | –      | –
Hidden 1        | Conv    | 16 ch  | LeakyReLU(0.2)
DropOut 1       | DropOut | –      | –
Hidden 2        | Conv    | 32 ch  | LeakyReLU(0.2)
BatchNorm 1     | –       | –      | –
DropOut 2       | DropOut | –      | –
Hidden 3        | Conv    | 64 ch  | LeakyReLU(0.2)
BatchNorm 2     | –       | –      | –
DropOut 3       | DropOut | –      | –
Hidden 4        | Conv    | 128 ch | LeakyReLU(0.2)
BatchNorm 3     | –       | –      | –
DropOut 4       | DropOut | –      | –
Hidden 5        | Conv    | 256 ch | LeakyReLU(0.2)
BatchNorm 4     | –       | –      | –
DropOut 5       | DropOut | –      | –
Hidden 6        | Conv    | 512 ch | LeakyReLU(0.2)
BatchNorm 5     | –       | –      | –
DropOut 6       | DropOut | –      | –
MBDisc [MBDISC] | –       | –      | –
Output          | Dense   | 1      | Sigmoid
Layer       | Type    | Kernel   | Activation
Input       | Input   | –        | –
Hidden 1    | Conv    | (128 ch) | ReLU
BatchNorm 1 | –       | –        | –
MaxPool 1   | MaxPool | (2,2)    | –
Hidden 2    | Conv    | (256 ch) | ReLU
BatchNorm 2 | –       | –        | –
MaxPool 2   | MaxPool | (2,2)    | –
Hidden 3    | Conv    | (512 ch) | ReLU
BatchNorm 3 | –       | –        | –
MaxPool 3   | MaxPool | (2,2)    | –
Hidden 4    | Dense   | 512      | ReLU
Output      | Dense   | 10       | Softmax
The loss function used to train VAC+GAN is given by

(15)   L_G = α · L_bce(D(G(z)), 1) + β · L_cce(C(G(z)), l),    L_D = L_bce(D(x), 1) + L_bce(D(G(z)), 0)

where L_G and L_D are the generator and discriminator losses respectively, G is the generator function, L_bce is the binary cross-entropy loss for the discriminator, L_cce is the categorical cross-entropy loss for the classifier, and l is the target class label. In this experiment, α and β are both equal to 0.5.
The optimizer used for training the generator and discriminator is ADAM with learning rate, β1 and β2 equal to 0.0002, 0.5 and 0.999 respectively, and the classifier is optimized using Nesterov momentum gradient descent with learning rate and momentum equal to 0.01 and 0.9 respectively. The results for ACGAN and the proposed method are shown in figures 6 (https://github.com/KingOfKnights/KerasACGANCIFAR10/blob/master/plot_epoch_220_generated.png) and 7 respectively.
The CIFAR-10 database is extremely unconstrained, with only 6000 samples in each class. Therefore the outputs of both implementations are vague, and in order to compare the methods their classification errors are compared. The confusion matrices for ACGAN and VAC+GAN are shown in figures 8 (https://github.com/KingOfKnights/KerasACGANCIFAR10/blob/master/Confusion_Matrix.png) and 9 respectively.
The confusion matrices show better classification by VAC+GAN compared to ACGAN: after 200 epochs, the proposed method gives higher classification accuracy on CIFAR-10. The main advantage of the proposed method is the versatility in choosing a suitable classifier network, whereas in ACGAN the classification task is constrained to the discriminator, because the discriminator also performs as the classifier. VAC+GAN is versatile in choosing the GAN scheme as well: it can be applied to any GAN implementation simply by placing a classifier in parallel with the discriminator.
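Overall accuracy can be read directly off a confusion matrix as the ratio of its trace (correct predictions on the diagonal) to the total sample count. A minimal helper, with matrix values that are made up for illustration:

```python
import numpy as np

def accuracy_from_confusion(cm):
    """Overall accuracy = correctly classified (diagonal) / all samples."""
    cm = np.asarray(cm, dtype=float)
    return float(np.trace(cm) / cm.sum())

# hypothetical 3-class confusion matrix (rows: true class, cols: predicted)
cm = [[50, 3, 2],
      [4, 45, 6],
      [1, 5, 49]]
acc = accuracy_from_confusion(cm)  # (50+45+49) / 165
```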
4 Discussion and Conclusion
In this work, a new approach is introduced to train conditional deep generators. It has also been proven that VAC+GAN is applicable to any GAN framework regardless of the model structure and/or loss function (see Sec. 2) for multiclass problems. The idea is to place a classifier in parallel with the discriminator network and backpropagate the classification loss through the generator network during training.
It has also been shown that the presented framework increases the Jensen-Shannon Divergence (JSD) between the classes generated by the deep generator, i.e., the generator can produce more distinct samples for different classes, which is desirable.
The results have been compared with implementations of CGAN, CDCGAN and ACGAN on the MNIST dataset, and comparisons on the CIFAR-10 dataset are given with respect to the ACGAN method. ACGAN gives comparable results, but the main advantage of the proposed method is its versatility in choosing the GAN scheme and the classifier architecture.
Future work includes applying the method to datasets with a larger number of classes, extending the implementation to larger images, and applying the method to regression problems.
Acknowledgements.
This research is funded under the SFI Strategic Partnership Program by Science Foundation Ireland (SFI) and FotoNation Ltd. Project ID: 13/SPP/I2868 on Next Generation Imaging for Smartphone and Embedded Platforms.

References
 (1) Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein generative adversarial networks. In: International Conference on Machine Learning, pp. 214–223 (2017)
 (2) Bergstra, J., Bastien, F., Breuleux, O., Lamblin, P., Pascanu, R., Delalleau, O., Desjardins, G., Warde-Farley, D., Goodfellow, I., Bergeron, A., et al.: Theano: Deep learning on gpus with python. In: NIPS 2011, BigLearning Workshop, Granada, Spain, vol. 3. Citeseer (2011)
 (3) Berthelot, D., Schumm, T., Metz, L.: Began: Boundary equilibrium generative adversarial networks. arXiv preprint arXiv:1703.10717 (2017)
 (4) Dieleman, S., Schlüter, J., Raffel, C., Olson, E., Sønderby, S.K., Nouri, D., Maturana, D., Thoma, M., Battenberg, E., Kelly, J., Fauw, J.D., Heilman, M., de Almeida, D.M., McFee, B., Weideman, H., Takács, G., de Rivaz, P., Crall, J., Sanders, G., Rasul, K., Liu, C., French, G., Degrave, J.: Lasagne: First release. (2015). DOI 10.5281/zenodo.27878. URL http://dx.doi.org/10.5281/zenodo.27878
 (5) Frans, K.: Variational autoencoders explained (2016). URL http://kvfrans.com/variationalautoencodersexplained/
 (6) Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in neural information processing systems, pp. 2672–2680 (2014)
 (7) Kingma, D.P., Welling, M.: Autoencoding variational bayes. arXiv preprint arXiv:1312.6114 (2013)
 (8) Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images (2009)
 (9) Lemley, J., Bazrafkan, S., Corcoran, P.: Deep learning for consumer devices and services: Pushing the limits for machine learning, artificial intelligence, and computer vision. IEEE Consumer Electronics Magazine 6(2), 48–56 (2017)
 (10) Mirza, M., Osindero, S.: Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014)
 (11) Odena, A., Olah, C., Shlens, J.: Conditional image synthesis with auxiliary classifier gans. arXiv preprint arXiv:1610.09585 (2016)
 (12) Oord, A.v.d., Kalchbrenner, N., Kavukcuoglu, K.: Pixel recurrent neural networks. arXiv preprint arXiv:1601.06759 (2016)
 (13) Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434 (2015)
 (14) Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X.: Improved techniques for training gans. In: Advances in Neural Information Processing Systems, pp. 2234–2242 (2016)
 (15) Zhao, J., Mathieu, M., LeCun, Y.: Energybased generative adversarial network. arXiv preprint arXiv:1609.03126 (2016)