Towards Deeper Generative Architectures for GANs using Dense connections
In this paper, we present the results of adopting skip connections and dense layers, previously used in image classification tasks, in the Fisher GAN implementation. We experimented with different numbers of layers and with inserting these connections in different sections of the network. Our findings suggest that networks implemented with these connections produce better images than the baseline, and that the number of connections added has only a slight effect on the result.
Samarth Tripathi Department of Computer Science Columbia University New York, NY 10027 email@example.com Renbo Tu Department of Computer Science Columbia University New York, NY 10027 firstname.lastname@example.org
31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.
1 Introduction
Recent developments in Generative Adversarial Networks (GANs) have allowed for the production of high-quality, quasi-natural images. In this research, we seek to develop another variation of GAN to generate such semi-realistic images. Since significant progress has also been made in image classification, specifically with DenseNet achieving state-of-the-art performance, we leverage the strengths of these networks to improve GAN performance. Our model adopts the DenseNet/ResNet architecture as the discriminator, and the generator includes the skip connections found in these classification networks. We expect these connections to enable the generator to learn more complex image features, as skip and dense connections have proved crucial for feature extraction in classification tasks. Because convergence has proved difficult for GANs, we use Fisher GAN as our base model to improve convergence. We implemented the models for two datasets: CelebA and Cifar10.
In this paper, we make the following contributions:
-  We use the newly constructed model to generate higher-quality images, as measured by common evaluation metrics such as the inception score.
-  We analyze the effect of skip connections on image generation by comparing models with different dense architectures.
-  We investigate the optimal training method for these GANs with dense connections.
2 Related Works
Fisher GAN: Because GANs have proven unstable during training, attaining convergence for the generator has been one of the primary difficulties for researchers. The recently developed Fisher GAN (Mroueh & Sercu, 2017) defines a critic with a data-dependent constraint on its second-order moments. The new algorithm, based on the Augmented Lagrangian and incorporated in a DCGAN (Radford et al., 2016) network, achieved decent performance on the semi-supervised learning metric and generated good samples.
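To make the second-order moment constraint concrete, the following is a minimal sketch of the empirical critic objective as we understand it from the Fisher GAN paper: the mean difference of critic outputs on real and generated samples, plus an Augmented Lagrangian penalty that pushes the averaged second moment towards 1. The function name and the arguments `f_real`, `f_fake`, `lam` and `rho` are names chosen here for illustration, and the sketch operates on plain lists of critic outputs rather than on a network.

```python
def fisher_gan_critic_objective(f_real, f_fake, lam, rho):
    """Empirical Augmented Lagrangian objective for the critic (illustrative sketch).

    f_real, f_fake: critic outputs on real and generated samples (lists of floats).
    lam: Lagrange multiplier; rho: quadratic penalty weight (both assumed names).
    """
    n, m = len(f_real), len(f_fake)
    # Mean difference of critic outputs between real and generated samples.
    e_hat = sum(f_real) / n - sum(f_fake) / m
    # Averaged second-order moment of the critic over both distributions.
    omega_hat = (0.5 * sum(v * v for v in f_real) / n
                 + 0.5 * sum(v * v for v in f_fake) / m)
    # The constraint asks omega_hat to equal 1.
    constraint = 1.0 - omega_hat
    return e_hat + lam * constraint - 0.5 * rho * constraint ** 2


# Constraint satisfied exactly, so only the mean difference remains.
balanced = fisher_gan_critic_objective([1.0, 1.0], [-1.0, -1.0], lam=0.5, rho=1e-6)
# Constraint violated, so both penalty terms reduce the objective.
penalized = fisher_gan_critic_objective([2.0], [0.0], lam=1.0, rho=2.0)
```

In this reading the critic maximizes the objective while the multiplier is adjusted to enforce the constraint; the exact update schedule is left out of the sketch.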
DenseNet: DenseNet (Huang et al., 2016) is a deep convolutional network with skip connections from each layer to every subsequent layer in a feed-forward fashion. The earlier ResNet (He et al., 2015) showed that residual layers with skip connections can learn high-level features more accurately and gain accuracy with increased depth. The DenseNet implementation successfully addresses the vanishing gradient problem, strengthens feature learning, and reduces the number of parameters through bottleneck layers. DenseNet obtained state-of-the-art performance on classification tasks with multiple popular datasets.
Inception Score: Salimans et al. (2016) propose a widely used metric, the "Inception Score", for evaluating GAN-generated images. The procedure starts by feeding images to a pretrained Inception model to obtain the conditional label distribution p(y|x). The score first checks whether the images contain meaningful objects, which is indicated by p(y|x) having low entropy. It then checks whether the generated images are varied by evaluating the marginal \int p(y|x=G(z)) dz, which should have high entropy. This metric is used in conjunction with human judgment (visual inspection).
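The two criteria above are usually combined as exp(E_x KL(p(y|x) || p(y))), where p(y) is the marginal over generated images. The following is a minimal sketch of that computation, assuming the class probabilities p(y|x) have already been obtained from a pretrained classifier; the function name and the toy inputs are illustrative and not taken from the paper's code.

```python
import math


def inception_score(probs):
    """Compute exp(mean KL(p(y|x) || p(y))) from per-image class probabilities.

    probs: list of distributions p(y|x), one per generated image, each summing to 1.
    """
    n = len(probs)
    num_classes = len(probs[0])
    # Marginal p(y): average of the conditional distributions over all images.
    marginal = [sum(p[k] for p in probs) / n for k in range(num_classes)]
    mean_kl = 0.0
    for p in probs:
        # Classes with zero conditional probability contribute nothing to the KL term.
        mean_kl += sum(pk * math.log(pk / mk)
                       for pk, mk in zip(p, marginal) if pk > 0)
    mean_kl /= n
    return math.exp(mean_kl)


# Confident predictions spread over both classes: low conditional entropy, high marginal entropy.
confident_and_varied = inception_score([[1.0, 0.0], [0.0, 1.0]])
# Uninformative predictions: conditional and marginal coincide.
uninformative = inception_score([[0.5, 0.5], [0.5, 0.5]])
```

With K classes the score ranges from 1 (conditional and marginal distributions identical) to K (confident predictions spread evenly over all classes).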
3 Experimental Setup
We first discuss the DCGAN structure we try to improve upon. The architecture, presented in Figure 1, consists of log2(image size) Convolution (and Convolution-Transpose) layers, that is, 6 layers for an image size of 64 and 5 layers for an image size of 32, each of size 4×4 with a stride of 2 and padding of 1. These "essential" layers perform dimensionality reduction (and expansion) by a factor of 2 and are indispensable for the models. Apart from these, the model also contains “extra layers”, which are basic Convolution (and Convolution-Transpose) layers of size 3×3 with a stride of 1 and padding of 1. FisherGAN uses 2 extra layers each for the Discriminator and the Generator. The layers also use Batch Normalization as a precursor to each convolutional layer, followed by ReLU for the Generator and LeakyReLU for the Discriminator. The FisherGAN paper specifies many further hyperparameters and constraints, which we replicate for both CelebA and Cifar10.
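The layer settings above can be checked with the standard output-size formulas for convolution and transposed convolution (no dilation, no output padding). This is a small arithmetic sketch; the helper names are chosen here for illustration.

```python
import math


def conv_out(size, kernel, stride, padding):
    # Spatial output size of a standard convolution.
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size, kernel, stride, padding):
    # Spatial output size of a standard transposed convolution.
    return (size - 1) * stride - 2 * padding + kernel


def num_essential_layers(image_size):
    # One stride-2 layer per factor of 2 in the image size.
    return int(math.log2(image_size))


# Essential layers (4x4, stride 2, padding 1) halve or double the resolution.
halved = conv_out(64, kernel=4, stride=2, padding=1)
doubled = conv_transpose_out(32, kernel=4, stride=2, padding=1)
# Extra layers (3x3, stride 1, padding 1) keep the resolution unchanged.
same_conv = conv_out(32, kernel=3, stride=1, padding=1)
same_transpose = conv_transpose_out(32, kernel=3, stride=1, padding=1)
```

Because the extra layers preserve the spatial size, their outputs can be combined with the outputs of neighbouring layers of the same resolution.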
For our research we experiment only with adding residual and dense connections to the Generator model, and do not modify the Discriminator. The reason for this approach is twofold. Firstly, it becomes very difficult to find the correct hyperparameters and constraints to train these models effectively, and increasingly so as we increase the number of layers in a generic manner. Secondly, when we do modify the Discriminator, we do not notice improvements and only find the performance to deteriorate compared to the baseline FisherGAN. To improve upon the DCGAN structure, we first add Residual connections between layers in the Generator, while keeping the Discriminator unchanged. This allows us to train the GAN using the same hyperparameters and constraints as DCGAN while allowing deeper and more powerful generators. We add 5 Transpose-Convolution layers, similar to the extra layers but with residual connections between DCGAN layers, as shown in Figure 1. These residual layers keep the dimensionality constant, which allows us to concatenate the feature maps before sending them to the next layer.
Next we add more layers and make the generators deeper in a Densely Connected manner. We achieve this by adding more Transpose-Convolution layers, each of which takes as input the outputs of all previous layers of the same dimensionality and passes the concatenation of its inputs and its own output to the next layer. As we go deeper, adding more layers results in an exponential increase in the feature maps passed on to the next layer, which increases computation and decreases performance. We avoid this by halving the number of activation maps produced by each subsequent dense layer after the first. This both limits the number of layers to which we can extend and decreases the number of activation maps forwarded to the next layer.
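The growth in feature maps can be illustrated with simple channel bookkeeping. The sketch below reflects one reading of the description above, not the exact widths used in our models: in the naive scheme each dense layer emits as many maps as it receives, and in the reduced scheme the first dense layer emits `base` maps and each later layer emits half as many as the one before. The base width of 64 and both function names are hypothetical.

```python
def dense_widths_naive(base, num_layers):
    """Width after each layer when every layer emits as many maps as it receives."""
    widths = [base]
    for _ in range(num_layers):
        # Input and output are concatenated, so the width doubles.
        widths.append(widths[-1] * 2)
    return widths


def dense_widths_halving(base, num_layers):
    """Width after each layer when each layer emits half as many maps as the previous one."""
    widths = [base]
    growth = base
    for _ in range(num_layers):
        # Only `growth` new maps are concatenated onto the running stack.
        widths.append(widths[-1] + growth)
        growth //= 2
    return widths


naive = dense_widths_naive(64, 4)
reduced = dense_widths_halving(64, 4)
```

Under these assumptions the naive width grows geometrically, while the reduced width stays below three times the base width. The per-layer growth eventually reaches zero, which is one way to see why this scheme bounds the number of dense layers that can usefully be added.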
For Cifar10, our architectures include the baseline Generator (1), a Generator with 3 residual connections (2), a Generator with 6 Dense connections (3), a Generator with 9 Dense connections (4), and a Generator with 9 Dense connections with decreasing activation maps (5). For CelebA we use a similar architecture but with 4, 8, 12 and 16 connections. We also tried many other variations that did not yield better-quality images. As mentioned before, naively adding more dense layers ends up deteriorating the performance. We tried adding a 1D ConvolutionTranspose layer after our dense layers and before our essential layers to reduce parameters and add deeper convolutions, but these variants either did not improve result quality or could not be trained. We also tried mirroring the Generator's dense architecture in the Discriminator, but we could not train it effectively.
4 Results and Discussions
Inception Score Metrics
As previously discussed, the inception score has been crucial for result analysis in this research, as it offers a quantitative assessment of the images in terms of both their diversity and their realism. Over the course of the experiments, we made minor tweaks to the network. The resulting subtle changes in the samples, which are not apparent from visual inspection, can be identified when the inception score is plotted.
Figure 2 shows the inception score for Cifar10. As the results show, all the Dense models obtain better scores and converge better and faster than the baseline.
We can visually inspect the quality of generated images as well, as shown in Figure 4.
We would like to thank Dr. Hod Lipson, Professor, Columbia University, and Oscar Chang, Columbia University, for their continued and expert guidance, without which this work would not have been completed.
-  Youssef Mroueh and Tom Sercu. Fisher GAN. CoRR, abs/1705.09675, 2017.
-  Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. CoRR, abs/1511.06434, 2015.
-  Gao Huang, Zhuang Liu, and Kilian Q. Weinberger. Densely connected convolutional networks. CoRR, abs/1608.06993, 2016.
-  Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015.
-  Tim Salimans, Ian J. Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training gans. CoRR, abs/1606.03498, 2016.