High Quality Bidirectional
Generative Adversarial Networks
Generative adversarial networks (GANs) have achieved outstanding success in generating high quality data. Focusing on the generation process, existing GANs investigate the unidirectional mapping from the latent vector to the data. Later, various studies pointed out that the latent space of GANs is semantically meaningful and can be utilized in advanced data analysis and manipulation. In order to analyze real data in the latent space of GANs, it is necessary to investigate the inverse generation mapping from the data to the latent vector. To tackle this problem, bidirectional generative models introduce an encoder to enable the inverse path of the generation process. Unfortunately, this effort degrades the generation quality because the imperfect generator interferes with the encoder training and vice versa. In this paper, we propose a new inference model that estimates the latent vector from the feature of the GAN discriminator. While existing bidirectional models learn the image-to-latent translation, our algorithm formulates this inference mapping as a feature-to-latent translation. It is important to note that the training of our model is independent of the GAN training. Owing to this independence, the proposed algorithm can generate high quality samples identical to those of unidirectional GANs and also reconstruct the original data faithfully. Moreover, our algorithm can be applied to any unidirectional GAN, even pre-trained GANs.
Duhyun Bang Hyunjung Shim Yonsei Institute of Convergence Technology, School of Integrated Technology Yonsei University, Incheon, South Korea kateshim,firstname.lastname@example.org
Preprint. Work in progress.
1 Introduction
Generative adversarial networks (GANs) have made remarkable progress in reproducing real data distributions, particularly those of natural images. GANs impose few constraints or assumptions on the model definition, and do not even require a variational bound, yet they are capable of producing sharp and realistic images. Instead, training a GAN involves an adversarial competition between a generator and a discriminator: the generator learns the generation process that maps the latent distribution $p_z$ to the data distribution $p_{data}$, and the discriminator evaluates the generation quality by distinguishing generated images from real images. Goodfellow et al. [Goodfellow et al., 2014] formulate the objective of this adversarial training as the following minimax game:
$$\min_G \max_D \; \mathbb{E}_{x \sim p_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))],$$
where $\mathbb{E}$ denotes expectation, $G$ and $D$ are the generator and the discriminator respectively, and $z$ and $x$ are samples drawn from $p_z$ and $p_{data}$ respectively. Once the generator learns the mapping from the latent to the data distribution, it is possible to generate arbitrary data corresponding to a randomly drawn $z$. Because the generator network never observes the real data directly during training, it does not memorize the training dataset and thus produces unseen data.
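As a rough illustration of the minimax game of Goodfellow et al. described above, the value function can be estimated by Monte Carlo sampling. The sketch below is only a toy: `toy_generator` and `toy_discriminator` are hypothetical one-dimensional stand-ins, not the networks used in this paper.

```python
import math
import random

def value_function(discriminator, generator, real_samples, latent_samples):
    """Monte Carlo estimate of E_x[log D(x)] + E_z[log(1 - D(G(z)))]."""
    real_term = sum(math.log(discriminator(x)) for x in real_samples) / len(real_samples)
    fake_term = sum(
        math.log(1.0 - discriminator(generator(z))) for z in latent_samples
    ) / len(latent_samples)
    return real_term + fake_term

# Hypothetical 1-D stand-ins (assumptions for illustration only).
def toy_generator(z):
    return 2.0 * z + 1.0                  # maps latent noise to "data"

def toy_discriminator(x):
    return 1.0 / (1.0 + math.exp(-x))     # sigmoid score in (0, 1)

rng = random.Random(0)
real = [rng.gauss(3.0, 1.0) for _ in range(1000)]
latent = [rng.gauss(0.0, 1.0) for _ in range(1000)]
print(value_function(toy_discriminator, toy_generator, real, latent))

# A discriminator that cannot tell real from fake outputs 0.5 everywhere,
# so the value function equals 2 * log(0.5) = -log(4).
def undecided(x):
    return 0.5

print(value_function(undecided, toy_generator, real, latent))
```

The discriminator tries to increase this value while the generator tries to decrease it; the second call shows the value obtained when the discriminator is completely undecided.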
Since this pioneering work, various approaches to GANs have been developed to improve the training stability, the image quality, or the diversity of the generation process. These traditional GANs focus on learning the unidirectional mapping from $z$ to $x$. That is, the mutual relationship between the latent and the data distributions is not addressed by this unidirectional mapping. Recently, various studies have found that the latent space of GANs yields a semantically meaningful representation [Mikolov et al., 2013]. Benefiting from this semantic power, several studies [Radford et al., 2015, Berthelot et al., 2017] show that it is possible to utilize the latent space of GANs for data augmentation or image editing. To further understand and interpret the semantic representation in the latent space of GANs, we should investigate the inference mapping from $x$ to $z$. Learning this inference mapping can be formulated as data reconstruction via latent estimation. BEGAN [Berthelot et al., 2017] made a first attempt to solve the inverse mapping from $x$ to $z$ using non-convex optimization. More specifically, the problem can be defined as $\min_z d(x, G(z))$, where $d(\cdot, \cdot)$ is the distance metric. They used this inverse mapping to show that the generated images are not the result of data memorization. It is important to note that this approach searches for the inverse generation path with a non-convex optimizer. Because of the non-linearity and model complexity of the generator, calculating the inverse path suffers from multiple local minima, so it is hard to reach the optimal solution. It is also impractical due to its computational cost.
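The difficulty of recovering a latent vector by direct optimization can be seen even in a tiny example. The following sketch runs gradient descent on a scalar latent to minimize a squared distance between a target and the generator output; `toy_generator`, the squared Euclidean distance, the step size, and the finite-difference gradient are all assumptions made for illustration, not the procedure of BEGAN.

```python
import math

def toy_generator(z):
    """Hypothetical non-linear generator mapping a scalar latent to 2-D data."""
    return (math.sin(2.0 * z), math.tanh(z))

def distance(x, y):
    """Squared Euclidean distance, standing in for the distance metric."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def invert(x, z_init, steps=500, lr=0.05, eps=1e-5):
    """Gradient descent on z, using a central finite-difference gradient."""
    z = z_init
    for _ in range(steps):
        grad = (
            distance(x, toy_generator(z + eps)) - distance(x, toy_generator(z - eps))
        ) / (2.0 * eps)
        z -= lr * grad
    return z

z_true = 0.4
x = toy_generator(z_true)
for z_init in (0.0, 3.0):
    z_hat = invert(x, z_init)
    # The result depends on the starting point: one start recovers z_true,
    # the other stops in a local minimum with a clearly non-zero distance.
    print(z_init, z_hat, distance(x, toy_generator(z_hat)))
```

Every target requires its own iterative search, and the outcome depends on the initialization, which is the behavior the paragraph above describes.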
Recent studies pay attention to simultaneously learning the inference path (i.e., from $x$ to $z$) and the generation path (i.e., from $z$ to $x$). The core idea of these approaches is to add an encoder to the GAN model to achieve the bidirectional mapping between $z$ and $x$. These studies fall into two categories. 1) The first imposes that the input data should be identical to the image generated from its estimated latent vector. This constraint is referred to as the reconstruction loss, and it is also used for handling the mode collapse problem. 2) The second proposes novel GAN frameworks in which the bidirectional mapping between $z$ and $x$ is jointly learnt by adversarial training. The former must combine two semantically different loss terms: the distance loss in the data domain and the adversarial loss. Finding the optimal balance between the two terms is difficult and often results in training instability. Moreover, the distance loss in the data domain leads to image blur, because averaging over samples can minimize the overall reconstruction loss. The latter commonly suffers from a lack of generation quality. To understand this performance limitation, we should focus on the intermediate stage of training, where both the encoder and the generator are imperfect. The invalid inference mapping caused by the imperfect encoder misleads the generator update. Likewise, the imperfect generator degrades the encoder. Due to this bidirectional error propagation, their results present relatively poor quality in both generation and reconstruction compared to unidirectional GANs.
Variational Autoencoder (VAE) [Kingma and Welling, 2013] and Adversarial Autoencoder (AAE) [Makhzani et al., 2015] are the most representative generative models that explicitly learn the bidirectional mapping between $z$ and $x$. Their model architectures are quite similar to the structure of an autoencoder [Baldi, 2012], composed of an encoder (i.e., the inverse generator) and a decoder (i.e., the generator). Unlike the autoencoder, VAE and AAE enforce the latent distribution to match a prior distribution, thus enabling data generation. While VAE utilizes the KL divergence to match the latent to the target distribution, AAE utilizes adversarial learning for distribution matching. Although both algorithms guarantee the bidirectional mapping between the latent and the data distribution as well as training stability, their image quality is relatively worse than that of unidirectional GANs.
In this paper, we propose an effective algorithm to establish the bidirectional mapping without sacrificing the image quality accomplished by unidirectional GANs. Our idea is motivated by the fact that the discriminator of a GAN can serve as a meaningful feature extractor because it is sufficiently trained to distinguish real from fake images [Radford et al., 2015]. In other words, the feature vector extracted from the discriminator (i.e., a discriminative feature vector) maps the generated image to a sufficiently meaningful feature space, in which the features are aggregated to vote on the quality evaluation, deciding between real and fake. Our goal is to build the mapping function from this discriminative feature vector to the corresponding $z$.
Toward this goal, we introduce a connection network to build such a mapping. The connection network is associated only with the mapping from the discriminative feature vector to the latent vector, and influences neither the generator nor the discriminator. This is possible because the connection network is trained after the generation mapping is established. In this way, we can guarantee the image quality of unidirectional GANs. Moreover, the connection network performs the feature-to-latent translation in a much lower dimensional space than the image-to-latent translation of encoder-based approaches. This helps reduce the number of model parameters and leads to efficient training. In addition, training the connection network can fully utilize continuous samples from $p_z$. Theoretically, we can draw an infinite number of latent vectors following $p_z$ and obtain the corresponding discriminative feature vectors of the generated data. This eliminates any issue regarding the lack or collection of a training dataset. Because our training dataset is theoretically infinite, pairs of $z$ and the discriminative feature vector for the real data are a subset of our training dataset. As a result, we can ensure the mapping of real data to the latent vector once the connection network is successfully trained.
Finally, the advantages of our algorithm can be summarized as follows.
- Our algorithm learns the inference mapping independently of GAN training. As a result, our algorithm accomplishes bidirectional GANs without losing the image quality of unidirectional GANs.
- The connection network can be trained efficiently with a simple network and fewer parameters than existing approaches.
- Our connection network is easily extendable to any unidirectional GAN.
2 Related work
Existing generative models for learning the bidirectional mapping can be categorized into two groups. The first group utilizes the structure of an autoencoder and the other group develops the joint training scheme for learning bidirectional mapping.
2.1 Encoder-decoder architecture
Similar to the architecture of an autoencoder, VAE [Kingma and Welling, 2013] consists of an encoder for transforming the input data to the latent vector and a decoder for reconstructing the input data from the latent representation. While the autoencoder is designed only for dimensionality reduction, VAE is capable of generating samples as well. This is possible because VAE enforces the latent distribution to match the prior distribution (e.g., $\mathcal{N}(0, I)$) by utilizing variational inference [Wainwright et al., 2008]. Because the decoder learns a mapping from the prior distribution to the data distribution during training, the trained decoder serves as the generator. Unfortunately, because the objective of the decoder is to minimize the reconstruction error, the images generated by VAE often exhibit blur and a lack of diversity; generated images are similar to the training dataset. Still, VAE is an attractive generative model due to its efficient and stable training. Later, VAEGAN [Larsen et al., 2015] was introduced to address the image blur that arises in VAE. To that end, the authors combine the objective of VAE with an adversarial loss. Furthermore, they replace the pixel matching loss with a feature matching loss for visual fidelity. However, because the two loss measures they adopt are semantically different, optimizing their balance is another challenge and often leads to training instability.
2.2 Bidirectional GANs
ALI [Dumoulin et al., 2016] and BiGAN [Donahue et al., 2016] suggest a new theoretical framework for training bidirectional GANs, which jointly learn the bidirectional mapping between $z$ and $x$ in an unsupervised manner. They use a generator similar to that of unidirectional GANs [Radford et al., 2015, Mao et al., 2017, Gulrajani et al., 2017, Warde-Farley and Bengio, 2017] for constructing the forward mapping from $z$ to $x$, and then use an encoder to model the inference mapping from $x$ to $z$. To train the generator and the encoder simultaneously, they define a new objective function for the discriminator, which distinguishes the joint distribution of $(x, E(x))$ from that of $(G(z), z)$. Note that $E$ and $G$ represent the mapping functions defined by the encoder and the generator, respectively. Although their models can reconstruct the original image from the estimated latent variable, the visual quality of the generation is generally worse than that of unidirectional GANs. This is because the imperfect generator misleads the training of the encoder during training and vice versa. Because of the limited generator, the reconstruction quality is also limited; the reconstruction does not faithfully preserve the characteristics of the original image.
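For reference, the joint-distribution objective of this family of models is commonly written in the following form. The notation is adapted to this paper and shows a deterministic encoder $E$ in the BiGAN style; it is a summary of the cited works, not a new formulation.

$$\min_{G, E} \max_{D} \; \mathbb{E}_{x \sim p_{data}}\big[\log D(x, E(x))\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z), z)\big)\big]$$

Here the discriminator receives a pair (data, latent) rather than the data alone, which is what couples the updates of the encoder and the generator.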
MDGAN [Che et al., 2016] and VEEGAN [Srivastava et al., 2017] also investigate the inference mapping using an encoder. Unlike ALI and BiGAN, they introduce an additional constraint, the reconstruction loss, which enforces the image reconstructed from the estimated latent variable to be identical to the original image. Unfortunately, as pointed out in VAEGAN, combining the reconstruction loss and the adversarial loss is not trivial and often introduces training instability. To address this instability, MDGAN uses two discriminators for alternately training the generator. In the mode regularization step, the first discriminator distinguishes $x$ from $G(E(x))$ in order to enforce the reconstruction loss in the generator training. Next, in the diffusion step, the second discriminator distinguishes $G(E(x))$ from $G(z)$ to impose the adversarial loss on the generator training (i.e., the generator produces realistic images). VEEGAN adopts the idea of a reconstruction loss in the latent space, analogous to VAEGAN. As a result, VEEGAN reduces image blur in both the reconstructed and the generated images.
3 Proposed algorithm
We propose a simple yet powerful algorithm to learn the bidirectional mapping based on existing unidirectional GANs. Our main contributions are 1) to preserve the visual quality of unidirectional GANs, unlike existing bidirectional GANs, and 2) to reduce the computational complexity for practical applications.
To accomplish high quality image generation, we separate the connection network training from the unidirectional GAN training. This choice allows us to establish the inference mapping without sacrificing the image quality. Furthermore, as discussed in a previous study [Srivastava et al., 2017], minimizing the difference in the feature domain effectively avoids image blur, which is often a problem when minimizing the pixel difference in the image domain (e.g., a reconstruction error).
To achieve the computational efficiency, we replace the encoder network with the connection network. Our connection network transfers the discriminative feature extracted from the discriminator to the latent vector. Because our problem is defined as the low-to-low vector translation, we effectively gain the computational efficiency.
The objective for learning the connection network is represented as follows:
$$\min_{CN} \; \mathbb{E}_{z \sim p_z} \big\lVert z - CN\big(D_f(G(z))\big) \big\rVert,$$
where $CN$ is the connection network and $D_f(\cdot)$ indicates the discriminative feature vector. Note that the generator and the discriminator are fixed while the connection network is updated.
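The training procedure implied by this objective can be sketched as follows: latent vectors are sampled, passed through the fixed generator and the fixed discriminator feature extractor, and the connection network is updated to predict the original latent vector. Everything concrete in the sketch is an assumption for illustration: the tiny frozen `generator` and `disc_feature`, the squared error loss, the single linear layer standing in for the connection network, and the plain SGD update.

```python
import math
import random

rng = random.Random(0)

# Frozen toy stand-ins for a trained GAN (hypothetical weights).
A = [[1.0, 0.5], [-0.5, 1.0], [0.3, -0.2]]           # generator: z (2-D) -> x (3-D)
B = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0],
     [0.0, 0.0, 0.5], [0.3, 0.3, 0.3]]               # feature extractor: x -> f (4-D)

def matvec(m, v):
    return [sum(w * a for w, a in zip(row, v)) for row in m]

def generator(z):
    return matvec(A, z)

def disc_feature(x):
    return [math.tanh(h) for h in matvec(B, x)]

# Connection network: one linear layer here (the paper uses a small MLP).
W = [[0.0] * 4 for _ in range(2)]
b = [0.0, 0.0]

def connection_network(f):
    return [sum(w * a for w, a in zip(row, f)) + bias for row, bias in zip(W, b)]

def sample_z():
    return [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]

def mean_loss(latents):
    total = 0.0
    for z in latents:
        z_hat = connection_network(disc_feature(generator(z)))
        total += sum((a - c) ** 2 for a, c in zip(z, z_hat))
    return total / len(latents)

eval_latents = [sample_z() for _ in range(500)]
loss_before = mean_loss(eval_latents)

lr = 0.05
for _ in range(5000):
    z = sample_z()                      # a fresh latent sample at every step
    f = disc_feature(generator(z))      # G and D are only evaluated, never updated
    z_hat = connection_network(f)
    for i in range(2):
        err = z_hat[i] - z[i]
        for j in range(4):
            W[i][j] -= lr * 2.0 * err * f[j]
        b[i] -= lr * 2.0 * err

loss_after = mean_loss(eval_latents)
print(loss_before, loss_after)
```

Only `W` and `b` change during the loop, which mirrors the statement that the generator and the discriminator stay fixed; no real data is needed to produce the training pairs.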
3.1 Connection network
Our algorithm is inspired by a previous study showing that the discriminator of a GAN learns a hierarchy of features in order to distinguish real from fake images [Radford et al., 2015]. That is, the discriminator can serve as a feature extractor as long as it is sufficiently trained. Focusing on the discriminative feature space determined by the discriminator, when the GAN training reaches the Nash equilibrium (i.e., the discriminator no longer distinguishes real from fake images), the feature distribution of real images is identical to that of fake images.
Based on this observation, we propose the connection network, which learns the mapping between $z$ and the discriminative feature extracted from the discriminator. For ease of understanding, Fig. 1 visualizes the graphical models of the existing bidirectional GAN model (ALI/BiGAN) and the proposed model. The image generated from $z$ is projected onto the discriminative feature space, and this feature vector is then mapped back to the original $z$ by the connection network. It is important to understand that the correspondences between $z$ and the discriminative features are automatically determined for any random $z$ once the generator and the discriminator are trained. Because we can draw an infinite number of samples from the distribution $p_z$, the training data (i.e., pairs of $z$ and its discriminative feature vector) are also unlimited. In theory, the number of our training samples approaches infinity, and thus our training dataset is a superset of the real dataset. Therefore, although the connection network is trained only with generated samples, we can successfully establish the mapping between the real data and its latent vectors.
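At test time, the inference path is a plain composition of three trained functions: an input image is passed through the discriminator feature extractor, the connection network, and then the generator. The sketch below only illustrates this data flow; the three lambdas are hypothetical, exactly invertible stand-ins chosen so that the round trip can be checked by hand.

```python
def reconstruct(x, generator, disc_feature, connection_network):
    """Inference path: image -> discriminative feature -> latent -> image."""
    z_hat = connection_network(disc_feature(x))
    return generator(z_hat), z_hat

# Hypothetical stand-ins (assumptions), not trained networks.
generator = lambda z: [2.0 * v + 1.0 for v in z]
disc_feature = lambda x: [v - 1.0 for v in x]
connection_network = lambda f: [v / 2.0 for v in f]

x_generated = generator([0.5, -1.0])
x_rec, z_hat = reconstruct(x_generated, generator, disc_feature, connection_network)
print(z_hat, x_rec)
```

Because the whole path consists of forward passes only, no per-image optimization is required.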
3.2 Training and test efficiency
As mentioned above, the unidirectional GAN can estimate the latent vectors of the target image without training an additional network. However, the non-convex optimization for estimating latent vectors of target images is often intractable, requiring expensive computational resources.
Both bidirectional GANs and encoder-decoder approaches achieve computational efficiency during testing by introducing the encoder network during training. However, their test efficiency comes at the price of shifting the computational burden to training. Because the encoder must model the mapping from a high dimensional space to a low dimensional space, its network requires many parameters and considerable computation for training.
The proposed algorithm accomplishes both training and test efficiency. Because we utilize the discriminative feature of the discriminator, our problem is to learn a mapping between two low dimensional representations. In this way, we significantly reduce the number of model parameters as well as the computational complexity during training.
Training our connection network is completely independent of the GAN training. This is why our bidirectional GANs can guarantee the image quality of unidirectional GANs. Owing to its simple network architecture, the connection network can be applied to any unidirectional GAN. Our network can be learnt without a real dataset because the training pairs can be generated by forward propagation through the GAN model. Another attractive property of our algorithm is that we can utilize pre-trained GANs. By solely training the connection network, we can establish a bidirectional GAN based on any baseline unidirectional GAN.
4 Experimental evaluation
In this section, we evaluate how accurately our bidirectional GANs reconstruct the original images, both qualitatively and quantitatively. We use four different unidirectional GANs as baseline networks and apply our connection network to establish the bidirectional GANs. The baseline networks are DCGAN [Radford et al., 2015], LSGAN [Mao et al., 2017], DFM [Warde-Farley and Bengio, 2017] and WGAN-GP [Gulrajani et al., 2017]. We use the prefix “BD-” to indicate a baseline with our modification; thus BD-DCGAN, BD-LSGAN, BD-DFM, and BD-WGAN-GP are the variants of our bidirectional GANs. We chose these four baseline networks because they differ significantly in terms of loss functions or network architectures. For example, DCGAN, LSGAN, and WGAN-GP exploit different metrics, while DFM adds a denoising autoencoder to the discriminator for robustness. By evaluating with a variety of unidirectional GANs, we aim to show the extendibility of the proposed algorithm. For fair evaluation, the architecture of all baseline networks (e.g., the number of layers, the filter size, the use of batch normalization) follows that of DCGAN. The connection network is composed of only two hidden fully connected layers followed by an output layer: fully connected layer (FC) with 1024 units – batch normalization (BN) – leaky rectified linear unit (Leaky ReLU) – FC with 1024 units – BN – Leaky ReLU – FC with the dimension of $z$.
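To give a feel for the size of such a connection network, its learnable parameters can be counted directly from the layer list above. The sketch assumes standard fully connected layers with biases and batch normalization with one learnable scale and one shift per unit; the feature dimension and latent dimension in the example call are hypothetical values, since they depend on the baseline GAN.

```python
def fc_params(n_in, n_out):
    return n_in * n_out + n_out      # weights + biases

def bn_params(n_features):
    return 2 * n_features            # learnable scale and shift

def connection_network_params(feature_dim, latent_dim, hidden=1024):
    """FC - BN - Leaky ReLU - FC - BN - Leaky ReLU - FC (Leaky ReLU has no parameters)."""
    return (
        fc_params(feature_dim, hidden) + bn_params(hidden)
        + fc_params(hidden, hidden) + bn_params(hidden)
        + fc_params(hidden, latent_dim)
    )

# Hypothetical sizes: a 1024-D discriminative feature and a 100-D latent vector.
print(connection_network_params(feature_dim=1024, latent_dim=100))
```

The count is dominated by the first layer whenever the discriminative feature is much larger than 1024, so the choice of the discriminator layer used as the feature directly controls the model size.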
Although any dataset used for unidirectional GAN training can be used with our algorithm, we choose CelebA [Liu et al., 2015] for the evaluation because human observers are more sensitive when judging the quality of human faces than that of other data [Strohminger et al., 2016].
4.1 Qualitative evaluation
Fig. 2 visualizes the input images (odd columns) and their reconstructed images (even columns) from VAE, ALI, and ours (BD-DCGAN, BD-LSGAN, BD-DFM, and BD-WGAN-GP). Because VAE is optimized with a pixel reconstruction loss, the generated images are blurry and lose detailed structures. Interestingly, minor attributes (e.g., mustache and baldness) that appear less frequently in the training dataset are rarely recovered in the reconstructed image. ALI generates sharper images than VAE. However, it is not effective at restoring the important characteristics of the input image (e.g., identity), and it occasionally generates images that are completely different from the input images.
In contrast, we observe that the reconstructed images with the variants of our bidirectional GANs exhibit consistently better visual quality than both VAE and ALI. Because training the connection network does not influence the training of the baseline GANs, our algorithm can achieve the best performance of unidirectional GANs. Furthermore, our results are superior to VAE and ALI in that we faithfully reconstruct the input images including various facial attributes, which VAE and ALI fail to handle. Based on these observations, we justify that the proposed algorithm accurately estimates the latent vector corresponding to the input image and retains the image quality.
To investigate the importance of the discriminative feature vector for the inference mapping, we compare ours with a naive encoder mapping (i.e., from the fake image to $z$). In this experiment, unlike existing bidirectional GANs, we first train DCGAN and then train the encoder with the unidirectional GAN fixed, similar to our algorithm. The encoder structure is identical to the discriminator except for the last layer, whose size is set to the dimension of $z$. Fig. 3 compares the reconstructed images at three different iterations. The first and fourth columns show the input images, the second and fifth columns are the images reconstructed using our connection network, and the third and sixth columns are from the naive encoder. In all cases, the connection network provides higher quality in terms of visual fidelity and reconstruction accuracy. Based on this experiment, we confirm that our discriminative features are more effective for building the inference mapping than the images.
One popular application that exploits estimated latent vectors is image editing. A previous study [Mikolov et al., 2013] reported that simple arithmetic operations reveal a rich linear structure in the representation space. Later, DCGAN applied latent vector arithmetic to facial attribute editing and produced various image effects for the first time. Estimating the latent vector of a target image, however, requires a non-convex optimization, which is often intractable for real applications. The proposed method is attractive in that we achieve accurate latent estimation in real time (i.e., by forward propagation). Hence, our algorithm can be used to edit image attributes in real time. We demonstrate the results from the variants of our bidirectional GANs in Fig. 4. In this figure, the first to third columns show the result of adding the mean latent vector for blonde hair to a female face, and the fourth to sixth columns show the result of adding the mean latent vector for glasses to a male face; in each group, the columns show the test image, the reconstructed image, and the image edited by vector arithmetic. In all cases, our algorithm effectively synthesizes the attributes in various faces.
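The attribute editing described above amounts to a few vector operations once latent vectors are available. The sketch below averages the latent vectors of images showing an attribute and adds the result to the latent vector of a test image; the function names, the `strength` parameter, and the three-dimensional example vectors are hypothetical and serve only to show the arithmetic.

```python
def mean_vector(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def edit_latent(z, attribute_latents, strength=1.0):
    """Add the mean latent vector of an attribute to the latent vector z."""
    attribute = mean_vector(attribute_latents)
    return [a + strength * b for a, b in zip(z, attribute)]

# Hypothetical latent vectors, e.g. as estimated by the connection network.
z_test = [0.25, -0.5, 1.0]
z_blonde = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
z_edited = edit_latent(z_test, z_blonde)
print(z_edited)   # passing z_edited to the generator would give the edited image
```

In practice the latent vectors would come from a forward pass through the discriminator and the connection network, so the whole edit requires no iterative optimization.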
4.2 Quantitative evaluation
The inception score is widely used for quantitatively evaluating the visual quality of various GAN models because it is highly correlated with the quality evaluation of human annotators [Salimans et al., 2016]. By comparing the inception scores, we verify that unidirectional GANs (e.g., DCGAN, LSGAN, DFM, WGAN-GP) are superior to bidirectional GANs (e.g., ALI/BiGAN, AGE) in terms of visual quality. In Table 1, we compare the inception scores of various GANs [Ulyanov et al., 2017, Bang and Shim, 2018]. Furthermore, we evaluate the image quality of our bidirectional GANs and find that the proposed algorithm achieves image quality identical to that of the baseline unidirectional GANs. From this experiment, we confirm that adding the connection network does not introduce any quality degradation in image generation.
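For readers unfamiliar with the metric, the inception score is the exponential of the average KL divergence between the class distribution predicted for each image and the marginal class distribution over all images. The sketch below computes it from a list of class-probability vectors; in the actual metric these come from a pretrained Inception classifier, whereas the inputs here are hypothetical hand-written probabilities.

```python
import math

def inception_score(class_probs):
    """exp of the mean KL divergence between p(y|x) and the marginal p(y)."""
    n = len(class_probs)
    k = len(class_probs[0])
    marginal = [sum(p[j] for p in class_probs) / n for j in range(k)]
    mean_kl = 0.0
    for p in class_probs:
        # Terms with zero probability contribute nothing to the KL divergence.
        mean_kl += sum(
            pj * math.log(pj / marginal[j]) for j, pj in enumerate(p) if pj > 0.0
        )
    return math.exp(mean_kl / n)

# Uninformative predictions give the lowest possible score of 1.
uniform = [[0.5, 0.5], [0.5, 0.5]]
# Confident predictions spread evenly over 3 classes give a score of 3.
confident = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(inception_score(uniform), inception_score(confident))
```

The score is high only when individual predictions are confident and the set of images covers many classes, which is why it is read as a joint indicator of quality and diversity.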
To quantitatively evaluate the reconstruction performance, we use two metrics: the peak signal-to-noise ratio (PSNR), which measures the pixel difference, and the structural similarity index (SSIM), which measures the structural difference. Although they are not ideal metrics for measuring the similarity between two images, they provide a reasonable quantitative reference. In this experimental evaluation, ALI and VAE are selected to represent existing bidirectional generative models and are compared with the four variants of our algorithm. For fair evaluation, we use a total of 1k test images and report the mean and variance of both PSNR and SSIM in Table 2. From this comparison, we confirm that the four variants of our algorithm consistently outperform existing bidirectional generative models, and we conclude that our algorithm is effective at accurately reconstructing the input image.
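As a concrete reference for the first metric, PSNR is computed from the mean squared error between two images, and the per-image values are then summarized by their mean and variance. The sketch assumes 8-bit images flattened into lists of pixel values and uses the population variance; both choices are assumptions, and SSIM is omitted because it requires a windowed computation.

```python
import math

def psnr(img_a, img_b, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    if mse == 0.0:
        return float("inf")          # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

def mean_and_variance(values):
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)

# Hypothetical flattened image pairs: (input, reconstruction).
pairs = [
    ([10, 20, 30, 40], [12, 18, 33, 40]),
    ([200, 100, 50, 0], [190, 110, 45, 5]),
]
scores = [psnr(x, x_rec) for x, x_rec in pairs]
print(scores, mean_and_variance(scores))
```

A higher PSNR means a smaller pixel-wise error; the variance indicates how consistent the reconstruction quality is across the test images.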
5 Conclusion
In this study, we propose a new framework that establishes the bidirectional mapping for existing unidirectional GANs without sacrificing their image quality. To that end, we introduce the connection network, which transfers the discriminative feature to the latent vector. Training the connection network has two advantages. First, our framework neither requires collecting a training dataset nor suffers from a lack of training data. Instead, we simply sample latent vectors from $p_z$ (e.g., Gaussian random noise) and then utilize the resultant discriminative features and their latent vectors for training the connection network. Although training pairs from the real data are not available in this learning stage, we expect that our training process can effectively cover the pairs from real data. This is because the number of our training pairs can approach infinity; the pairs from real data can be approximated as a subset of our training pairs. As a result, we can successfully establish the bidirectional mapping between the real data and its latent vectors. Second, our training is completely separated from the unidirectional GAN training. Hence, we can preserve the original quality achieved by unidirectional GANs, whereas existing bidirectional GANs generally lose visual quality in order to accomplish the bidirectional mapping. Owing to these two advantages, our framework is easily extendable to any existing unidirectional GAN. We expect that our new framework for learning an inference mapping can be utilized for resolving mode collapse, as suggested in MDGAN [Che et al., 2016] and VEEGAN [Srivastava et al., 2017].
References
- [Goodfellow et al., 2014] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
- [Mikolov et al., 2013] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119, 2013.
- [Radford et al., 2015] Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
- [Berthelot et al., 2017] David Berthelot, Tom Schumm, and Luke Metz. BEGAN: Boundary equilibrium generative adversarial networks. arXiv preprint arXiv:1703.10717, 2017.
- [Kingma and Welling, 2013] Diederik P Kingma and Max Welling. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
- [Makhzani et al., 2015] Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow, and Brendan Frey. Adversarial autoencoders. arXiv preprint arXiv:1511.05644, 2015.
- [Baldi, 2012] Pierre Baldi. Autoencoders, unsupervised learning, and deep architectures. In Proceedings of ICML Workshop on Unsupervised and Transfer Learning, pages 37–49, 2012.
- [Wainwright et al., 2008] Martin J Wainwright, Michael I Jordan, et al. Graphical models, exponential families, and variational inference. Foundations and Trends® in Machine Learning, 1(1–2):1–305, 2008.
- [Larsen et al., 2015] Anders Boesen Lindbo Larsen, Søren Kaae Sønderby, Hugo Larochelle, and Ole Winther. Autoencoding beyond pixels using a learned similarity metric. arXiv preprint arXiv:1512.09300, 2015.
- [Dumoulin et al., 2016] Vincent Dumoulin, Ishmael Belghazi, Ben Poole, Olivier Mastropietro, Alex Lamb, Martin Arjovsky, and Aaron Courville. Adversarially learned inference. arXiv preprint arXiv:1606.00704, 2016.
- [Donahue et al., 2016] Jeff Donahue, Philipp Krähenbühl, and Trevor Darrell. Adversarial feature learning. arXiv preprint arXiv:1605.09782, 2016.
- [Mao et al., 2017] Xudong Mao, Qing Li, Haoran Xie, Raymond YK Lau, Zhen Wang, and Stephen Paul Smolley. Least squares generative adversarial networks. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 2813–2821. IEEE, 2017.
- [Gulrajani et al., 2017] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C Courville. Improved training of Wasserstein GANs. In Advances in Neural Information Processing Systems, pages 5769–5779, 2017.
- [Warde-Farley and Bengio, 2017] David Warde-Farley and Yoshua Bengio. Improving generative adversarial networks with denoising feature matching. In International Conference on Learning Representations, 2017.
- [Che et al., 2016] Tong Che, Yanran Li, Athul Paul Jacob, Yoshua Bengio, and Wenjie Li. Mode regularized generative adversarial networks. arXiv preprint arXiv:1612.02136, 2016.
- [Srivastava et al., 2017] Akash Srivastava, Lazar Valkov, Chris Russell, Michael U Gutmann, and Charles Sutton. VEEGAN: Reducing mode collapse in GANs using implicit variational learning. In Advances in Neural Information Processing Systems, pages 3310–3320, 2017.
- [Liu et al., 2015] Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proceedings of the IEEE International Conference on Computer Vision, pages 3730–3738, 2015.
- [Strohminger et al., 2016] Nina Strohminger, Kurt Gray, Vladimir Chituc, Joseph Heffner, Chelsea Schein, and Titus Brooks Heagins. The MR2: A multi-racial, mega-resolution database of facial stimuli. Behavior Research Methods, 48(3):1197–1204, 2016.
- [Salimans et al., 2016] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs. In Advances in Neural Information Processing Systems, pages 2234–2242, 2016.
- [Ulyanov et al., 2017] Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky. It takes (only) two: Adversarial generator-encoder networks. arXiv preprint arXiv:1704.02304, 2017.
- [Bang and Shim, 2018] Duhyeon Bang and Hyunjung Shim. Improved training of generative adversarial networks using representative features. arXiv preprint arXiv:1801.09195, 2018.