When Relation Networks meet GANs: Relation GANs with Triplet Loss
Abstract
Though recent research has achieved remarkable progress in generating realistic images with generative adversarial networks (GANs), the lack of training stability is still a lingering concern of most GANs, especially on high-resolution inputs and complex datasets. Since the randomly generated distribution can hardly overlap with the real distribution, training GANs often suffers from the gradient vanishing problem. A number of approaches have been proposed to address this issue by constraining the discriminator's capabilities using empirical techniques, such as weight clipping, gradient penalty, and spectral normalization. In this paper, we provide a more principled alternative solution to this issue. Instead of training the discriminator to distinguish real and fake input samples, we investigate the relationship between paired samples by training the discriminator to separate paired samples drawn from the same distribution and those drawn from different distributions. To this end, we explore a relation network architecture for the discriminator and design a triplet loss which yields better generalization and stability. Extensive experiments on benchmark datasets show that the proposed relation discriminator and new loss provide significant improvements on various vision tasks, including unconditional and conditional image generation and image translation. Our source code is available at: https://github.com/JosephineRabbit/RelationGAN
1 Introduction
Since first proposed in [8], generative adversarial networks (GANs) have witnessed rapid development and found numerous applications in computer vision tasks, such as image generation [8, 13, 44], person re-identification [2], and image super-resolution [31]. They have also been extended to natural language processing [38], video sequence synthesis [6], and speech synthesis [30] recently.
Though tremendous success has been achieved in many fields, training GANs is still a very tricky process that suffers from many issues, including the instability between the generator and the discriminator as well as extreme sensitivity to network architecture and hyperparameters. It has been shown that most of these issues arise because the supports of both the target distribution and the generated distribution are often of low dimension with respect to the ambient space, and are therefore misaligned most of the time, causing the discriminator to collapse to a function that hardly provides gradients to the generator.
To remedy this issue, recent works propose to leverage the Integral Probability Metric (IPM), with techniques such as the gradient penalty [9] and spectral normalization [28]. In IPM-based GANs, the discriminator is constrained to a specific class of functions so that it does not grow too quickly, which alleviates vanishing gradients.
However, the existing IPM methods also have their limits. For instance, the hyperparameter tuning of the gradient penalty is mostly empirical, while spectral normalization imposes constraints on every convolutional layer, which hinders the learning capacity of the discriminator.
In [13], the authors argue that non-IPM-based GANs are missing a relativistic discriminator, which IPM-based GANs already possess. The relativistic discriminator is necessary to make the training process analogous to divergence minimization and to produce sensible predictions based on the prior knowledge that half of the samples in the mini-batch are fake. Although they have shown the power of the relativistic discriminator, the potential of comparing the relation between the real and fake distributions remains to be explored.
In this paper, we explicitly study the effect of relation comparison in GANs by training the discriminator to determine whether the input paired samples are drawn from the same distribution (either real or fake). A relation network is presented, acting as the discriminator, and a new triplet loss is designed for training the GAN. In this way, the aforementioned problem of disjoint supports can be alleviated by projecting and merging the low-dimensional data distributions into a high-dimensional feature space.
Mathematically, we prove our new triplet loss is a divergence and can achieve the Nash equilibrium, leading to convergence of the generated data distribution to the real distribution. In addition, we analyze the oscillatory behavior that GANs exhibit on the Dirac GAN, and demonstrate that the proposed Relation GAN is locally convergent even without any regularization.
Extensive experiments are conducted on conditional and unconditional image generation and image translation tasks. The promising performance demonstrates that the proposed Relation GAN has great potential in various applications of GANs.
In summary, the contributions of this paper are twofold.

We propose a new training strategy for GANs to better leverage the relation between samples. Instead of separating real samples from generated ones, the discriminator is trained to determine whether paired samples are from the same distribution.

We propose a relation network architecture as the discriminator and a triplet loss for training GANs. We show both theoretically and empirically that the relation network together with the triplet loss gives rise to a generated density that can exactly match that of the real data.
Extensive experiments on the 2D grid [27], Stacked MNIST [17], CelebA [21], LSUN [42] and CelebA-HQ [22] datasets confirm that our proposed method performs favourably against state-of-the-art methods such as relativistic GAN [13], WGAN-GP [9], Least Squares GAN (LSGAN) [24] and the vanilla GAN [8].
2 Related Work
The vanilla GAN [8] minimizes the JS divergence of two distributions, leading to the gradient vanishing problem when the two distributions are disjoint. Recent works try to address this issue by designing new objective functions [24, 32, 37, 1] or more sophisticated network architectures [14, 45, 5, 33]. Others investigate regularization and/or normalization to constrain the ability of the discriminator [28, 9, 16]. Recently, a new method [13] was proposed to explore a relativistic discriminator. In the following, we review recent works using different objective functions and a special case, relativistic GANs, which are closely related to our approach.
2.1 Different Objective Functions in GANs
Generally, there are two kinds of loss functions in GANs: the minimax GAN and the non-saturating (NS) GAN. In the former, the discriminator minimizes the negative log-likelihood for the binary classification task; in the latter, the generator maximizes the probability of generated samples being real, and the non-saturating loss is known to outperform the minimax variant empirically. Among the alternatives, the loss-sensitive GAN [32] tries to solve the problem of gradient vanishing by focusing on training samples with low authenticity. WGAN [1] proposes the Wasserstein distance to replace the JS divergence, which can measure the distance between two distributions even when they are disjoint. In addition, [1] also proposes to add noise to both real and generated samples to further alleviate the impact of disjoint distributions. [9] improves WGAN by replacing the weight clipping constraints with a gradient penalty, which enforces the Lipschitz constraint on the discriminator by penalizing the norm of its gradient. DRAGAN [16] combines elements of WGAN and LSGAN and improves the loss function to a certain extent; the stability of training is controlled by continually updating the coefficient of the latter term.
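To make the gradient-penalty idea concrete, the following toy sketch uses a hypothetical linear critic, whose input gradient is available in closed form so no automatic differentiation is needed (real critics require backpropagation to compute the gradient norm at the interpolates):

```python
import numpy as np

# Toy linear critic D(x) = w . x: its input gradient is w everywhere, so the
# WGAN-GP penalty lambda * (||grad_x D(x_hat)||_2 - 1)^2 has a closed form.
# (Toy setup for illustration; real critics need autodiff for the norm.)
rng = np.random.default_rng(0)
w = np.array([3.0, 4.0])                  # ||w||_2 = 5

def gradient_penalty(w, lam=10.0):
    grad_norm = np.linalg.norm(w)         # gradient of a linear critic
    return lam * (grad_norm - 1.0) ** 2

# WGAN-GP evaluates the penalty at random interpolates between real and fake
# samples; for a linear critic the value is the same at every interpolate.
x_real, x_fake = rng.normal(size=(2, 2))
eps = rng.uniform()
x_hat = eps * x_real + (1.0 - eps) * x_fake   # the interpolate (shown for shape)
print(gradient_penalty(w))                # 10 * (5 - 1)^2 = 160.0
```

Penalizing deviations of the gradient norm from 1 softly enforces the 1-Lipschitz constraint without clipping weights.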
2.2 Relativistic GANs
Instead of training discriminators to predict the absolute probabilities of the input samples being real, the relativistic GAN [13] proposes a relativistic discriminator, which estimates the probability of a given real sample being more realistic than a randomly sampled fake sample. Although it bears a similar spirit, our method differs from [13] in that we adopt a relation network as the discriminator to estimate the relation score of a paired input. In comparison, the discriminator in [13] treats input samples separately and relies on a ranking loss (e.g., hinge loss) to explore their relation. The idea of merging the features and comparing the relation between samples from two distributions has not been explored in the GAN literature. In addition, our method proposes a new triplet loss to leverage the power of paired relation comparison, allowing more stability and better diversity for GANs without applying any IPM methods.
3 The Relation GAN Framework
3.1 Relation Net Architecture
In traditional GANs, a discriminator is trained to distinguish real samples from fake ones, and a generator is trained to confuse the discriminator by generating realistic samples. Consider a real data distribution P and the data distribution Q produced by the generator G. Rather than training the discriminator on real and fake data independently, we propose to train a discriminator which predicts a relation score for a paired input, indicating whether the paired samples are from the same distribution (either P or Q).
Inspired by the success of the relation net architecture in other computer vision areas [39], our discriminator consists of two modules, an embedding module and a relation module, as shown in Figure 1. For a pair of input samples, the embedding module first maps each sample into a high-dimensional feature space. Their corresponding features are then merged and fed into the relation module to produce the relation score for the input pair. For ease of description, we name paired inputs containing both real and fake samples asymmetric pairs, and those containing samples from the same distribution (either real or fake) symmetric pairs. The training process is then formulated as a min-max game (see Section 3.2), where the discriminator aims to maximize the relation scores of asymmetric sample pairs and minimize those of symmetric ones. Meanwhile, the generator is trained to confuse the discriminator by minimizing the relation scores of asymmetric sample pairs containing real and generated samples.
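The two-module design can be sketched numerically with toy linear/ReLU layers standing in for the paper's convolutional modules (all weights here are random placeholders, not the actual architecture):

```python
import numpy as np

# Toy sketch of the two-module relation discriminator: an embedding module
# maps each sample to a feature vector, the two features are merged (here by
# concatenation), and a relation module maps the merged feature to one score.
rng = np.random.default_rng(0)
D_IN, D_EMB = 8, 16
W_emb = rng.normal(scale=0.1, size=(D_EMB, D_IN))   # embedding module weights
w_rel = rng.normal(scale=0.1, size=2 * D_EMB)       # relation module weights

def embed(x):
    return np.maximum(0.0, W_emb @ x)               # one linear + ReLU layer

def relation_score(x1, x2):
    merged = np.concatenate([embed(x1), embed(x2)]) # merge paired features
    return float(w_rel @ merged)                    # single relation score

x_a, x_b = rng.normal(size=(2, D_IN))
print(relation_score(x_a, x_b))   # one scalar score per input pair
```

The key design point is that both samples of a pair pass through the same embedding before comparison, so the relation module always operates on merged features rather than on raw inputs.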
3.2 The Min-Max Game
The min-max game in training GANs is conducted by optimizing the losses of the discriminator D and the generator G iteratively. In non-IPM GANs, the generalized losses of D and G can be presented as follows:
L_D = \mathbb{E}_{x_r \sim P}[f_1(D(x_r))] + \mathbb{E}_{x_f \sim Q}[f_2(D(x_f))]   (1)
and
L_G = \mathbb{E}_{x_f \sim Q}[g(D(x_f))]   (2)
where f_1, f_2 and g are scalar-to-scalar functions, P is the distribution of real data, and Q denotes the generated data distribution.
In our Relation GAN, the loss functions of D and G are formulated as follows:
L_D = \mathbb{E}_{x_{r1}, x_{r2} \sim P,\ x_f \sim Q}\big[f_1(D(x_{r1}, x_f)) + f_2(D(x_{r1}, x_{r2}))\big]   (3)
and
L_G = \mathbb{E}_{x_r \sim P,\ x_f \sim Q}\big[g(D(x_r, x_f))\big]   (4)
where f_1, f_2 and g are also scalar-to-scalar functions.
The goal of the relation discriminator is to learn a loss function, parameterized by the discriminator weights, which separates symmetric and asymmetric sample pairs by a desired margin. The generator can then be trained to minimize this margin by generating realistic samples.
Inspired by the success of the triplet loss [7], we formulate a similar loss function in our Relation GAN as follows:
L_D = \mathbb{E}_{x_{r1}, x_{r2} \sim P,\ x_f \sim Q}\big[\max\big(0,\ D(x_{r1}, x_{r2}) - D(x_{r1}, x_f) + d(x_{r2}, x_f)\big)\big]   (5)
and
L_G = \mathbb{E}_{x_r \sim P,\ x_f \sim Q,\ x_{f'} \sim Q'}\big[\max\big(0,\ D(x_r, x_f) - D(x_f, x_{f'}) + d(x_r, x_{f'})\big)\big]
where x_{r1} and x_{r2} are samples from the real data distribution, x_f is a sample from the generated data distribution Q, and x_{f'} is a sample from the distribution Q' generated by the generator in the last optimization step. We use a distance metric d(·, ·) to replace the constant "margin" in the original triplet loss. This adaptive constraint demands a smaller difference of relation scores when the distance between the two compared samples is smaller, which is more flexible than the original fixed margin. Our experiments also show the superiority of the new triplet loss with this margin.
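The discriminator's triplet objective with a distance-based margin can be illustrated with a small NumPy sketch; the scoring function below is a hypothetical stand-in (score = pairwise distance, so asymmetric pairs naturally score high), not the trained relation net:

```python
import numpy as np

def triplet_d_loss(D, x_r1, x_r2, x_f):
    """Hinge-style triplet objective for the relation discriminator (sketch):
    push the asymmetric score D(x_r1, x_f) above the symmetric score
    D(x_r1, x_r2) by a data-dependent margin d(x_r2, x_f) rather than a
    fixed constant."""
    margin = np.linalg.norm(x_r2 - x_f)          # d(., .): Euclidean here
    return max(0.0, D(x_r1, x_r2) - D(x_r1, x_f) + margin)

# Stand-in relation "discriminator": score = distance between the two samples.
D = lambda a, b: np.linalg.norm(a - b)

x_r1 = np.array([0.0, 0.0])
x_r2 = np.array([1.0, 0.0])
x_f  = np.array([3.0, 4.0])
print(triplet_d_loss(D, x_r1, x_r2, x_f))   # max(0, 1 - 5 + sqrt(20)) ~= 0.472
```

When the fake sample moves closer to the real samples, the margin d(x_r2, x_f) shrinks with it, so the required score gap shrinks too, which is exactly the flexibility the adaptive margin is meant to provide.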
3.3 A Variant Loss
Since the training batch size is limited, the sampled distribution of each batch may deviate from the real data distribution. For an input batch of N paired samples, the discriminator loss in Eq. (5) can be written as follows:
L_D = \frac{1}{N} \sum_{i=1}^{N} \max\big(0,\ D(x_{r1}^i, x_{r2}^i) - D(x_{r1}^i, x_f^i) + d(x_{r2}^i, x_f^i)\big)   (6)
where the superscript i indexes the samples in a batch. Our triplet loss is designed to reduce the relation scores of symmetric sample pairs and increase those of asymmetric ones.
However, when the real sample distribution is fairly uniform with small variance, the original loss is too strict and prone to be disturbed by outliers within a batch. For these cases, we design a variant of our new triplet loss as follows:
L_D = \max\Big(0,\ \frac{1}{N} \sum_{i=1}^{N} \big[D(x_{r1}^i, x_{r2}^i) - D(x_{r1}^i, x_f^i) + d(x_{r2}^i, x_f^i)\big]\Big)   (7)
where i again indexes the samples in a batch. The variant loss is more relaxed and not easily disturbed by extreme samples in the same batch, so it performs better on evenly distributed datasets.
Thus, we suggest employing the variant triplet loss on uniformly distributed data, e.g., datasets containing only a single class. Our experimental results on single-class datasets such as CelebA and LSUN confirm this.
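The difference in outlier sensitivity between the strict per-pair hinge and the relaxed batch-level variant can be shown numerically (the scores below are made-up toy values, not from any trained model):

```python
import numpy as np

# Strict form: hinge per pair, then average -> one outlier dominates.
# Relaxed variant: average over the batch first, then one hinge.
sym    = np.array([0.1, 0.2, 5.0])    # symmetric-pair scores; 5.0 is an outlier
asym   = np.array([1.0, 1.1, 1.0])    # asymmetric-pair scores
margin = np.array([0.5, 0.5, 0.5])    # per-pair margins d(., .)

strict  = np.mean(np.maximum(0.0, sym - asym + margin))  # outlier adds 4.5/3
relaxed = max(0.0, np.mean(sym - asym + margin))         # outlier averaged out
print(strict, relaxed)   # 1.5 vs ~1.233
```

The two well-behaved pairs already satisfy the constraint, yet the strict loss still pays the outlier's full hinge; the relaxed loss lets their negative slack partially cancel it.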
4 Theory Proof and Analysis
As discussed in the introduction, the loss of most GANs at the optimal discriminator is a divergence. In this section, we first prove that the proposed relation-network-based discriminator also has this property, and then show the distributional consistency under our Lipschitz continuity assumption.
4.1 A New Divergence
A divergence is a function of two distributions that satisfies the following definition:
Definition 1 If Div(P, Q) is a function of two distributions P and Q satisfying:
1. Div(P, Q) ≥ 0;
2. Div(P, Q) = 0 if and only if P = Q;
then Div is a divergence between P and Q.
Assumption 1 In the training process, before G reaches the optimum G*, a real sample x_r ought to be more realistic than a generated sample x_f, and D ought to give a bigger relation score to the asymmetric paired input (x_r, x_f) than to the symmetric paired input (x_{r1}, x_{r2}). That x_r ought to be more realistic than x_f also means that D(x_r, x_f) is bigger than D(x_{r1}, x_{r2}).
Under this assumption, we show in Supplementary 1 that the loss function of our relation discriminator is also a divergence.
4.2 Distributional Consistency
We use D to denote the parameterized discriminator and G to denote the parameterized generator. Based on [32], we use the following definition of the Lipschitz assumption on the data density:
Definition 2 For any two samples x and y, a function f is Lipschitz continuous with respect to a distance metric d if |f(x) − f(y)| ≤ k · d(x, y) with a bounded Lipschitz constant k.
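Definition 2 can be checked empirically on a simple function; here f = sin with d(x, y) = |x − y|, whose Lipschitz constant is k = 1 since |f′| = |cos| ≤ 1 everywhere (a worked example only, unrelated to the actual discriminator):

```python
import numpy as np

# Empirical check of the Lipschitz definition: every sampled difference
# quotient |sin(x) - sin(y)| / |x - y| should stay below k = 1.
rng = np.random.default_rng(0)
x = rng.uniform(-10.0, 10.0, size=10000)
y = rng.uniform(-10.0, 10.0, size=10000)
mask = np.abs(x - y) > 1e-9                    # avoid 0/0 for coincident pairs
ratios = np.abs(np.sin(x) - np.sin(y))[mask] / np.abs(x - y)[mask]
print(ratios.max())                            # bounded by k = 1
```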
Assumption 2 The data density is supported in a compact set and is Lipschitz continuous w.r.t. d with a bounded constant k, as in Definition 2.
Under this assumption, we show the existence of a Nash equilibrium such that both the discriminator function and the density of generated samples are Lipschitz. As in [32], L_D and L_G are convex in D and in G respectively. Then, according to Sion's minimax theorem [36], with D and G being optimized there exists a Nash equilibrium (D*, G*) such that both D* and the density of samples generated by G* are Lipschitz. We can then prove that, at this Nash equilibrium, the distribution Q* of the samples generated by G* coincides with the real data distribution P:
Lemma 1 Under Assumption 2, for a Nash equilibrium (D*, G*) as above, we have Q* = P.
Thus, Q converges to P. The proof of this lemma is given in Supplementary 2.
Figure 2: Numerical solutions of the 2D Dirac GAN dynamics for (a) Vanilla GAN, (b) WGAN, (c) WGAN-GP, (d) GAN-QP and (e) Relation GAN.
4.3 The Convergence
In the literature, GANs are often treated as dynamic systems to study their training convergence [25], [26], [29], [11]. This idea can be traced back to the Dirac GAN [25], which describes a simple yet prototypical counterexample for understanding whether GAN training is locally or globally convergent. To further analyze the convergence of training the proposed Relation GAN, we also adopt the Dirac GAN framework. However, [25] only discusses the situation where the data distributions are 1D; we extend the analysis to the 2D case to gain a better understanding.
Definition 3 The Dirac GAN consists of a generator distribution p_θ = δ_θ and a linear discriminator D_ψ(x) = ψ · x, where θ, a 2D vector, denotes the parameter of the generator and ψ represents the parameter of the discriminator. The real data distribution p_D = δ_c is a Dirac distribution concentrated at a point c.
Suppose the real sample point is a vector c, and the fake sample is θ, which also serves as the parameter of the generator. The discriminator uses the simplest linear model, D(x) = ψ · x, so ψ also serves as the parameter of the discriminator. The Dirac GAN asks whether, in such a minimalist model, the fake sample eventually converges to the true sample, in other words, whether θ finally converges to c. Specifically, in Relation GAN, our Dirac discriminator can be simplified into the composition of a linear embedding module with parameter ψ_1 and a linear relation module with parameter ψ_2.
Based on the dynamic analysis of GANs in Supplementary 3, we obtain the numerical solutions of the GANs' dynamic equations from a given initial point, as Figure 2 shows. In [25], the authors find that most unregularized GANs are not locally convergent. In our 2D Dirac GAN setting, the numerical solutions of WGAN [1], WGAN-GP [9], GAN-QP [37] and the vanilla GAN [8] also oscillate near the real sample point or fail to converge to it, while our Relation GAN succeeds in converging. This indicates that our GAN has good local convergence.
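The oscillation of the baselines can be reproduced in a few lines. The sketch below simulates WGAN-style simultaneous gradient steps on a 2D Dirac GAN with the real point placed at the origin, so the generator parameter θ follows −ψ and the critic parameter ψ follows θ (a sketch of the baseline dynamics only; the Relation GAN dynamics are derived in Supplementary 3):

```python
import numpy as np

# 2D Dirac GAN with real point at the origin and linear critic D(x) = psi . x:
# explicit simultaneous gradient steps make (theta, psi) circle the
# equilibrium, and discretization even pushes them slowly outward.
theta = np.array([1.0, 0.0])    # generator parameter (fake sample position)
psi   = np.array([0.0, 1.0])    # critic parameter
lr, steps = 0.1, 200
radii = []
for _ in range(steps):
    d_theta, d_psi = -psi, theta           # simultaneous gradient field
    theta = theta + lr * d_theta
    psi   = psi + lr * d_psi
    radii.append(np.sqrt(theta @ theta + psi @ psi))

print(radii[0], radii[-1])   # distance to the equilibrium never decreases
```

Each explicit Euler step multiplies the distance to the equilibrium by sqrt(1 + lr²), which is the non-convergence [25] identifies for unregularized GANs.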
5 Experiments
We first evaluate the proposed Relation GAN on 2D synthetic datasets and the Stacked MNIST dataset to demonstrate the diversity of the generated data and the stability of the generator. We then perform image generation tasks with our method to show its superiority in synthesizing natural images. Finally, an ablation study is conducted to verify the effects of the feature merging mechanism in the relation net and the proposed triplet loss.
Figure 3: Results on 2D synthetic distributions for (a) Vanilla GAN, (b) LSGAN, (c) WGAN-GP, (d) Relativistic GAN and (e) Relation GAN.
5.1 The Diversity of Generated Data
2D Datasets We compare the effect of our relation discriminator on the 2D 8-Gaussian, 25-Gaussian and Swiss-roll distributions. The experimental settings follow [41]. The results generated by our method and four popular methods under the same setting are shown in Figure 3. Compared with the other methods, ours fits these 2D distributions better.
Stacked MNIST For the Stacked MNIST [17] experiments, we use the settings and code of [41]. Each of the three channels in each sample is classified by a pretrained MNIST classifier, and the resulting three digits determine which of the 1000 modes the sample belongs to. We measure the number of modes captured with the pretrained classifier, choosing the Adam [15] optimizer for all experiments. Our results are shown in Table 5.1: our Relation GAN achieves the best mode coverage, reaching all 1,000 modes.
Loss  Modes
LSGAN  985±10
WGAN-GP  643±7
Vanilla GAN  923±18
Relativistic GAN  828±58
Ours  1000±0
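The mode-counting protocol reduces to counting distinct digit triples; the sketch below uses random stand-in predictions in place of the pretrained MNIST classifier, which is out of scope here:

```python
import numpy as np

# Stacked MNIST mode counting: each 3-channel sample is classified
# channel-wise (digits 0-9), and the digit triple identifies one of the
# 10^3 = 1000 modes.
rng = np.random.default_rng(0)

def count_modes(digit_preds):
    # digit_preds: (N, 3) array of per-channel digit predictions
    return len({tuple(row) for row in digit_preds})   # distinct digit triples

preds = rng.integers(0, 10, size=(50_000, 3))   # stand-in classifier outputs
print(count_modes(preds))                       # at most 1000
```

A generator with collapsed modes would produce far fewer distinct triples than the 1000-mode ceiling, which is exactly what the table above measures.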
5.2 Unconditional Image Generation
Datasets We provide comparisons on four datasets, namely CIFAR10 [4], CelebA [21], LSUN-BEDROOM [42] and CelebA-HQ [22]. The LSUN-BEDROOM dataset [42] contains 3M images, which are randomly partitioned into a test set of around 30k images and a training set containing the rest. We use the version of CelebA-HQ with 30k images. Due to limited computation resources, we only compare our method with Relativistic GAN and WGAN-GP on CelebA-HQ.
Settings For CIFAR10, we use the ResNet [10] architecture proposed in [41] (with spectral normalization layers removed). For CelebA, LSUN and CelebA-HQ, we use a DCGAN architecture as in [28]. We apply the Adam optimizer in all experiments with the hyperparameters shown in Table 5.2, using one discriminator update per generator update and a batch size of 64. Other details of our experimental settings are provided in the Supplementary.
Dataset  lr (D)  lr (G)  β1  β2  Iterations
CIFAR10  0.0002  0.0001  0.9  0.999  600k
CelebA  0.0002  0.0001  0.9  0.999  400k
LSUN  0.0001  0.0001  0  0.9  400k
CelebA-HQ  0.0001  0.0001  0  0.9  250k
Evaluation To compare the sample quality of different models, we consider three different scores: IS [35], FID [11] and KID [3], which are based on the Inception network [40] pretrained on ImageNet [34].
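Of these, FID has a closed form once Gaussians are fitted to the network activations; a pure-NumPy sketch follows (standard implementations fit the Gaussians to Inception-pool activations of many real and generated images, which is omitted here):

```python
import numpy as np

def frechet_distance(mu1, cov1, mu2, cov2):
    """FID between two Gaussians fitted to network activations:
    ||mu1 - mu2||^2 + Tr(cov1 + cov2 - 2 (cov1 cov2)^{1/2})."""
    # Tr((cov1 cov2)^{1/2}) computed via the similar symmetric PSD matrix
    # s2 cov1 s2 with s2 = cov2^{1/2}, so an eigh-based sqrt applies.
    w2, v2 = np.linalg.eigh(cov2)
    s2 = (v2 * np.sqrt(np.clip(w2, 0.0, None))) @ v2.T
    wm = np.linalg.eigvalsh(s2 @ cov1 @ s2)
    tr_covmean = np.sum(np.sqrt(np.clip(wm, 0.0, None)))
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2) - 2.0 * tr_covmean)

mu, cov = np.zeros(3), np.eye(3)
print(frechet_distance(mu, cov, mu, cov))               # identical -> 0.0
print(frechet_distance(mu, cov, mu + [1, 0, 0], cov))   # unit mean shift -> 1.0
```

KID replaces the Gaussian assumption with a polynomial-kernel MMD estimate, and IS scores the entropy of the classifier's label predictions; both follow the same fit-statistics-on-activations pattern.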
Results and Analysis Some randomly generated samples on the three datasets are shown in the corresponding figure; more generated images and evaluation scores are provided in Supplementary 6. From Table 3 we find that RelationGAN is also highly competitive on the single-class datasets, i.e., CelebA and LSUN, while achieving the best performance on CIFAR10. As discussed in Sec. 3.3, the variant loss in Eq. (7) is more relaxed and suitable for evenly distributed datasets, while the loss in Eq. (6) is stricter and performs better on multi-class or harder datasets (it also performs best on Stacked MNIST).
  CIFAR10  CelebA
  FID  KID  IS  FID  KID  IS
Vanilla GAN  26.46±0.12  1.88±0.061  6.73±0.081  34.43±0.15  3.01±0.044  2.68±0.020
LSGAN  14.9±0.061  1.31±0.056  7.74±0.12  19.63±0.11  1.84±0.045  2.5±0.021
WGAN-GP  63.56±0.14  8.01±0.068  3.56±0.038  66.06±0.27  9.06±0.081  2.60±0.029
Relativistic GAN  23.96±0.15  1.88±0.061  0.061±0.081  26.71±0.10  2.08±0.050  3.02±0.024
RelationGAN  13.52±0.060  1.26±0.052  7.74±0.18  25.37±0.14  2.07±0.044  2.65±0.029
RelationGAN (variant)  47.96±0.30  8.88±0.072  3.32±0.026  11.99±0.064  1.10±0.038  3.17±0.036
  LSUN  CelebA-HQ
  FID  KID  IS  FID  KID  IS
Vanilla GAN  38.17±0.28  6.61±0.076  4.57±0.010  –  –  –
LSGAN  150.61±0.33  21.75±0.11  3.57±0.043  –  –  –
WGAN-GP  14.93±0.16  1.45±0.042  3.77±0.098  68.5±0.19  7.71±0.065  2.31±0.017
Relativistic GAN  40.84±0.23  2.97±0.045  4.08±0.049  32.24±0.21  2.27±0.056  1.96±0.038
RelationGAN  70.24±0.37  5.89±0.078  4.4±0.056  27.87±0.17  2.21±0.047  2.13±0.0052
RelationGAN (variant)  12.59±0.11  1.37±0.038  3.70±0.081  26.17±0.12  2.62±0.043  2.15±0.030
5.3 Conditional Image Generation
We compare with MSGAN [23], one of the best conditional GAN models, on the conditional CIFAR10 dataset. The experiment is conducted by simply replacing the MS loss in [23] with our relation loss. Table 4 reports the FID results.
MSGAN  RelationGAN  
FID  28.73  24.88 
5.4 Image Translation
In addition to image generation, GANs have also made promising progress in image translation, showing great success in a range of tasks including style transfer, image enhancement, image super-resolution and image segmentation. We conduct comparison experiments on image style transfer and image super-resolution, respectively.
Image Style Transfer For the image style transfer task, we adopt CycleGAN as our baseline model to translate Monet's paintings into photographs. The FID score is applied to evaluate the quality of the generated images. Table 5 shows the comparison of FID scores, where a lower FID represents a smaller perceptual difference between the target-domain images and the generated images. We find that both versions of the relation loss perform better than the original adversarial loss in CycleGAN.
Image Super-Resolution For the image super-resolution task, we employ SRGAN [18] with the relativistic loss, the most recently proposed loss for GANs, as our baseline, which we denote as SRGAN. The training and validation sets are sampled from VOC2012 and contain 16,700 and 425 images, respectively. We compare PSNR and SSIM on three popular SR datasets: Set5 [43], Set14 [19] and Urban100 [12].
  FID (M→P)  FID (P→M)
CycleGAN  34.00  2.48
RelationGAN  33.60  2.26
RelationGAN (variant)  33.71  2.21
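PSNR, one of the two reported SR metrics, is a one-line formula; a minimal sketch follows (SSIM needs windowed luminance/contrast/structure statistics and is omitted):

```python
import numpy as np

def psnr(img1, img2, max_val=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((img1.astype(np.float64) - img2.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

a = np.zeros((8, 8))
b = np.ones((8, 8))      # MSE = 1 against a
print(psnr(a, b))        # 20 * log10(255) ~= 48.13 dB
```

Because PSNR depends only on the pixel-wise MSE, a gain of ~0.1-0.2 dB, as in Table 6, corresponds to a small but consistent reduction in reconstruction error.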
Table 6 lists the PSNR and SSIM of the different approaches on the three SR datasets. We can also observe from Table 5 that the FID scores of the proposed losses are better than those of the original method on the painting-photo datasets.
Set5  Set14  Urban100  

PSNR  SSIM  PSNR  SSIM  PSNR  SSIM
SRGAN  28.40  0.82  25.37  0.73  23.36  0.71 
RelationGAN  28.59  0.83  25.52  0.73  23.47  0.72 
5.5 Ablation Study
We conduct the ablation study on the image generation datasets. We first compare our triplet loss with the Siamese loss [7]; the results are shown in Table 7, and the formulation of the Siamese loss is given in Supplementary 4. Second, we take a closer look at the impact of our embedding module and relation module. The "(m+n)" entries in Table 8 represent different discriminator architectures, where the embedding module (EM) contains m resblocks and the relation module (RM) contains n resblocks. For example, "(0+3)" means the samples are concatenated together after the first conv layer and then fed into the relation module, which contains 3 resblocks. "no EM" means the paired input samples are packed together at the beginning of the discriminator, as in [20]. All experiments are conducted on CIFAR10.
Results and Analysis From Table 7, we find that the results of the proposed triplet loss are much better than those of the Siamese loss; the "–" entry denotes model collapse during training. The results in Table 8 show that a bigger embedding module enhances performance, which demonstrates the effectiveness of our embedding strategy.
CIFAR10  CelebA  

Triplet  13.42  11.9 
Siamese  –  107.3 
FID(CIFAR10)  

no EM  38.9 
37.37  
28.89  
28.80  
13.52 
6 Conclusion
In this paper we propose Relation GAN. A relation network architecture is designed and used as the discriminator, which is trained to determine whether paired input samples are from the same distribution or not. The generator is jointly trained with the discriminator to confuse its decision using a triplet loss.
Mathematically, we prove that the optimal discriminator based on the relation network induces a divergence, indicating that the distance between the generated data distribution and the real data distribution becomes progressively smaller during training. We also prove that the generated data distribution converges to the real data distribution at the Nash equilibrium. In addition, we analyze our method and several other GANs as dynamical systems, demonstrating the excellent convergence of our GAN via the Dirac GAN analysis.
The results of experiments on simple 2D distributions and Stacked MNIST verify the effectiveness of Relation GAN, especially in addressing the mode collapse problem. Our Relation GAN not only achieves state-of-the-art performance on unconditional and conditional image generation tasks with basic architectures and training settings, but also achieves promising results on image translation tasks compared with other GAN losses.
References
 (2017) Wasserstein generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, pp. 214–223. Cited by: §2.1, §2, §4.3.
 (2018) Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada. Cited by: §1.
 (2018) Demystifying MMD GANs. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. Cited by: §5.2.
 (2010) Learning mid-level features for recognition. In The Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2010, San Francisco, CA, USA, 13-18 June 2010, pp. 2559–2566. Cited by: §5.2.
 (2018) Large scale GAN training for high fidelity natural image synthesis. CoRR abs/1809.11096. Cited by: §2.
 (2018) Deep video generation, prediction and completion of human action sequences. In Computer Vision - ECCV 2018 - 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part II, pp. 374–390. Cited by: §1.
 (2016) Learning local image descriptors with deep siamese and triplet convolutional networks by minimizing global loss functions. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 5385–5394. Cited by: §3.2, §5.5.
 (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, pp. 2672–2680. Cited by: §1, §1, §2, §3.2, §4.3.
 (2017) Improved training of Wasserstein GANs. In Conference and Workshop on Neural Information Processing Systems, pp. 5769–5779. Cited by: §1, §1, §2.1, §2, §4.3.
 (2016) Deep residual learning for image recognition. In CVPR 2016, pp. 770–778. Cited by: §5.2.
 (2017) GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pp. 6629–6640. Cited by: §4.3, §5.2.
 (2015) Single image super-resolution from transformed self-exemplars. In CVPR 2015, pp. 5197–5206. Cited by: §5.4.
 (2018) The relativistic discriminator: a key element missing from standard GAN. CoRR abs/1807.00734. Cited by: §1, §1, §1, §2.2, §2.
 (2018) Progressive growing of GANs for improved quality, stability, and variation. In 6th International Conference on Learning Representations, ICLR 2018. Cited by: §2.
 (2014) Adam: A method for stochastic optimization. CoRR abs/1412.6980. Cited by: §5.1.
 (2018) On convergence and stability of GANs. CoRR. Cited by: §2.1, §2.
 (1989) Handwritten digit recognition with a back-propagation network. In Advances in Neural Information Processing Systems 2, [NIPS Conference, Denver, Colorado, USA, November 27-30, 1989], pp. 396–404. Cited by: §1, §5.1.
 (2017) Photo-realistic single image super-resolution using a generative adversarial network. In CVPR 2017, pp. 105–114. Cited by: §5.4.
 (2001) New edge-directed interpolation. IEEE Trans. Image Processing 10 (10), pp. 1521–1527. Cited by: §5.4.
 (2018) PacGAN: the power of two samples in generative adversarial networks. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 38 December 2018, Montréal, Canada., pp. 1505–1514. Cited by: §5.5.
 (2015) Deep learning face attributes in the wild. In 2015 IEEE International Conference on Computer Vision, ICCV 2015, Santiago, Chile, December 7-13, 2015, pp. 3730–3738. Cited by: §1, §5.2.
 (2018) Are GANs created equal? A large-scale study. In Advances in Neural Information Processing Systems 31, pp. 700–709. Cited by: §1, §5.2.
 (2019) Mode seeking generative adversarial networks for diverse image synthesis. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pp. 1429–1437. Cited by: §5.3.
 (2017) Least squares generative adversarial networks. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2813–2821. Cited by: §1, §2.
 (2018) Which training methods for GANs do actually converge?. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, pp. 3478–3487. Cited by: §4.3, §4.3.
 (2017) The numerics of GANs. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pp. 1823–1833. Cited by: §4.3.
 (2017) Unrolled generative adversarial networks. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. Cited by: §1.
 (2018) Spectral normalization for generative adversarial networks. CoRR abs/1802.05957. Cited by: §1, §2, §5.2.
 (2017) Gradient descent GAN optimization is locally stable. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pp. 5591–5600. Cited by: §4.3.
 (2017) SEGAN: speech enhancement generative adversarial network. CoRR abs/1703.09452. Cited by: §1.
 (2017) Photorealistic single image superresolution using a generative adversarial network. Cited by: §1.
 (2017) Loss-sensitive generative adversarial networks on Lipschitz densities. CoRR abs/1701.06264. Cited by: §2.1, §2, §4.2, §4.2.
 (2016) Unsupervised representation learning with deep convolutional generative adversarial networks. In 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings. Cited by: §2.
 (2015) ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV) 115 (3), pp. 211–252. Cited by: §5.2.
 (2016) Improved techniques for training GANs. In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, pp. 2226–2234. Cited by: §5.2.
 (1958) On general minimax theorems.. Pacific J. Math. 8 (1), pp. 171–176. Cited by: §4.2.
 (2018) GAN-QP: a novel GAN framework without gradient vanishing and Lipschitz constraint. Cited by: §2, §4.3.
 (2017) Adversarial generation of natural language. In Proceedings of the 2nd Workshop on Representation Learning for NLP, Rep4NLP@ACL 2017, Vancouver, Canada, August 3, 2017, pp. 241–251. Cited by: §1.
 (2018) Learning to compare: relation network for few-shot learning. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 1199–1208. Cited by: §3.1.
 (2016) Rethinking the inception architecture for computer vision. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. Cited by: §5.2.
 (2019) Improving generalization and stability of generative adversarial networks. In International Conference on Learning Representations, External Links: Link Cited by: §5.1, §5.1, §5.2.
 (2015) LSUN: construction of a large-scale image dataset using deep learning with humans in the loop. CoRR abs/1506.03365. Cited by: §1, §5.2.
 (2010) On single image scale-up using sparse-representations. pp. 711–730. Cited by: §5.4.
 (2018) Self-attention generative adversarial networks. CoRR abs/1805.08318. Cited by: §1.
 (2018) Self-attention generative adversarial networks. CoRR abs/1805.08318. Cited by: §2.