Spectral Regularization for Combating Mode Collapse in GANs
Abstract
Despite excellent progress in recent years, mode collapse remains a major unsolved problem in generative adversarial networks (GANs). In this paper, we present spectral regularization for GANs (SRGANs), a new and robust method for combating the mode collapse problem in GANs. Theoretical analysis shows that the optimal solution of the discriminator has a strong relationship to the spectral distribution of its weight matrices. We therefore monitor the spectral distributions in the discriminator of spectral normalized GANs (SNGANs), and discover a phenomenon which we refer to as spectral collapse, where a large number of singular values of the weight matrices drop dramatically when mode collapse occurs. We show that there is strong evidence linking mode collapse to spectral collapse; based on this link, we set out to tackle spectral collapse as a surrogate for mode collapse. We develop a spectral regularization method that compensates the spectral distributions of the weight matrices to prevent them from collapsing, which in turn successfully prevents mode collapse in GANs. We provide theoretical explanations for why SRGANs are more stable and perform better than SNGANs. We also present extensive experimental results and analysis showing that SRGANs not only consistently outperform SNGANs but also always succeed in combating mode collapse where SNGANs fail. The code is available at https://github.com/maxliu112/SRGANsSpectralRegularizationGANs
1 Introduction
Generative Adversarial Networks (GANs) [5] are one of the most significant developments in machine learning research of the past decade. Since their first introduction, GANs have attracted intensive interest in the machine learning community, not only for their ability to learn highly structured probability distributions but also for their theoretical implications [5, 13, 2, 17]. Essentially, GANs are constructed around two functions [3, 9]: the generator G, which maps a sample z to the data distribution, and the discriminator D, which is trained to distinguish real samples of a dataset from fake samples produced by the generator. With the goal of reducing the difference between the distributions of generated and real samples, a GAN training algorithm trains G and D in tandem.
GAN training is dynamic and sensitive to nearly every aspect of its setup, from optimization parameters to model architecture [1]. Training instability, or mode collapse, is one of the major obstacles in developing applications. Despite excellent progress in recent years [6, 12, 10, 15, 7], the mode collapse problem still persists. For example, one of the most impressive works to emerge recently is BigGANs [1], the largest published GAN system, built on the state-of-the-art spectral normalization technique (SNGAN) [10]. However, BigGANs can still suffer from training instability, especially when the batch size is scaled up. Although stabilization measures such as a zero-centred gradient penalty term [1] in the loss of the discriminator can improve stability, they can cause severe degradation in performance, resulting in a 45% reduction in Inception Score.
In this paper, we present spectral regularization, a robust method for combating the mode collapse problem in GANs. Theoretically, we analyze the optimal solution to a linear discriminator function constrained by 1-Lipschitz continuity, and find that the optimum is attained when all singular values of its weight matrix are 1. Even though, in practical GAN models, the discriminator D is nonlinear, we reason that the spectral distributions of its weight matrices may also have a strong relation to its performance. Through comprehensive analysis of spectral distributions in a large number of GAN models trained with the state-of-the-art SNGAN algorithm, we discover that when mode collapse occurs to a model, the spectral distributions of the spectrally normalized weight matrices W_SN(W) in D also collapse. Specifically, we observe that when a model performs well and no mode collapse occurs, a large number of singular values of W_SN(W) stay very close to 1, and that when mode collapse occurs to a model, the singular values of W_SN(W) drop dramatically. We refer to the phenomenon where a large number of singular values drop significantly as spectral collapse.
In all GAN models of various sizes trained with a variety of parameter settings on datasets extensively used in the literature, we observe that mode collapse and spectral collapse always go side by side. This fact leads us to reason that mode collapse in SNGANs is caused by spectral collapse in the weight matrices. Based on this insight into the spectral distributions of W_SN(W), we propose a new and robust method called spectral regularization to prevent GANs from mode collapse. In addition to normalizing the weight matrices, spectral regularization imposes constraints on them by compensating their spectral distributions to avoid spectral collapse. Theoretical analysis shows that spectral regularization is better than spectral normalization at preventing the weight matrices from concentrating into one particular direction. We show that SNGANs are a special case of spectral regularization, and in a series of extensive experiments we demonstrate that spectral regularization not only provides superior performance to spectral normalization but also always avoids mode collapse in cases where spectral normalization fails.
Our contributions can be summarized as follows:
(1) Through theoretical analysis and extensive experimental observations, we provide insight into the likely causes of mode collapse in a state-of-the-art GAN normalization technique, spectral normalization (SNGANs). We introduce the concept of spectral collapse and provide strong evidence linking spectral collapse with mode collapse in SNGANs.
(2) Based on this insight, we develop a new robust regularization method, spectral regularization, in which we compensate the spectral distributions of the weight matrices in D to prevent spectral collapse, thus preventing mode collapse in GANs. Extensive experimental results show that spectral regularization not only always prevents mode collapse but also consistently improves performance over SNGANs.
2 Analysis of Mode Collapse in SNGANs
2.1 A Brief Summary of SNGANs
For ease of discussion, we first briefly recap the essential ideas of the spectral normalization technique for training GANs [10]. As far as we are aware, this is currently one of the best methods in the literature and has been successfully used to construct large systems such as BigGANs [1]. For convenience, we largely follow the notation of [10]. Consider a simple discriminator of a neural network of the following form:
(1)    f(x, θ) = W^{L+1} a_L(W^L(a_{L−1}(W^{L−1}( ⋯ a_1(W^1 x) ⋯ ))))
where θ := {W^1, …, W^L, W^{L+1}} is the learning parameter set, W^l ∈ R^{d_l × d_{l−1}}, W^{L+1} ∈ R^{1 × d_L}, and a_l is an element-wise nonlinear activation function. We omit the bias term of each layer for simplicity. The final output of the discriminator is given by
(2)    D(x, θ) = A(f(x, θ))
where A is an activation function corresponding to the divergence of the distance measure of the user's choice.
The standard formulation of GANs is given by [10, 13]:
(3)    min_G max_D V(G, D)
where the min and max over G and D are taken over the sets of generator and discriminator functions, respectively. The conventional form of V(G, D) is E_{x∼q_data}[log D(x)] + E_{x′∼p_G}[log(1 − D(x′))] [10], where q_data is the data distribution and p_G is the model (generator) distribution.
To guarantee Lipschitz continuity, spectral normalization [10] controls the Lipschitz constant of the discriminator function by literally constraining the spectral norm of each layer:
(4)    W_SN(W) := W / σ(W)
where σ(W) is the spectral norm of the weight matrix W in the discriminator network, which is equal to the largest singular value of W.
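As a concrete illustration, the per-layer normalization of equation (4) can be sketched with NumPy. The power iteration below mirrors the estimate used in [10], though this is a simplified sketch rather than the authors' implementation (which caches the vector u across training steps):

```python
import numpy as np

np.random.seed(0)

def spectral_norm(W, n_iters=500):
    # Estimate sigma(W), the largest singular value of W, by power iteration.
    u = np.random.randn(W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v) + 1e-12
        u = W @ v
        u /= np.linalg.norm(u) + 1e-12
    return u @ W @ v  # Rayleigh-quotient estimate of sigma_1

# Equation (4): scale the weight so its spectral norm becomes 1.
W = np.random.randn(64, 128)
W_sn = W / spectral_norm(W)
```

After normalization the largest singular value of W_sn is 1 (up to power-iteration error), while the shape of the rest of the spectrum is unchanged.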
The authors of SNGANs [10] and of BigGANs [1] have demonstrated the superiority of spectral normalization over other normalization or regularization techniques, e.g., gradient penalty [6], weight normalization [15] and orthonormal regularization [4]. However, even BigGANs (based on spectral normalization) can still suffer from mode collapse. Mode collapse therefore remains an unsolved open problem, and seeking better, more robust solutions is very important for advancing GANs.
2.2 Theoretical Analysis
In order to unearth the likely causes of mode collapse, we start by analyzing the optimal solution to a 1-Lipschitz-constrained discriminator.
To be specific, Proposition 1 in [6] proves that the optimal 1-Lipschitz discriminator function has gradient norm 1 almost everywhere. Assuming the discriminator is a linear function, we find that the optimum is attained only when all the singular values of its weight matrix are 1. This is stated in Corollary 1 (see proof in the Appendix).
Corollary 1. Let P_r and P_g be two distributions in X, a compact metric space. Let f(x) = Wx be a linear, 1-Lipschitz-constrained function that is the optimal solution of max_{‖f‖_L ≤ 1} E_{x∼P_r}[f(x)] − E_{x∼P_g}[f(x)]. Then all the singular values of the weight matrix W are 1.
We can see that, for a linear f, the spectral distribution of W is strongly related to the performance of the discriminator. In GANs, the discriminator D is nonlinear. However, we reason that the spectral distributions of its weight matrices may also have a strong relation to the performance of the discriminator. As a result, we can monitor the spectral distributions to investigate the mode collapse problem.
2.3 Mode Collapse vs Spectral Collapse
In order to find the link between mode collapse and spectral distributions, we have conducted a series of experiments for unconditional image generation on the CIFAR-10 [16] and STL-10 [8] datasets. Our implementation is based on the SNGAN architecture of [10], which uses the hinge loss as the discriminator objective:
(5)    L_D = E_{x∼q_data}[max(0, 1 − D(x))] + E_{z∼p(z)}[max(0, 1 + D(G(z)))]
The optimization settings follow the literature [10, 11]. Previous authors have shown that increasing batch size or decreasing discriminator capacity can potentially lead to mode collapse [1]. We therefore conduct experiments for the combinations of batch and channel sizes listed in Table 1. We follow the practice in the literature of using Inception Score (IS) [14] and Fréchet Inception Distance (FID) [8] as approximate measures of sample quality; results are shown in Table 2, where we also identify all settings in which mode collapse has occurred to SNGANs. Through monitoring Inception Score, Fréchet Inception Distance and synthetic images during training, mode collapse is observed in 10 of the 26 settings (marked in Table 2); in the other 16 settings, mode collapse does not happen.
Mode collapse is a persistent problem in GAN training and is also a major issue in SNGANs, as has been shown in BigGANs [1] and in Table 2. Here, we monitor the entire spectral distributions of SNGANs, i.e., all singular values of W_SN(W) in the discriminator network during training.
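Concretely, the monitored quantity can be computed with a full SVD of each (reshaped) layer weight. The following is a sketch of the bookkeeping, not the training code:

```python
import numpy as np

def spectral_distribution(W):
    """All singular values of a layer weight, sorted in descending order.
    Convolutional kernels of shape (c_out, c_in, k_h, k_w) are reshaped
    to a 2-D matrix before the SVD, following the SNGAN convention."""
    W = np.asarray(W)
    W2d = W.reshape(W.shape[0], -1)
    return np.linalg.svd(W2d, compute_uv=False)

np.random.seed(0)
W = np.random.randn(32, 32, 3, 3)        # a convolutional kernel
sigma1 = spectral_distribution(W)[0]
W_sn = W / sigma1                        # spectrally normalized weight
svs = spectral_distribution(W_sn)        # the distribution monitored during training
```

For a spectrally normalized weight, svs[0] is exactly 1; spectral collapse shows up as most of the remaining entries of svs dropping towards 0.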
The discriminator network in our implementation uses the same architecture as that in the original SNGANs [10] and has 10 convolutional layers; please see the Appendix for the setting details. In order to discover the likely causes of mode collapse, we plot the spectral distributions of every layer (except skip-connection layers) of the discriminator for all 26 settings. In the following, we present some typical examples; readers are referred to the Appendix for all other plots.
Table 1: Experimental settings (batch size, discriminator channel size CH, and dataset), organized into five groups by channel size and dataset.

  CH = 128, CIFAR-10:  Batch = 16, 32, 64, 128, 256, 512, 1024
  CH = 64,  CIFAR-10:  Batch = 8, 16, 32, 64, 128, 256
  CH = 32,  CIFAR-10:  Batch = 8, 16, 32, 64, 128
  CH = 256, CIFAR-10:  Batch = 128, 256, 512
  STL-10:              Batch/CH = 16/128, 64/128, 256/128, 256/64, 256/32
Figure 1 shows the spectral distributions of W_SN(W) for 5 settings where mode collapse does not happen. Figure 3 shows the spectral distributions of W_SN(W) for all 10 settings where mode collapse has occurred. Through analyzing the spectral distribution plots in Figure 1 and Figure 3, we notice a very interesting pattern. In the cases where no mode collapse happens, the shapes of the spectral distribution curves do not change significantly with the number of training iterations. On the other hand, for the settings where mode collapse has occurred, the shapes of the spectral distribution curves change significantly as training progresses. In particular, a large number of singular values become very small once training passes a certain number of iterations. It is as if the curves have "collapsed", and we refer to this phenomenon as spectral collapse.
The phenomenon of spectral collapse is also observed across different settings. Figure 2 plots the spectral distributions for the 5 groups of experimental settings in Table 1. In the CH = 128 and CH = 256 CIFAR-10 groups, the spectral distributions across different settings are very similar and no spectral collapse is observed; very interestingly, no mode collapse is observed either. In the CH = 64 group, the spectral distributions of the three largest-batch settings have collapsed; not surprisingly, mode collapse also happens to these 3 settings. In the CH = 32 group, the spectral distributions of all settings have collapsed, i.e., most singular values are very small (except for the first one, which is forced to be 1 by spectral normalization). Again as expected, mode collapse happens to all settings in this group. In the STL-10 group, the two settings with the smallest channel sizes have suffered from spectral collapse, and again, mode collapse is observed for these two settings.
In order to understand what happens when spectral collapse occurs, Figure 4 shows how a typical spectral distribution relates to Inception Score and Fréchet Inception Distance during training. Up to 19k iterations, both IS and FID show good performance, and the corresponding spectral distribution has a large number of large singular values. At 20k iterations, the IS and FID performances start to drop; correspondingly, the spectral distribution starts to fall. At 21k iterations, the IS and FID performances have dropped significantly and mode collapse has started; very importantly, the spectral distribution has dropped dramatically, i.e., it has started to collapse.
The association of mode collapse with spectral collapse is observed for all the layers and on all settings (readers are referred to the Appendix for more examples). We therefore believe that mode collapse and spectral collapse happen at the same time, and spectral collapse is the likely cause of mode collapse. In the following section, we will introduce spectral regularization to prevent spectral collapse thus avoiding mode collapse.
3 Spectral Regularization
We have now established that spectral collapse is closely linked to mode collapse in SNGANs. In this section, we introduce spectral regularization, a technique for preventing spectral collapse. We show that preventing spectral collapse can indeed solve the mode collapse problem, thus demonstrating that spectral collapse is the cause of mode collapse rather than a mere symptom.
Performing singular value decomposition, the weight matrix W can be expressed as:
(6)    W = U · Σ · V^T
where both U and V are orthogonal matrices; the columns of U, {u_1, u_2, …}, are called the left singular vectors of W, the columns of V, {v_1, v_2, …}, are called the right singular vectors of W, and Σ can be expressed as:
(7)    Σ = diag(σ_1, σ_2, …, σ_r)
where {σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_r} represents the spectral distribution of W and r is the number of singular values.
When mode collapse occurs, the spectral distribution concentrates on the first singular value, and the remaining singular values drop dramatically (spectral collapse). To avoid spectral collapse, we first apply ΔW to compensate W, where ΔW is given by ΔW = U · ΔΣ · V^T, and i is a hyperparameter (1 ≤ i ≤ r). Spectral regularization turns W into W′ as follows: W′ = W + ΔW. Correspondingly, Σ turns into Σ′ = Σ + ΔΣ, where ΔΣ is given by:
(8)    ΔΣ = diag(σ_1 − σ_1, σ_1 − σ_2, …, σ_1 − σ_i, 0, …, 0)
Finally, we apply spectral normalization to guarantee Lipschitz continuity, and obtain our spectral regularized weight W_SR(W):
(9)    W_SR(W) := W′ / σ(W) = (W + ΔW) / σ(W)
Clearly, spectral normalization is a special case of spectral regularization: when i = 1, ΔW = 0 and W_SR(W) reduces to W_SN(W).
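Equations (6)–(9) can be sketched directly with an SVD. This is a naive reference implementation for illustration; in practice the singular vectors would be estimated more cheaply than by a full SVD at every step:

```python
import numpy as np

def spectral_regularize(W, i):
    """Sketch of equations (6)-(9): compensate the first i singular
    values up to sigma_1, then normalize by sigma_1."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    delta = np.zeros_like(s)
    delta[:i] = s[0] - s[:i]          # Delta-Sigma of equation (8)
    W_prime = W + (U * delta) @ Vt    # W' = W + U . DeltaSigma . V^T
    return W_prime / s[0]             # equation (9): W_SR = W' / sigma(W)

np.random.seed(0)
W = np.random.randn(16, 16)
W_sr = spectral_regularize(W, i=8)
s_sr = np.linalg.svd(W_sr, compute_uv=False)
# The first i singular values of W_SR equal 1; with i = 1 the
# compensation vanishes and this reduces to spectral normalization.
```

This makes the special-case relationship explicit: spectral_regularize(W, 1) returns exactly W / σ(W), i.e., the spectrally normalized weight of equation (4).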

Table 2: Inception Score (IS, higher is better) and Fréchet Inception Distance (FID, lower is better) for SNGANs (SN) and SRGANs (SR). An entry of SN in the MC/SC columns marks a setting where mode collapse / spectral collapse occurred (to SNGANs only; neither occurred to SRGANs).

Batch/CH (dataset)    IS (SN)    IS (SR)    FID (SN)    FID (SR)    MC   SC
16/128   (CIFAR-10)   8.15±.09   8.35±.09   22.31±.28   24.67±.28
32/128   (CIFAR-10)   8.38±.07   8.45±.10   25.96±.42   22.00±.17
64/128   (CIFAR-10)   8.39±.15   8.65±.12   21.15±.15   20.31±.18
128/128  (CIFAR-10)   8.61±.12   8.72±.08   21.01±.23   19.98±.19
256/128  (CIFAR-10)   8.45±.14   8.48±.03   20.87±.25   19.87±.21
512/128  (CIFAR-10)   8.34±.09   8.53±.04   21.85±.14   20.13±.12
1024/128 (CIFAR-10)   8.31±.21   8.52±.16   21.68±.35   20.34±.13
8/64     (CIFAR-10)   6.67±.05   7.42±.06   45.19±.89   35.78±.11
16/64    (CIFAR-10)   7.34±.06   7.59±.08   31.73±.49   29.42±.22
32/64    (CIFAR-10)   7.18±.03   7.48±.09   33.76±.35   28.60±.25
64/64    (CIFAR-10)   6.96±.11   7.52±.11   36.65±.29   28.40±.36   SN   SN
128/64   (CIFAR-10)   7.10±.14   7.13±.05   35.99±.48   31.41±.56   SN   SN
256/64   (CIFAR-10)   6.85±.08   7.58±.03   35.88±.42   27.68±.23   SN   SN
8/32     (CIFAR-10)   4.21±.18   4.93±.20   80.00±1.12  66.05±2.12  SN   SN
16/32    (CIFAR-10)   4.05±.15   4.78±.23   79.69±.21   59.25±.43   SN   SN
32/32    (CIFAR-10)   4.29±.08   4.70±.15   78.39±.17   62.10±.24   SN   SN
64/32    (CIFAR-10)   4.30±.14   5.00±.14   85.15±1.20  56.11±.54   SN   SN
128/32   (CIFAR-10)   4.87±.14   5.30±.07   71.10±.89   54.39±.41   SN   SN
128/256  (CIFAR-10)   8.14±.06   8.92±.18   24.43±.41   18.95±.23
256/256  (CIFAR-10)   8.29±.12   8.83±.14   22.54±.29   19.56±.11
512/256  (CIFAR-10)   8.33±.09   8.36±.12   22.58±.16   21.82±.29
16/128   (STL-10)     8.63±.15   8.69±.16   44.24±.56   43.19±.33
64/128   (STL-10)     8.98±.20   9.14±.18   42.40±.56   39.89±.89
256/128  (STL-10)     9.10±.13   9.11±.17   40.11±.89   40.08±.29
256/64   (STL-10)     7.38±.14   7.67±.06   74.50±1.52  69.20±.83   SN   SN
256/32   (STL-10)     4.04±.11   4.38±.07   98.50±1.34  89.17±1.23  SN   SN
3.1 Gradient Analysis of Spectral Regularization
We perform gradient analysis to show that spectral regularization provides a more effective way than spectral normalization of preventing W from concentrating into one particular direction during training, and thus of avoiding spectral collapse.
From equation (9), we can write the gradient of W_SR(W) with respect to W_ab as:
(10)    ∂W_SR(W)/∂W_ab = (1/σ(W)) E_ab − ([u_1 v_1^T]_ab / σ(W)) W_SN(W) − ([u_1 v_1^T]_ab / σ(W)) (ΔW / σ(W)) + (1/σ(W)) Σ_{k=1}^{i} ([u_1 v_1^T]_ab − [u_k v_k^T]_ab) u_k v_k^T
where [·]_ab represents the (a, b)-th entry of the corresponding matrix, and E_ab is the matrix whose (a, b)-th entry is 1 and zero everywhere else. (The last term uses the identity ∂σ_k/∂W_ab = [u_k v_k^T]_ab and treats the singular vectors as locally constant.)
We would like to comment on the implications of equation (10). The first two terms, (1/σ(W)) E_ab − ([u_1 v_1^T]_ab / σ(W)) W_SN(W), are the gradient of spectral normalization [10]; this is easy to see from equation (9). As explained in [10], the second term can be regarded as preventing the column space of W from concentrating into one particular direction in the course of training. In other words, spectral normalization prevents the transformation of each layer from becoming sensitive in only one direction. However, as we have seen (e.g. Figure 3), despite spectral normalization, the spectral distributions of W_SN(W) can still concentrate on the first singular value, causing spectral collapse. This shows the limited ability of spectral normalization to prevent spectral collapse.
In addition to the first two terms of spectral normalization, spectral regularization introduces the third and fourth terms in equation (10). The third term enhances the effect of the second term, making W much less likely to concentrate into one particular direction. Furthermore, the fourth term can be seen as a regularization term, encouraging W to move along all the directions pointed to by u_k v_k^T, for 1 ≤ k ≤ i, each weighted by an adaptive coefficient. This encourages W to make full use of the directions pointed to by its singular vectors, preventing W from being concentrated in only one direction, which in turn stabilizes the training process.
From the above analysis, it is clear that, compared to spectral normalization, spectral regularization in equation (10) encourages the weight matrices of the discriminator to move in a variety of directions, preventing them from concentrating in only one direction, which in turn prevents spectral collapse. We will show in the experimental section that performing spectral regularization can indeed prevent mode collapse where spectral normalization has failed.
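The building block shared by the terms of equation (10) is the identity ∂σ_k/∂W_ab = [u_k v_k^T]_ab. The snippet below is a quick finite-difference sanity check of this identity for the leading singular value (our check, not part of the paper):

```python
import numpy as np

np.random.seed(0)
W = np.random.randn(5, 7)
U, s, Vt = np.linalg.svd(W, full_matrices=False)
u1, v1 = U[:, 0], Vt[0]

# Perturb one entry W_ab and compare the change in sigma_1 with the
# analytic gradient [u1 v1^T]_ab.
a, b, eps = 2, 3, 1e-6
W_pert = W.copy()
W_pert[a, b] += eps
fd = (np.linalg.svd(W_pert, compute_uv=False)[0] - s[0]) / eps
analytic = u1[a] * v1[b]
```

The finite-difference estimate fd and the analytic value agree to within the discretization error, which is what justifies reading [u_1 v_1^T]_ab as the gradient of σ(W) in both the spectral normalization and spectral regularization gradients.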
4 Experiments
For all settings listed in Table 1, we have conducted experiments using SNGANs and the newly introduced spectral regularization algorithm (we use the abbreviation SRGANs for the spectral regularized GANs). All procedures and settings for SNGANs and SRGANs are identical, except that the discriminator update in SRGANs implements spectral regularization (equation 9) while that in SNGANs implements spectral normalization (equation 4). The default value of the hyperparameter in SRGANs is empirically set as i = γ·r with γ = 0.5, where r is the number of singular values in the corresponding weight matrix. Readers are referred to the Appendix for the details of the network architecture settings.
The Inception Score (IS) and Fréchet Inception Distance (FID) performances are shown in Table 2. Please note that in the cases where mode collapse has happened, IS and FID are the best results before mode collapse. It is clearly seen that in all cases, SRGANs outperform SNGANs; in the best case, SRGAN improves IS by 9.5% and FID by 22.4%. On average, SRGANs improve IS by 8.9% and FID by 18.9% over SNGANs. Very importantly, in all 10 settings where mode collapse has occurred to SNGANs, none has happened to SRGANs. In fact, we have not yet observed mode collapse for SRGANs in an extensive set of experiments. We have therefore demonstrated that the new SRGANs are superior to SNGANs in both quality and stability.
Figure 5 shows an example of how the spectral distributions of the discriminator weights are affected by SNGANs and SRGANs. It is seen that SNGANs normalize the largest singular value; however, in some cases, they cannot stop the other singular values from dropping significantly, causing spectral collapse, which in turn results in mode collapse. In contrast, SRGANs ensure that the first i singular values are 1 in all cases, so that spectral collapse cannot happen, hence preventing mode collapse. Similar effects are observed in all layers and for all settings. This illustrates that SRGANs can indeed prevent spectral collapse, which in turn avoids mode collapse.
A combination of large batch and small channel sizes can easily cause SNGANs to suffer from mode collapse. Figure 7 (a) and Figure 7 (b) show the IS and FID curves of one such setting from our experiments during training. It is seen that after about 20k iterations, the performance of SNGAN starts to drop and mode collapse eventually occurs. In contrast, the performance of SRGAN improves steadily as training progresses, and importantly, no mode collapse occurs. Figure 7 (c) and Figure 7 (d) show example images generated by SNGAN and SRGAN for this setting. It is clearly seen from Figure 7 (d) that mode collapse has indeed occurred to SNGAN.
When the channel size is small, mode collapse happens to SNGAN regardless of batch size, as shown in our CH = 32 group of experiments. Figure 7 shows the training history of SNGAN and SRGAN for one setting in this group. It is seen that for SNGAN, performance deteriorates almost from the start of the training process until mode collapse eventually occurs. In contrast, the performance of SRGAN improves steadily and eventually converges, with no mode collapse. Examples of images generated by the two training methods for this setting are also shown in the figure. It is again clearly seen that mode collapse has indeed happened to SNGAN, while the images generated by SRGAN are of better quality and greater variety.
In Section 2, we showed that mode collapse is strongly linked to spectral collapse. By introducing spectral regularization to adjust the singular values of the weight matrices and prevent them from dropping to small values, we have introduced a new method for combating mode collapse. The results presented in this section show that regularizing the spectral distributions of the weight matrices, so that a large number of their singular values do not drop to small values, can indeed prevent spectral collapse, which in turn successfully prevents mode collapse.
4.1 The Hyperparameter γ in SRGANs
SRGAN has a single hyperparameter, the fraction γ = i/r of singular values that are compensated, and its value affects performance. In the experiments above, γ in SRGANs is set to 0.5, i.e., half of the r singular values are compensated. Clearly, when i = 1, SRGAN is the same as SNGAN; therefore SNGAN is a special case of SRGAN. To investigate the effect of γ, we gradually increase it and observe its influence on model performance. In Figure 8, we show the Inception Scores and Fréchet Inception Distances for different values of γ. For some experiment groups, increasing γ from 0.25 to 0.5 improves performance, but continuing to increase γ from 0.5 to 1 makes performance deteriorate. For the experiments in one group, performance increases steadily with γ.
To understand why γ affects performance in this way, we feed the discriminator with generated data and with real data from both the training and testing sets, and record the statistics of the discriminator output D(x) in equation (2) and of the discriminator objective in equation (5). For convenience of explanation, some typical results are illustrated here; more data can be found in the Appendix.
The probability distributions of D(x) for the generated data and for the training data, for one collapsing setting and different values of γ, are shown in Figure 11 (a) and Figure 11 (b), respectively; here x_train denotes samples from the training set and x_gen samples from the generated set. The probability distribution of the discriminator objective is shown in Figure 11 (c).
When γ increases from 0.25 to 1, the distributions of D(x_train) tend to move to the right, and at the same time the distributions of D(x_gen) tend to move to the left. This means that the discriminator discriminates better between real and generated samples, which is also verified by the distributions of the discriminator objective in Figure 11 (c).
To investigate the discriminator's performance on the testing set, we show the probability distributions of D(x_train) and D(x_test) for the same setting in Figure 11, where x_test denotes samples from the test set. It is seen that for γ = 0.25 and γ = 0.5, the two distributions are more similar to each other than for γ = 1. In the case of γ = 1, the discriminator behaves significantly differently on the training and testing data; this means that overfitting has occurred, resulting in a drop in performance. In summary, Figure 11 explains the performance drop at large γ in those experiment groups.
Furthermore, we monitor the statistics of D(x) for the settings in the remaining group to explain why γ affects the behavior of SRGANs as in Figure 8. The corresponding probability distributions are shown in Figure 11. We can see that for all values of γ, the probability distributions of the discriminator output on the training and testing data agree well with each other, indicating that no overfitting has occurred.
Although there is no systematic method for determining the best value of γ for different settings, our experience is that γ = 0.5 works well. In a series of extensive experiments with γ = 0.5, SRGANs always outperform SNGANs and, very importantly, we have not yet observed mode collapse.
5 Conclusions
In this paper, we monitor the spectral distributions of the discriminator's weight matrices in SNGANs. We discover that when mode collapse occurs to an SNGAN, a large number of its weight matrices' singular values drop to very small values, and we introduce the concept of spectral collapse to describe this phenomenon. We provide strong evidence linking mode collapse with spectral collapse. Based on this link, we have developed a spectral regularization technique for training GANs. We show that by compensating the spectral distributions of the weight matrices, we can prevent spectral collapse, which in turn successfully prevents mode collapse. In a series of extensive experiments, we have demonstrated that preventing spectral collapse not only avoids mode collapse but also improves GAN performance.
References
 [1] Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale GAN training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096, 2018.
 [2] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein GAN. arXiv preprint arXiv:1701.07875, 2017.
 [3] David Berthelot, Thomas Schumm, and Luke Metz. BEGAN: Boundary equilibrium generative adversarial networks. arXiv preprint arXiv:1703.10717, 2017.
 [4] Andrew Brock, Theodore Lim, and James M. Ritchie. Neural photo editing with introspective adversarial networks. arXiv preprint arXiv:1609.07093, 2016.
 [5] Ian Goodfellow, Jean Pouget-Abadie, and Mehdi Mirza. Generative adversarial nets. Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
 [6] Ishaan Gulrajani, Faruk Ahmed, and Martin Arjovsky. Improved training of Wasserstein GANs. Advances in Neural Information Processing Systems, pages 5769–5779, 2017.
 [7] Juha Heinonen. Lectures on Lipschitz analysis. University of Jyväskylä, 2005.
 [8] Martin Heusel, Hubert Ramsauer, and Thomas Unterthiner. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. arXiv preprint arXiv:1706.08500, 2017.
 [9] Xudong Mao, Qing Li, and Haoran Xie. Least squares generative adversarial networks. 2017 IEEE International Conference on Computer Vision (ICCV), pages 2813–2821, 2017.
 [10] Takeru Miyato, Toshiki Kataoka, and Masanori Koyama. Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957, 2018.
 [11] Takeru Miyato and Masanori Koyama. cGANs with projection discriminator. arXiv preprint arXiv:1802.05637, 2018.
 [12] Guo-Jun Qi. Loss-sensitive generative adversarial networks on Lipschitz densities. arXiv preprint arXiv:1701.06264, 2017.
 [13] Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
 [14] Tim Salimans, Ian Goodfellow, and Wojciech Zaremba. Improved techniques for training GANs. Advances in Neural Information Processing Systems, pages 2234–2242, 2016.
 [15] Tim Salimans and Durk Kingma. Weight normalization: A simple reparameterization to accelerate training of deep neural networks. Advances in Neural Information Processing Systems, pages 901–909, 2016.
 [16] Antonio Torralba, Rob Fergus, and William T. Freeman. 80 million tiny images: A large data set for nonparametric object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(11):1958–1970, 2008.
 [17] Jiqing Wu, Zhiwu Huang, and Janine Thoma. Energy-relaxed Wasserstein GANs (EnergyWGAN): Towards more stable and high resolution image generation. arXiv preprint arXiv:1712.01026, 2017.
Appendix A Proof of Corollary 1
Corollary 1. Let P_r and P_g be two distributions in X, a compact metric space. Let f(x) = Wx be a linear, 1-Lipschitz-constrained function that is the optimal solution of max_{‖f‖_L ≤ 1} E_{x∼P_r}[f(x)] − E_{x∼P_g}[f(x)]. Then all the singular values of the weight matrix W are 1.
Proof: Proposition 1 of [6] proves that the optimal 1-Lipschitz discriminator function has gradient norm 1 almost everywhere. In other words, the optimum is attained at the upper bound of the 1-Lipschitz constraint: ‖f(x_1) − f(x_2)‖ = ‖x_1 − x_2‖.
Because f is a linear function, f(x) = Wx, the 1-Lipschitz constraint for f can be expressed as:
(11)    ‖W x_1 − W x_2‖ ≤ ‖x_1 − x_2‖, for all x_1, x_2
Equation 11 is equivalent to:
(12)    x^T W^T W x ≤ x^T x, for all x
and,
(13)    W^T W = Q Λ Q^T
where the columns of Q, {q_1, q_2, …}, are eigenvectors of W^T W, and the diagonal entries of the diagonal matrix Λ are the eigenvalues of W^T W.
Taking y = Q^T x, then
(14)    x^T W^T W x = Σ_i λ_i y_i²
where λ_i is the i-th eigenvalue, and y_i is the i-th element of y.
Because W^T W is symmetric positive semi-definite, λ_i ≥ 0, and then
(15)    Σ_i λ_i y_i² ≤ (max_i λ_i) Σ_i y_i² = (max_i λ_i) x^T x
Finally, attaining ‖Wx‖ = ‖x‖ everywhere is equivalent to Σ_i λ_i y_i² = Σ_i y_i² for all y. We can see that this upper bound of the 1-Lipschitz constraint can be obtained only when all eigenvalues of W^T W are 1. In other words, all the singular values of W are 1.
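A small numerical illustration of the corollary (ours, for intuition, not part of the proof): when all singular values of W are 1, the linear map attains the 1-Lipschitz bound with equality everywhere, and shrinking any singular value breaks the equality for generic inputs.

```python
import numpy as np

np.random.seed(0)
# An orthogonal matrix has all singular values equal to 1 ...
Q, _ = np.linalg.qr(np.random.randn(6, 6))
x1, x2 = np.random.randn(6), np.random.randn(6)
lhs = np.linalg.norm(Q @ (x1 - x2))   # ||f(x1) - f(x2)||
rhs = np.linalg.norm(x1 - x2)         # ||x1 - x2||
# ... so ||Q(x1 - x2)|| = ||x1 - x2||: the bound is tight everywhere.

# Shrinking one singular value makes the map strictly contractive for
# inputs with a component along the corresponding right singular vector.
U, s, Vt = np.linalg.svd(Q)
s[-1] = 0.1
W_shrunk = (U * s) @ Vt
lhs_shrunk = np.linalg.norm(W_shrunk @ (x1 - x2))
```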
Appendix B Architecture and Optimization Settings
In this paper, we employ the SNGAN architecture, illustrated in Figure 12. The weight of a convolutional layer is in the format [c_out, c_in, k_h, k_w], where c_out is the number of output channels, c_in the number of input channels, and k_h and k_w the kernel sizes. In particular, there are 10 convolutional layers in the discriminator network, and CH in Figure 12 (b) corresponds to the channel size of the discriminator in the main text, where extensive experiments are conducted with different settings of CH. All experiments are conducted with this architecture. Image generation on STL-10 shares the same architecture as on CIFAR-10; accordingly, images in STL-10 are compressed to 32×32 pixels, identical to the resolution of images in CIFAR-10. The purpose is to evaluate how different data affect mode collapse and spectral distributions, independently of any effect of architecture.
The optimization settings follow SNGANs. To be specific, the learning rate is 0.0002, the number of discriminator updates per generator update is 5, the batch size is taken as 64, and the Adam optimizer is used with first- and second-order momentum parameters β1 = 0 and β2 = 0.9, respectively.
Appendix C Spectral Distributions
In Figures 13–17, we show the spectral distributions for the different settings in Table 1. We can see that spectral collapse and mode collapse always go side by side. In Figures 18–22, we show the spectral distributions of each layer except the skip-connection layers; because these layers act as skip connections, the intense correlation between mode collapse and spectral collapse is not observed in them.
Appendix D Statistics of D(x) and the Discriminator Objective
In the main text, we primarily show the statistics of D(x) and the discriminator objective for two typical settings. In Figure 23 and Figure 24, we show the mean and variance of D(x) and of the discriminator objective. To be specific, we feed the discriminator with generated data and with data from the training set, obtain the value of the discriminator objective, calculate its mean and variance, and show their variation with γ in Figure 23. To investigate the performance of the discriminator on the test set, we monitor D(x_train) and D(x_test), where x_train and x_test represent the training and test sets, respectively; we then calculate the mean and variance and show their variation with γ in Figure 24.
In Figure 23, we can see that the discriminator objective has a decreasing tendency as γ increases. As we can see in Figure 24, D(x_train) and D(x_test) diverge when γ is excessively large. Thus, excessively increasing γ potentially leads to overfitting, especially when γ is taken as 1. As we can see in Figure 24 (d)–(f), D(x_train) and D(x_test) agree well, indicating that no overfitting is observed in the remaining group.