Spectral Regularization for Combating Mode Collapse in GANs


Kanglin Liu, Wenming Tang, Fei Zhou, Guoping Qiu
Shenzhen University, Shenzhen, China
Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen, China
Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen, China
University of Nottingham, Nottingham, United Kingdom
max.liu.426@gmail.com, guoping.qiu@nottingham.ac.uk
Abstract

Despite excellent progress in recent years, mode collapse remains a major unsolved problem in generative adversarial networks (GANs). In this paper, we present spectral regularization for GANs (SR-GANs), a new and robust method for combating the mode collapse problem in GANs. Theoretical analysis shows that the optimal solution to the discriminator has a strong relationship to the spectral distributions of its weight matrices. Therefore, we monitor the spectral distributions in the discriminator of spectral normalized GANs (SN-GANs), and discover a phenomenon which we refer to as spectral collapse, where a large number of singular values of the weight matrices drop dramatically when mode collapse occurs. We show that there is strong evidence linking mode collapse to spectral collapse, and based on this link, we set out to tackle spectral collapse as a surrogate of mode collapse. We have developed a spectral regularization method in which we compensate the spectral distributions of the weight matrices to prevent them from collapsing, which in turn successfully prevents mode collapse in GANs. We provide theoretical explanations for why SR-GANs are more stable and can provide better performance than SN-GANs. We also present extensive experimental results and analysis to show that SR-GANs not only always outperform SN-GANs but also always succeed in combating mode collapse where SN-GANs fail. The code is available at https://github.com/max-liu-112/SRGANs-Spectral-Regularization-GANs-

1 Introduction

Generative Adversarial Networks (GANs) [5] are one of the most significant developments in machine learning research of the past decade. Since their first introduction, GANs have attracted intensive interest in the machine learning community not only for their ability to learn highly structured probability distributions but also for their theoretical implications [5, 13, 2, 17]. Essentially, GANs are constructed around two functions [3, 9]: the generator G, which maps a sample z to the data distribution, and the discriminator D, which is trained to distinguish real samples of a dataset from fake samples produced by the generator. With the goal of reducing the difference between the distributions of generated and real samples, a GAN training algorithm trains G and D in tandem.

GAN training is dynamic and sensitive to nearly every aspect of its setup, from optimization parameters to model architecture [1]. Training instability, or mode collapse, is one of the major obstacles to developing GAN applications. Despite excellent progress in recent years [6, 12, 10, 15, 7], the mode collapse problem still persists. For example, one of the most impressive works to emerge recently is BigGANs [1], which is the largest published GAN system and is based on the state-of-the-art spectral normalization technique (SN-GAN) [10]. However, BigGANs can still suffer from the training instability problem, especially when the batch size is scaled up. Although implementing training stabilization measures, such as employing a zero-centred gradient penalty term [1] in the loss metric of the discriminator to prevent spectral noise, can improve stability, this can cause severe degradation in performance, resulting in a 45% reduction in Inception Score.

In this paper, we present Spectral Regularization, a robust method for combating the mode collapse problem in GANs. Theoretically, we analyze the optimal solution to a linear discriminator function D constrained by 1-Lipschitz continuity, and find that the optimal solution is attained when all singular values of the weight matrix are 1. Even though, in the implementation of GAN models, D is non-linear, we reason that the spectral distributions in D may also have a strong relation to its performance. Through comprehensive analysis of spectral distributions in a large number of GAN models trained with the state-of-the-art SN-GAN algorithm, we discover that when mode collapse occurs to a model, the spectral distributions of \bar{W}_{SN} in D also collapse, where \bar{W}_{SN} is the spectrally normalized weight matrix. Specifically, we observe that when a model performs well and no mode collapse occurs, a large number of singular values of \bar{W}_{SN} in D are very close to 1, and that when mode collapse occurs to a model, singular values of \bar{W}_{SN} in D drop dramatically. We refer to the phenomenon where a large number of singular values drop significantly as spectral collapse.

In all GAN models of various sizes and trained with a variety of parameter settings on datasets extensively used in the literature, we observe that mode collapse and spectral collapse always go side by side. This fact leads us to reason that mode collapse in SN-GANs is caused by spectral collapse in the weight matrices. Based on this insight into the spectral distributions of \bar{W}_{SN}, we propose a new and robust method called spectral regularization to prevent GANs from mode collapse. In addition to normalizing the weight matrices, spectral regularization imposes constraints on the weight matrices by compensating their spectral distributions to avoid spectral collapse. Theoretical analysis shows that spectral regularization is better than spectral normalization at preventing the weight matrices from concentrating into one particular direction. We show that SN-GANs are a special case of spectral regularization, and in a series of extensive experiments we demonstrate that spectral regularization not only provides superior performance to spectral normalization but also can always avoid mode collapse in cases where spectral normalization fails.

Our contributions can be summarized as follows:

(1) Through theoretical analysis and extensive experimental observations, we provide an insight into the likely causes of mode collapse in a state of the art GAN normalization technique, spectral normalization (SN-GANs). We introduce the concept of spectral collapse and provide strong evidence to link spectral collapse with mode collapse in SN-GANs.

(2) Based on the above insight, we have developed a new robust regularization method, Spectral Regularization, in which we compensate the spectral distributions of the weight matrices in D to prevent spectral collapse, thus preventing mode collapse in GANs. Extensive experimental results show that spectral regularization not only can always prevent mode collapse but also consistently provides improved performance over SN-GANs.

2 Analysis of Mode Collapse in SN-GANs

2.1 A Brief Summary of SN-GANs

For easy discussion, we first briefly recap the essential ideas of the spectral normalization technique for training GANs [10]. As far as we are aware, this is currently one of the best methods in the literature and has been successfully used to construct large systems such as BigGANs [1]. For convenience, we largely follow the notation convention of [10]. Consider a simple discriminator made of a neural network of the following form:

f(x, \theta) = W^{L+1} a_L (W^L (a_{L-1} (W^{L-1} (\cdots a_1 (W^1 x) \cdots))))    (1)

where \theta := \{W^1, \ldots, W^L, W^{L+1}\} is the set of learning parameters, W^l \in \mathbb{R}^{d_l \times d_{l-1}}, W^{L+1} \in \mathbb{R}^{1 \times d_L}, and a_l is an element-wise non-linear activation function. We omit the bias term of each layer for simplicity. The final output of the discriminator is given by

D(x, \theta) = \mathcal{A}(f(x, \theta))    (2)

where \mathcal{A} is an activation function corresponding to the divergence of the distance measure of the user's choice.

The standard formulation of GANs is given by [10, 13]:

\min_G \max_D V(G, D)    (3)

where the min and max of G and D are taken over the set of generator and discriminator functions, respectively. The conventional form of V(G, D) is given by \mathbb{E}_{x \sim q_{data}}[\log D(x)] + \mathbb{E}_{x' \sim p_G}[\log(1 - D(x'))] [10], where q_{data} is the data distribution and p_G is the model (generator) distribution.

To guarantee Lipschitz continuity, spectral normalization [10] controls the Lipschitz constant of the discriminator function by literally constraining the spectral norm of each layer:

\bar{W}_{SN}(W) := W / \sigma(W)    (4)

where \sigma(W) is the spectral norm of the weight matrix W in the discriminator network, which is equivalent to the largest singular value of W.
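To make equation (4) concrete, the sketch below estimates \sigma(W) with a step of power iteration and divides the weight by it. This is only a minimal illustration of the idea in [10], not the authors' implementation; the function and variable names are ours, and convolutional kernels are assumed to be reshaped into a 2-D matrix beforehand.

```python
import torch
import torch.nn.functional as F

def spectrally_normalize(W: torch.Tensor, u: torch.Tensor, n_iters: int = 1):
    """Return W / sigma(W) and an updated estimate u of the leading left singular vector."""
    with torch.no_grad():
        for _ in range(n_iters):
            v = F.normalize(W.t() @ u, dim=0)   # estimate of the leading right singular vector
            u = F.normalize(W @ v, dim=0)       # estimate of the leading left singular vector
    sigma = u @ W @ v                           # approximates the largest singular value of W
    return W / sigma, u
```

In practice, u is kept as a persistent buffer so that a single power-iteration step per training update suffices.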

The authors of SN-GANs [10] and those of BigGANs [1] have demonstrated the superiority of spectral normalization over other normalization or regularization techniques, e.g., gradient penalty [6], weight normalization [15] and orthonormal regularization [4]. However, as a state of the art GAN model, BigGANs (based on spectral normalization) can still suffer from mode collapse. Therefore, mode collapse remains an unsolved open problem, and seeking better and more robust solutions is very important for advancing GANs.

2.2 Theoretical Analysis

In order to unearth the likely causes of mode collapse, we start by analyzing the optimal solution to the 1-Lipschitz constrained discriminator.

To be specific, Proposition 1 in [6] has proven that the optimal solution to the 1-Lipschitz constrained discriminator function has gradient norm 1 almost everywhere. Assuming the discriminator is a linear function, we find that the optimal solution is obtained only when all the singular values of its weight matrix are 1. This can be verified by Corollary 1 (see proof in the Appendix).

Corollary 1. Let P_r and P_g be two distributions in X, a compact metric space. A linear and 1-Lipschitz constrained function D(x) = Wx is the optimal solution of \max_{\|D\|_L \le 1} \mathbb{E}_{x \sim P_r}[D(x)] - \mathbb{E}_{x \sim P_g}[D(x)]. Then all the singular values of the weight matrix W are 1.

We can see that, for a linear D, the spectral distribution is strongly related to the performance of D. For discriminators in GANs, D is nonlinear. However, we reason that their spectral distributions may also have a strong relation to the performance of the discriminator. As a result, we can monitor the spectral distribution to investigate the mode collapse problem.
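As a quick numerical sanity check of the intuition behind Corollary 1, the snippet below (our own illustration, not part of the original analysis) builds a matrix whose singular values are all 1 and verifies that the corresponding linear map preserves pairwise distances, i.e., it attains the 1-Lipschitz upper bound everywhere.

```python
import torch

torch.manual_seed(0)
n = 8
# An orthogonal matrix has all singular values equal to 1.
W, _ = torch.linalg.qr(torch.randn(n, n))

x1, x2 = torch.randn(n), torch.randn(n)
lhs = torch.norm(W @ x1 - W @ x2)            # distance after applying the linear map
rhs = torch.norm(x1 - x2)                    # original distance
print(torch.allclose(lhs, rhs, atol=1e-5))   # True: the 1-Lipschitz bound is attained
```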

2.3 Mode Collapse vs Spectral Collapse

In order to find the link between mode collapse and spectral distributions, we have conducted a series of experiments on unconditional image generation on the CIFAR-10 [16] and STL-10 [8] datasets. Our implementation is based on the SN-GAN architecture of [10] and uses the hinge loss as the discriminator objective, which is given by:

L_D = \mathbb{E}_{x \sim q_{data}}[\max(0, 1 - D(x))] + \mathbb{E}_{z \sim p(z)}[\max(0, 1 + D(G(z)))]    (5)
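A minimal sketch of the hinge objective in equation (5) is given below, assuming D and G are callable PyTorch modules; the function names and the mini-batches real and z are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

def discriminator_hinge_loss(D, G, real, z):
    """Hinge loss for D: push D(real) above +1 and D(G(z)) below -1."""
    fake = G(z).detach()                        # block gradients into the generator
    loss_real = F.relu(1.0 - D(real)).mean()    # E[max(0, 1 - D(x))]
    loss_fake = F.relu(1.0 + D(fake)).mean()    # E[max(0, 1 + D(G(z)))]
    return loss_real + loss_fake

def generator_hinge_loss(D, G, z):
    """Generator objective commonly paired with the hinge loss."""
    return -D(G(z)).mean()
```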

The optimization settings follow the literature [10, 11]. Previous authors have shown that increasing the batch size or decreasing the discriminator capacity could potentially lead to mode collapse [1]. We therefore conduct experiments for various combinations of batch and channel sizes, as listed in Table 1. We follow the practice in the literature of using Inception Score (IS) [14] and Fréchet Inception Distance (FID) [8] as approximate measures of sample quality, and the results are shown in Table 2, where we also identify all settings in which mode collapse has occurred to SN-GANs. Through monitoring the Inception Score, the Fréchet Inception Distance and synthetic images during training, mode collapse is observed in 10 of the 26 settings; in the other 16 settings, mode collapse has not happened.

Mode collapse is a persistent problem in GAN training and is also a major issue in SN-GANs, as has been shown in BigGANs [1] and in Table 2. Here, we monitor the entire spectral distributions of SN-GANs, i.e., all singular values of \bar{W}_{SN} in the discriminator network during training.
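The monitoring itself only requires the singular values of each (reshaped) weight matrix; a minimal sketch is given below, assuming a PyTorch discriminator. The reshape follows the usual convention of flattening all but the output-channel dimension, and dividing by the largest singular value gives the spectrum of the spectrally normalized weight \bar{W}_{SN}.

```python
import torch

@torch.no_grad()
def spectral_distribution(model: torch.nn.Module):
    """Return, for every conv/linear layer, the singular values of W divided by sigma(W)."""
    spectra = {}
    for name, module in model.named_modules():
        if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):
            W = module.weight.reshape(module.weight.shape[0], -1)   # [out, in * k * k]
            s = torch.linalg.svdvals(W)                             # descending order
            spectra[name] = (s / s[0]).cpu()                        # spectrum of W / sigma(W)
    return spectra
```

Logging this dictionary every few thousand iterations and plotting the curves is enough to reveal whether a layer's spectrum is collapsing.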

The discriminator network in our implementation uses the same architecture as that in the original SN-GANs [10] and has 10 convolutional layers (see the Appendix for the setting details). In order to discover the likely causes of mode collapse, we plot the spectral distributions of every layer (except the skip-connection layers) of the discriminator for all 26 settings. In the following, we present some typical examples; readers are referred to the Appendix for all other plots.

Batch  CH   Dataset      Batch  CH   Dataset
16     128  CIFAR-10     8      32   CIFAR-10
32     128  CIFAR-10     16     32   CIFAR-10
64     128  CIFAR-10     32     32   CIFAR-10
128    128  CIFAR-10     64     32   CIFAR-10
256    128  CIFAR-10     128    32   CIFAR-10
512    128  CIFAR-10     128    256  CIFAR-10
1024   128  CIFAR-10     256    256  CIFAR-10
8      64   CIFAR-10     512    256  CIFAR-10
16     64   CIFAR-10     16     128  STL-10
32     64   CIFAR-10     64     128  STL-10
64     64   CIFAR-10     256    128  STL-10
128    64   CIFAR-10     256    64   STL-10
256    64   CIFAR-10     256    32   STL-10
Table 1: Experiment settings. The experiments are divided into 5 groups: four CIFAR-10 groups with discriminator channel sizes 128, 64, 32 and 256, and one STL-10 group. Within each CIFAR-10 group, the models share exactly the same network architecture but differ in batch size; we vary the batch sizes inside each group to study how batch size relates to mode collapse, and we change the channel size between groups to investigate how discriminator capacity affects mode collapse. The STL-10 group applies the experiments to a different dataset; its purpose is to evaluate how different data affect mode collapse. Batch denotes the batch size, and CH denotes the channel size of the discriminator. Each setting is referred to by its group together with its batch (and, for STL-10, channel) size, e.g., the CIFAR-10 setting with batch size 16 and CH 128.
Figure 1: Spectral distributions in \bar{W}_{SN} for good GANs (no mode collapse) at different numbers of iterations. Each curve shows the spectral distribution after a given number of training iterations; panels (a) to (e) correspond to five different settings.
Figure 2: Spectral distributions in \bar{W}_{SN} for the settings where mode collapse occurs. Each curve shows the spectral distribution after a given number of training iterations; panels (a) to (j) correspond to the ten settings in which mode collapse occurred.

Figure 3: Spectral distributions (after a fixed number of iterations) in \bar{W}_{SN} for different settings; panels (a) to (e) correspond to the five experiment groups.

Figure 1 shows the spectral distributions of \bar{W}_{SN} for 5 settings where mode collapse does not happen. Figure 2 shows the spectral distributions of \bar{W}_{SN} for all 10 settings where mode collapse has occurred. Through analyzing the spectral distribution plots in Figure 1 and Figure 2, we notice a very interesting pattern. In the cases where no mode collapse happens, the shapes of the spectral distribution curves do not change significantly with the number of training iterations. On the other hand, for those settings where mode collapse has occurred, the shapes of the spectral distribution curves change significantly as training progresses. In particular, a large number of singular values become very small when training passes a certain number of iterations. It is as if the curves have "collapsed", and we refer to this phenomenon as spectral collapse.

The phenomenon of spectral collapse is also observed across different settings. Figure 3 plots the spectral distributions for the 5 groups of experimental settings in Table 1. It is seen that in the two groups with the larger channel sizes (CH of 128 and 256), the spectral distributions across different settings are very similar and no spectral collapse is observed; very interestingly, no mode collapse is observed either. In the group with CH of 64, the spectral distributions of three settings have collapsed and, not surprisingly, mode collapse also happens to these 3 settings. In the group with CH of 32, the spectral distributions of all settings have collapsed, i.e., most singular values are very small (except for the first one, which is forced to be 1 by spectral normalization); again as expected, mode collapse happens to all settings in this group. In the STL-10 group, two settings have suffered from spectral collapse, and again, mode collapse is observed for these two settings.

Figure 4: An example showing how spectral distributions relate to Inception Score and Fréchet Inception Distance: (a) spectral distributions, (b) Inception Score, (c) Fréchet Inception Distance. The spectral distributions correspond to one layer of the discriminator for one of the settings in which mode collapse occurs.

In order to understand what has happened when spectral collapse occurs, Figure 4 shows how a typical spectral distribution relates to Inception Score and Fréchet Inception Distance during training. It is seen that up to 19k iterations both IS and FID show good performance, and the corresponding spectral distribution has a large number of large singular values. At 20k iterations, the IS and FID performances start to drop; correspondingly, the spectral distribution starts to fall. At 21k iterations, the IS and FID performances have dropped significantly and mode collapse has started; very importantly, the spectral distribution has dropped dramatically, i.e., it has started to collapse.

The association of mode collapse with spectral collapse is observed in all the layers and in all settings (readers are referred to the Appendix for more examples). We therefore believe that mode collapse and spectral collapse happen at the same time, and that spectral collapse is the likely cause of mode collapse. In the following section, we will introduce spectral regularization to prevent spectral collapse and thus avoid mode collapse.

3 Spectral Regularization

We have now established that spectral collapse is closely linked to mode collapse in SN-GANs. In this section, we introduce spectral regularization, a technique for preventing spectral collapse. We show that preventing spectral collapse can indeed solve the mode collapse problem, thus demonstrating that spectral collapse is the cause of mode collapse rather than a mere symptom.

Performing singular value decomposition, the weight matrix W can be expressed as:

W = U \cdot \Sigma \cdot V^T    (6)

where both U and V are orthogonal matrices, the columns of U, u_1, \ldots, u_n, are called the left singular vectors of W, the columns of V, v_1, \ldots, v_n, are called the right singular vectors of W, and \Sigma can be expressed as:

\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n)    (7)

where \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0 are the singular values, and \{\sigma_1, \sigma_2, \ldots, \sigma_n\} represents the spectral distribution of W.

When mode collapse occurs, the spectral distribution concentrates on the first singular value and the remaining singular values drop dramatically (spectral collapse). To avoid spectral collapse, we first apply a compensation term \Delta D to W, where \Delta D is given by \Delta D = \sum_{k=1}^{r} (\sigma_1 - \sigma_k) u_k v_k^T, and r is a hyperparameter (1 \le r \le n). Spectral regularization turns W into W' as follows: W' = W + \Delta D = U \cdot \Sigma' \cdot V^T. Correspondingly, \Sigma turns into \Sigma', where \Sigma' is given by:

\Sigma' = \mathrm{diag}(\underbrace{\sigma_1, \ldots, \sigma_1}_{r}, \sigma_{r+1}, \ldots, \sigma_n)    (8)

Finally, we apply spectral normalization to guarantee Lipschitz continuity, and obtain our spectrally regularized \bar{W}_{SR}:

\bar{W}_{SR}(W) := \frac{W'}{\sigma(W)} = \frac{W + \Delta D}{\sigma(W)}    (9)

Clearly, spectral normalization is a special case of spectral regularization (when r = 1, the compensation term vanishes and \bar{W}_{SR} reduces to \bar{W}_{SN}).
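The following sketch implements this compensate-then-normalize step for a single 2-D weight matrix, following equations (6) to (9) above. It is an illustrative reimplementation with names of our own choosing, not the authors' released code; in practice the operation would be applied to the reshaped weight of every discriminator layer at each training step.

```python
import torch

def spectral_regularize(W: torch.Tensor, r: int) -> torch.Tensor:
    """Raise the top-r singular values of W to sigma_1, then divide by sigma(W) = sigma_1."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)   # W = U diag(S) Vh, S in descending order
    S_comp = S.clone()
    S_comp[:r] = S[0]                                      # sigma'_k = sigma_1 for k <= r
    W_prime = U @ torch.diag(S_comp) @ Vh                  # W' = W + Delta_D
    return W_prime / S[0]                                  # spectral normalization of W'

# After regularization, the first r singular values are exactly 1.
W = torch.randn(64, 128)
print(torch.linalg.svdvals(spectral_regularize(W, r=32))[:5])   # ~[1., 1., 1., 1., 1.]
```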

IS (SN / SR)         FID (SN / SR)            MC  SC      IS (SN / SR)         FID (SN / SR)             MC  SC
8.15±.09 / 8.35±.09  22.31±.28 / 24.67±.28    -   -       4.21±.18 / 4.93±.20  80.00±1.12 / 66.05±2.12   SN  SN
8.38±.07 / 8.45±.10  25.96±.42 / 22.00±.17    -   -       4.05±.15 / 4.78±.23  79.69±.21 / 59.25±.43     SN  SN
8.39±.15 / 8.65±.12  21.15±.15 / 20.31±.18    -   -       4.29±.08 / 4.70±.15  78.39±.17 / 62.10±.24     SN  SN
8.61±.12 / 8.72±.08  21.01±.23 / 19.98±.19    -   -       4.30±.14 / 5.00±.14  85.15±1.20 / 56.11±.54    SN  SN
8.45±.14 / 8.48±.03  20.87±.25 / 19.87±.21    -   -       4.87±.14 / 5.30±.07  71.10±.89 / 54.39±.41     SN  SN
8.34±.09 / 8.53±.04  21.85±.14 / 20.13±.12    -   -       8.14±.06 / 8.92±.18  24.43±.41 / 18.95±.23     -   -
8.31±.21 / 8.52±.16  21.68±.35 / 20.34±.13    -   -       8.29±.12 / 8.83±.14  22.54±.29 / 19.56±.11     -   -
6.67±.05 / 7.42±.06  45.19±.89 / 35.78±.11    -   -       8.33±.09 / 8.36±.12  22.58±.16 / 21.82±.29     -   -
7.34±.06 / 7.59±.08  31.73±.49 / 29.42±.22    -   -       8.63±.15 / 8.69±.16  44.24±.56 / 43.19±.33     -   -
7.18±.03 / 7.48±.09  33.76±.35 / 28.60±.25    -   -       8.98±.20 / 9.14±.18  42.40±.56 / 39.89±.89     -   -
6.96±.11 / 7.52±.11  36.65±.29 / 28.40±.36    SN  SN      9.10±.13 / 9.11±.17  40.11±.89 / 40.08±.29     -   -
7.10±.14 / 7.13±.05  35.99±.48 / 31.41±.56    SN  SN      7.38±.14 / 7.67±.06  74.50±1.52 / 69.20±.83    SN  SN
6.85±.08 / 7.58±.03  35.88±.42 / 27.68±.23    SN  SN      4.04±.11 / 4.38±.07  98.50±1.34 / 89.17±1.23   SN  SN
Table 2: IS and FID results for the different settings, where IS is Inception Score and FID is Fréchet Inception Distance. For IS, higher is better; for FID, lower is better. SN and SR denote spectral normalization and spectral regularization, respectively. MC stands for mode collapse, and SC stands for spectral collapse; "-" indicates that no mode collapse or spectral collapse occurred, while SN in the MC or SC column indicates that mode collapse or spectral collapse occurred to spectral normalization. Note that neither mode collapse nor spectral collapse happens to spectral regularization for any setting.
Figure 5: The effect of the SN-GAN and SR-GAN algorithms on spectral distributions. Panels (a) to (e) show the spectral distributions of the weight matrix in one discriminator layer for five settings. Spectral collapse and mode collapse have happened to SN-GANs in (b), (c), and (e). In all cases, there is no spectral collapse or mode collapse in SR-GANs.

3.1 Gradient Analysis of Spectral Regularization

We perform gradient analysis to show that spectral regularization provides a more effective way than spectral normalization of preventing W from concentrating into one particular direction during training, and thus of avoiding spectral collapse.

From equation (9), we can write the gradient of \bar{W}_{SR}(W) with respect to W_{ij} as:

\frac{\partial \bar{W}_{SR}(W)}{\partial W_{ij}} = \frac{1}{\sigma(W)} E_{ij} - \frac{[u_1 v_1^T]_{ij}}{\sigma(W)} \bar{W}_{SN}(W) - \frac{[u_1 v_1^T]_{ij}}{\sigma(W)^2} \Delta D + \frac{1}{\sigma(W)} \sum_{k=1}^{r} \left( [u_1 v_1^T]_{ij} - [u_k v_k^T]_{ij} \right) u_k v_k^T    (10)

where [\cdot]_{ij} represents the (i, j)-th entry of the corresponding matrix, and E_{ij} is the matrix whose (i, j)-th entry is 1 and zero everywhere else.

We would like to comment on the implications of equation (10). The first two terms are the gradient of spectral normalization [10]; this is easy to see from equation (9). As explained in [10], the second term can be regarded as being able to prevent the column space of W from concentrating into one particular direction in the course of training. In other words, spectral normalization prevents the transformation of each layer from becoming sensitive in only one direction. However, as we have seen (e.g. Figure 3), despite performing spectral normalization, the spectral distributions of \bar{W}_{SN} can still concentrate on the first singular value, thus causing spectral collapse. This shows the limited ability of spectral normalization in preventing spectral collapse.

In addition to the first two terms of spectral normalization, spectral regularization introduces the third and fourth terms in equation (10). It can be seen that the third term enhances the effect of the second term, through which W is much less likely to concentrate into one particular direction. Furthermore, the fourth term can be seen as a regularization term, encouraging W to move along all the directions pointed to by u_k v_k^T, for k \le r, each weighted by an adaptive regularization coefficient. This encourages W to make full use of the directions pointed to by the u_k v_k^T, thus preventing W from being concentrated in only one direction, which in turn stabilizes the training process.

From the above analysis, it is clear that, compared with spectral normalization, spectral regularization as reflected in equation (10) encourages the weight matrices of the discriminator to move in a variety of directions, thus preventing them from concentrating in only one direction, which in turn prevents spectral collapse. We will show in the experimental section that performing spectral regularization can indeed prevent mode collapse where spectral normalization has failed.

Figure 6: Inception Score, Fréchet Inception Distance and synthetic images of SN-GAN and SR-GAN for one of the settings in which mode collapse occurred to SN-GAN: (a) Inception Score, (b) Fréchet Inception Distance, (c) synthetic images with SR, (d) synthetic images with SN.

Figure 7: Inception Score, Fréchet Inception Distance and synthetic images of SN-GAN and SR-GAN for a second such setting: (a) Inception Score, (b) Fréchet Inception Distance, (c) synthetic images with SR, (d) synthetic images with SN.

4 Experiments

For all settings listed in Table 1, we have conducted experiments using SN-GANs and the newly introduced spectral regularization algorithm (we use the abbreviation SR-GANs for the spectrally regularized GANs). All procedures and settings for SN-GANs and SR-GANs are identical, except that SR-GANs implement spectral regularization (equation (9)) on the discriminator weight matrices while SN-GANs implement spectral normalization (equation (4)). The default value of the hyperparameter r in SR-GANs is empirically set to 0.5N, where N is the number of singular values in the corresponding weight matrix. Readers are referred to the Appendix for the details of the network architecture settings.
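For reference, the default choice of the hyperparameter can be computed per layer as below; this is our reading of "r = 0.5N with N the number of singular values of the reshaped weight matrix", with a hypothetical helper name.

```python
def default_r(weight_shape) -> int:
    """Return r = 0.5 * N, where N = min(out_dim, prod(remaining dims)) is the number of singular values."""
    out_dim = weight_shape[0]
    in_dim = 1
    for d in weight_shape[1:]:
        in_dim *= d
    n_singular = min(out_dim, in_dim)
    return max(1, round(0.5 * n_singular))

print(default_r((128, 128, 3, 3)))   # a 3x3 conv with 128 input/output channels -> r = 64
```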

The Inception Score (IS) and Fréchet Inception Distance (FID) performances are shown in Table 2. Please note that in the cases where mode collapse has happened, IS and FID are the best results before mode collapse. It is clearly seen that in all cases SR-GANs outperform SN-GANs. In particular, for one of the settings, SR-GAN has improved IS by 9.5% and FID by 22.4%. On average, SR-GANs have improved IS by 8.9% and FID by 18.9% over SN-GANs. Very importantly, in all 10 settings where mode collapse has occurred to SN-GANs, none has happened to SR-GANs. In fact, we have not yet observed mode collapse in SR-GANs in an extensive set of experiments. We have therefore demonstrated that the new SR-GAN is superior to SN-GAN in both quality and stability.

Figure 5 shows an example of how the spectral distributions of the discriminator weights are affected by SN-GANs and SR-GANs. It is seen that SN-GANs normalize the largest singular value; however, in some cases they cannot stop the other singular values from dropping significantly, thus causing spectral collapse, which in turn results in mode collapse. In contrast, SR-GANs ensure that the first r singular values are 1 in all cases, thus ensuring that spectral collapse does not happen and hence preventing mode collapse. Similar effects are observed in all layers and for all settings. This illustrates that SR-GANs can indeed prevent spectral collapse, which in turn avoids mode collapse.

A combination of a large batch size and a small channel size can easily cause SN-GANs to suffer from mode collapse; one such setting from our experiments is used as an example here. Figure 6 (a) and Figure 6 (b) show the changes of the IS and FID measures of this setting during training. It is seen that after about 20k iterations the performance of SN-GAN starts to drop and eventually leads to mode collapse. In contrast, the performance of SR-GAN improves steadily as training progresses; importantly, no mode collapse has occurred. Figure 6 (c) and Figure 6 (d) show some example images generated by SR-GAN and SN-GAN for this setting. It is clearly seen from Figure 6 (d) that mode collapse has indeed occurred to SN-GAN.

When the channel size is small, mode collapse happens to SN-GAN regardless of batch size, as shown in the experiments of our smallest-channel group. Figure 7 shows the training history of SN-GAN and SR-GAN for one setting in this group. It is seen that for SN-GAN the performance deteriorates almost from the start of the training process and continues to do so until mode collapse eventually occurs. In contrast, the performance of SR-GAN improves steadily and eventually converges (no mode collapse). Examples of images generated by the two training methods for this setting are also shown in the figure. It is again clearly seen that mode collapse has indeed happened to SN-GAN, while the images generated by SR-GAN are of better quality and greater variety.

In Section 2, we showed that mode collapse is strongly linked to spectral collapse. By introducing spectral regularization to adjust the singular values of the weight matrices and prevent them from dropping to small values, thus preventing spectral collapse, we have successfully introduced a new method for combating mode collapse. The results presented in this section show that regularizing the spectral distributions of the weight matrices so that a large number of their singular values do not drop to small values can indeed prevent spectral collapse, which in turn successfully prevents mode collapse.

Figure 8: The effect of r on model performance: (a) Inception Score, (b) Fréchet Inception Distance. N represents the number of singular values in the corresponding weight matrix.
Figure 9: Statistics of the discriminator outputs for the training data and the generated data, and of the discriminator objective, for different values of r.
Figure 10: Statistics of the discriminator outputs on the training and test sets for different values of r, for a setting in which performance drops when r is large.
Figure 11: Statistics of the discriminator outputs on the training and test sets for different values of r, for a setting in which performance increases steadily with r.

4.1 The Hyperparameter r in SR-GANs

SR-GAN has a single hyperparameter r, and its value affects performance. In the experiments above, r is set to 0.5N, where N is the number of singular values. Clearly, when r = 1, SR-GAN is the same as SN-GAN; therefore SN-GAN is a special case of SR-GAN. To investigate the effect of r, we gradually increase it and observe its influence on model performance. In Figure 8, we show the Inception Scores and Fréchet Inception Distances for different values of r. For two of the experiment groups, increasing r from 0.25N to 0.5N improves the performances; however, continuing to increase r from 0.5N to N causes the performances to deteriorate. For the experiments in the remaining group shown, performance increases steadily with r.

To understand why r affects performance in this way, we feed the discriminator with the generated data and with real data from both the training and testing sets, and then record the statistics of the discriminator output in equation (2) and of the discriminator objective in equation (5). For convenience of explanation, some typical results are illustrated here; more data can be found in the Appendix.

The probability distributions of D(x_gen) for the generated data and of D(x_train) for the training data, for one setting and different values of r, are shown in Figure 9 (a) and Figure 9 (b), respectively. Here x_train denotes samples from the training set, and x_gen denotes generated samples. The probability distribution of the discriminator objective is shown in Figure 9 (c).

When increasing r from 0.25N to N, the distributions of D(x_train) have a tendency to move to the right, and at the same time the distributions of D(x_gen) have a tendency to move to the left. This means that the discriminator can better discriminate between the real and generated samples. This is also verified by the distributions of the discriminator objective, as can be clearly seen in Figure 9 (c).

To investigate the discriminator's performance on the testing set, we show the probability distributions of D(x_train) and D(x_test) for one setting in Figure 10, where x_test denotes samples from the test set. It is seen that for r = 0.25N and r = 0.5N, the two distributions are more similar to each other than they are for r = N. In the case of r = N, the discriminator behaves significantly differently on the training data and the testing data; this means that overfitting has occurred, and it results in a drop in performance. In summary, Figure 9 and Figure 10 explain the performance drop at r = N in the two experiment groups whose performances deteriorate in Figure 8.

Furthermore, we monitor the statistics of the discriminator output for the settings in the remaining group to explain why r affects the behaviour of SR-GANs as shown in Figure 8. The probability distributions of D(x_train) and D(x_test) for one such setting are shown in Figure 11. We can see that for all values of r, the probability distributions of the discriminator output on the training and testing data agree well with each other, indicating that no overfitting has occurred.
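The analysis above only needs the discriminator outputs collected over the three sample sources; a minimal sketch, with hypothetical loader names, is given below.

```python
import torch

@torch.no_grad()
def collect_outputs(D, loader):
    """Concatenate D(x) over a data loader (training, test, or generated samples)."""
    outs = []
    for batch in loader:
        x = batch[0] if isinstance(batch, (list, tuple)) else batch
        outs.append(D(x).flatten().cpu())
    return torch.cat(outs)

# d_train = collect_outputs(D, train_loader)
# d_test  = collect_outputs(D, test_loader)
# d_gen   = collect_outputs(D, generated_loader)
# Comparing the histograms of d_train and d_test reveals overfitting;
# comparing d_train and d_gen reflects how well D separates real from generated samples.
```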

Although there is no systematic method for determining the best value of r for different settings, our experience is that setting r = 0.5N seems to work well. In a series of extensive experiments conducted with r = 0.5N, SR-GANs always outperform SN-GANs and, very importantly, we have not yet observed mode collapse.

5 Conclusions

In this paper, we monitor the spectral distributions of the discriminator's weight matrices in SN-GANs. We discover that when mode collapse occurs to an SN-GAN, a large number of the singular values of its weight matrices drop to very small values, and we introduce the concept of spectral collapse to describe this phenomenon. We have provided strong evidence linking mode collapse with spectral collapse. Based on this link, we have successfully developed a spectral regularization technique for training GANs. We show that by compensating the spectral distributions of the weight matrices, we can successfully prevent spectral collapse, which in turn successfully prevents mode collapse. In a series of extensive experiments, we have demonstrated that preventing spectral collapse can not only avoid mode collapse but also improve GAN performance.

References

Appendix A Proof of Corollary 1

Corollary 1. Let P_r and P_g be two distributions in X, a compact metric space. A linear and 1-Lipschitz constrained function D(x) = Wx is the optimal solution of \max_{\|D\|_L \le 1} \mathbb{E}_{x \sim P_r}[D(x)] - \mathbb{E}_{x \sim P_g}[D(x)]. Then all the singular values of the weight matrix W are 1.

Proof: Proposition 1 in Gulrajani et al., Improved Training of Wasserstein GANs, Advances in Neural Information Processing Systems, 5769-5779, 2017, has proven that the optimal solution to the 1-Lipschitz constrained discriminator function has gradient norm 1 almost everywhere. In other words, the optimum is obtained at the upper bound of the 1-Lipschitz constraint: \|D(x_1) - D(x_2)\| = \|x_1 - x_2\|.

Because D is a linear function, D(x) = Wx, the 1-Lipschitz constraint at its upper bound can be expressed as:

\|W x_1 - W x_2\| = \|x_1 - x_2\|, \quad \forall x_1, x_2 \in X    (11)

Equation (11) is equivalent to:

(x_1 - x_2)^T W^T W (x_1 - x_2) = (x_1 - x_2)^T (x_1 - x_2)    (12)

and,

W^T W = Q \Lambda Q^T    (13)

where the columns of Q, q_1, \ldots, q_n, are eigenvectors of W^T W, and the diagonal entries of the diagonal matrix \Lambda are the eigenvalues of W^T W.

Taking y = Q^T (x_1 - x_2), then

(x_1 - x_2)^T W^T W (x_1 - x_2) = y^T \Lambda y = \sum_i \lambda_i y_i^2    (14)

where \lambda_i is the i-th eigenvalue, and y_i is the i-th element of y.

Because W^T W is symmetric, Q^T Q = I, then

(x_1 - x_2)^T (x_1 - x_2) = y^T y = \sum_i y_i^2    (15)

Finally, equation (12) is equivalent to \sum_i \lambda_i y_i^2 = \sum_i y_i^2. We can see that the upper bound of the 1-Lipschitz constraint can be attained for all x_1 and x_2 only when all eigenvalues of W^T W are 1. In other words, all the singular values of W are 1.

Appendix B Architecture and Optimization Settings

In this paper, we employ the SN-GAN architecture, which is illustrated in Figure 12. The weight of a convolutional layer is in the format [C_out, C_in, K_h, K_w], where C_out is the output channel size, C_in is the input channel size, and K_h and K_w are the kernel sizes. In particular, there are 10 convolutional layers in the discriminator network, and CH in Figure 12 (b) corresponds to the channel size of the discriminator in the main text, where extensive experiments are conducted with different settings of CH. All the experiments are conducted based on the following architecture. Image generation on STL-10 shares the same architecture as that on CIFAR-10; thus, images in STL-10 are compressed to 32 × 32 pixels, identical to the resolution of images in CIFAR-10. The purpose is to evaluate how different data affect mode collapse and the spectral distribution, regardless of the effect of architecture.

The optimization settings follow SN-GANs. To be specific, the learning rate is taken as 0.0002, the number of updates of the discriminator per update of the generator is 5, the batch size is taken as 64, and the Adam optimizer is used with first- and second-order momentum parameters of 0 and 0.9, respectively.
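In code, these settings correspond to something like the sketch below. The stand-in networks and random mini-batches are ours and only make the snippet self-contained; the stated values (learning rate, Adam momenta, five discriminator updates per generator update) come from the text above.

```python
import torch

# Placeholder networks standing in for the actual generator and discriminator.
generator = torch.nn.Linear(128, 3 * 32 * 32)
discriminator = torch.nn.Linear(3 * 32 * 32, 1)

lr, betas, n_dis, batch = 2e-4, (0.0, 0.9), 5, 64
opt_d = torch.optim.Adam(discriminator.parameters(), lr=lr, betas=betas)
opt_g = torch.optim.Adam(generator.parameters(), lr=lr, betas=betas)

for step in range(1):                                    # the real training runs for many iterations
    for _ in range(n_dis):                               # 5 discriminator updates per generator update
        opt_d.zero_grad()
        x_real = torch.randn(batch, 3 * 32 * 32)         # stand-in for a real mini-batch
        x_fake = generator(torch.randn(batch, 128)).detach()
        d_loss = (torch.relu(1 - discriminator(x_real)).mean()
                  + torch.relu(1 + discriminator(x_fake)).mean())   # hinge loss, equation (5)
        d_loss.backward()
        opt_d.step()
    opt_g.zero_grad()
    g_loss = -discriminator(generator(torch.randn(batch, 128))).mean()
    g_loss.backward()
    opt_g.step()
```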

Figure 12: Architecture of the GANs: (a) generator architecture, (b) discriminator architecture.

Appendix C Spectral Distributions

In Figure 13 to Figure 17, we show the spectral distributions of \bar{W}_{SN} for the different settings in each experiment group. We can see that spectral collapse and mode collapse always go side by side. In Figure 18 to Figure 22, we show the spectral distributions of each layer except the two skip-connection layers; because these layers act as skip connections, the strong correlation between mode collapse and spectral collapse is not observed in them.

Figure 13: Spectral distributions in \bar{W}_{SN} for the settings in the CIFAR-10 group with CH of 128. No mode collapse or spectral collapse is observed in this group.
Figure 14: Spectral distributions in \bar{W}_{SN} for the settings in the CIFAR-10 group with CH of 64. Mode collapse and spectral collapse are observed in three of its settings.
Figure 15: Spectral distributions in \bar{W}_{SN} for the settings in the CIFAR-10 group with CH of 32. Mode collapse and spectral collapse are observed in all settings of this group.
Figure 16: Spectral distributions in \bar{W}_{SN} for the settings in the CIFAR-10 group with CH of 256. No mode collapse or spectral collapse is observed in this group.
Figure 17: Spectral distributions in \bar{W}_{SN} for the settings in the STL-10 group. Mode collapse and spectral collapse are observed in two of its settings.
Figure 18: Spectral distributions (after 50k iterations) in each layer for the settings in the CIFAR-10 group with CH of 128. No mode collapse or spectral collapse is observed in this group.
Figure 19: Spectral distributions (after 50k iterations) in each layer for the settings in the CIFAR-10 group with CH of 64. Mode collapse and spectral collapse are observed in three of its settings.
Figure 20: Spectral distributions (after 50k iterations) in each layer for the settings in the CIFAR-10 group with CH of 32. Mode collapse and spectral collapse are observed in all settings of this group.
Figure 21: Spectral distributions (after 50k iterations) in each layer for the settings in the CIFAR-10 group with CH of 256. No mode collapse or spectral collapse is observed in this group.
Figure 22: Spectral distributions (after 50k iterations) in each layer for the settings in the STL-10 group. Mode collapse and spectral collapse are observed in two of its settings.

Appendix D Statistics of the Discriminator Output and the Discriminator Objective

In the main text, we primarily show the statistics of the discriminator output and of the discriminator objective for two settings. In Figure 23 and Figure 24, we show the corresponding means and variances. To be specific, we feed the discriminator with generated data and with data from the training set, obtain the value of the discriminator objective, calculate its mean and variance, and show their variation with r in Figure 23. To investigate the performance of the discriminator on the test set, we monitor D(x_train) and D(x_test), where x_train and x_test represent samples from the training and test sets, respectively; we then calculate the mean and variance and show their variation with r in Figure 24.

In Figure 23, we can see that the discriminator objective has a decreasing tendency as r increases. As we can see in Figure 24, the statistics of D(x_train) and D(x_test) diverge when r is excessively large. Thus, excessively increasing r potentially leads to over-fitting, especially when r is taken as N. As we can see in Figure 24 (d) to Figure 24 (f), D(x_train) and D(x_test) agree well, indicating that no over-fitting is observed in the corresponding group.

Figure 23: Mean and variance of the discriminator objective for different values of r.
Figure 24: Mean and variance of D(x_train) and D(x_test) for different values of r.

Appendix E Synthetic Images

We show some examples generated by SN-GANs and SR-GANs in Figure 25 to Figure 33.

Figure 25: Synthetic images.
Figure 26: Synthetic images.
Figure 27: Synthetic images.
Figure 28: Synthetic images.
Figure 29: Synthetic images.
Figure 30: Synthetic images.
Figure 31: Synthetic images.
Figure 32: Synthetic images.
Figure 33: Synthetic images.