Transforming the output of GANs by fine-tuning them with features from different datasets

Terence Broad
Department of Computing
Goldsmiths, University of London
t.broad@gold.ac.uk
Mick Grierson
Creative Computing Institute
University of the Arts London
m.grierson@arts.ac.uk
Abstract

In this work we present a method for fine-tuning pre-trained GANs with features from different datasets, resulting in the transformation of the output distribution into a new distribution with novel characteristics. The weights of the generator are updated using the weighted sum of the losses from a cross-dataset classifier and the frozen weights of the pre-trained discriminator. We discuss details of the technical implementation and share some of the visual results from this training process.

1 Introduction

The motivation for this work was to find a way of transforming a generative model that had been trained on one distribution so that it outputs a completely new distribution of images, one that does not model an existing dataset. We approached this by taking the generator from a generative adversarial network (GAN) [1] pre-trained on one dataset (in this case ImageNet [2]) and fine-tuning it with features from another dataset, using a classifier trained on data from both datasets.

With this approach we were hoping not simply to model the distribution of images in the new dataset, but to transform the generator so that it outputs a new distribution of images that fuses visual features from both datasets, resulting in a distribution with novel characteristics. By starting from a pre-trained model with good initial weights, we hoped to preserve some aspects of the original distribution, such as the spatial structure of the images, while instilling it with new characteristics from the other dataset.

2 Method

We created a dataset of approximately 14k images from Pinterest boards with the title a e s t h e t i c (see Figure 2 in the Appendix for samples). Images from these boards can usually be characterised by distinct, washed-out colour palettes (often with only one dominant colour in the image), and the photographs are often framed with no particular subject in focus.

We trained a binary classifier to distinguish the a e s t h e t i c images from images in the ImageNet dataset [2]. (We also trained classifiers for other datasets with prominent aesthetic characteristics, but for brevity we only discuss results from fine-tuning with the classifiers trained on the a e s t h e t i c dataset.) To train the classifier we fine-tuned a pre-trained ResNet [3] model that had been trained to weakly classify Instagram hashtags and then ImageNet [4]. In addition to training the classifier to classify a e s t h e t i c images and ImageNet images as separate classes (contrastive features), we also (initially by accident) trained a classifier that classifies them as being in the same class (joint features), which led to significantly better results when used for fine-tuning the generator (see Section 3 for further discussion).
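To make this concrete, the following is a minimal sketch of the classifier fine-tuning, not our exact implementation: the torch.hub model name, the dataset paths, the hyperparameters, and the JOINT labelling switch are illustrative assumptions. The only difference between the two settings is whether the two datasets receive separate labels (contrastive features) or the same label (joint features).

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Pre-processing for the ResNeXt WSL models (standard ImageNet statistics).
tfm = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

# Hypothetical directory layout for the two image collections.
aesthetic_loader = DataLoader(datasets.ImageFolder('data/aesthetic', tfm),
                              batch_size=32, shuffle=True)
imagenet_loader = DataLoader(datasets.ImageFolder('data/imagenet', tfm),
                             batch_size=32, shuffle=True)

# ResNeXt pre-trained on Instagram hashtags and then ImageNet [4].
model = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x8d_wsl')
model.fc = nn.Linear(model.fc.in_features, 2)  # replace head with a binary one

JOINT = True  # True: joint features labelling; False: contrastive features

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

for (aest_imgs, _), (imnet_imgs, _) in zip(aesthetic_loader, imagenet_loader):
    # The a e s t h e t i c images are labelled as class 1; ImageNet images
    # get the same label in the joint setting, class 0 in the contrastive one.
    for imgs, label in ((aest_imgs, 1), (imnet_imgs, 1 if JOINT else 0)):
        targets = torch.full((imgs.size(0),), label, dtype=torch.long)
        loss = criterion(model(imgs), targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```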

After training the cross-dataset classifier, we used this model to fine-tune the weights of a pre-trained BigGAN [5] generator trained on the ImageNet dataset at a resolution of 128×128 pixels. (For this we used ‘The author’s officially unofficial PyTorch BigGAN implementation’, https://github.com/ajbrock/BigGAN-PyTorch; we thank the authors of the repository, Andrew Brock and Alex Andonian, for releasing the model weights for the discriminator as well as the generator, without which this work would not have been possible.) We also used the frozen weights of the discriminator in the fine-tuning procedure, updating the weights of the generator based on a weighted sum of the losses from the discriminator and the cross-dataset classifier (see Figure 1 for details). During this fine-tuning process the networks are not exposed to any new training data; all the samples and losses are produced using only the pre-trained networks.
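A single fine-tuning step might look like the sketch below. This is our reading of the procedure rather than a definitive implementation: the loss weights alpha and beta, the learning rate, the hinge-style generator loss, and the assumed classifier target class are all assumptions, and the call signatures of G and D differ slightly in the BigGAN-PyTorch repository.

```python
import torch
import torch.nn.functional as F

# G: pre-trained BigGAN generator (being fine-tuned)
# D: pre-trained BigGAN discriminator (frozen)
# classifier: the cross-dataset classifier from above (frozen)
for p in list(D.parameters()) + list(classifier.parameters()):
    p.requires_grad = False

opt_G = torch.optim.Adam(G.parameters(), lr=2e-5)
alpha, beta = 1.0, 1.0          # weighting of the two loss terms
batch_size, z_dim, n_classes = 9, 120, 1000
cls_target = torch.ones(batch_size, dtype=torch.long)  # assumed target class

def finetune_step():
    # Sample a batch from the generator itself: no new training data is used.
    z = torch.randn(batch_size, z_dim)
    y = torch.randint(n_classes, (batch_size,))
    fake = G(z, y)

    # Adversarial term: keep samples plausible to the frozen discriminator
    # (hinge-style generator loss, as used for BigGAN).
    loss_adv = -D(fake, y).mean()

    # Classifier term: push samples towards the target class of the
    # cross-dataset classifier.
    loss_cls = F.cross_entropy(classifier(fake), cls_target)

    # The weighted sum of the two losses updates the generator only.
    loss = alpha * loss_adv + beta * loss_cls
    opt_G.zero_grad()
    loss.backward()
    opt_G.step()
    return loss.item()
```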

Training converges very rapidly. Usually within 1000 iterations (using a batch size of 9) the generator has converged on a configuration of the weights that satisfies both the cross-dataset classifier and the discriminator. However, we found that the best results were achieved with early stopping: the most interesting visual results often occurred when training was stopped after 300-600 iterations. Because training is so quick, it is trivial to try multiple configurations of the loss weighting and manually compare the visual results.
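Since the interesting results appear mid-training, one simple way to support this workflow is to checkpoint the generator at a regular interval and compare the samples afterwards. A sketch, where the interval is an arbitrary choice and finetune_step refers to the sketch above:

```python
# Save a snapshot every 100 iterations so that runs stopped anywhere in the
# 300-600 iteration range can be compared visually after the fact.
for it in range(1, 1001):
    finetune_step()
    if it % 100 == 0:
        torch.save(G.state_dict(), f'G_iter_{it:04d}.pt')
```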

Figure 1: Diagram of the training process: batches of images are sampled from the pre-trained generator and fed to the cross-dataset classifier and the pre-trained discriminator (both of which have their weights frozen). The weights of the generator are updated based on a weighted sum of the losses from the classifier and the discriminator.

3 Discussion and Conclusion

In the course of this work we happened upon a number of surprising results. The manner in which features from the different datasets get combined was highly unexpected. Neither fine-tuning with the contrastive features classifier nor with the joint features classifier produced images that resemble those in either the ImageNet or the a e s t h e t i c dataset.

The second surprising result is that when fine-tuning with the joint features classifier, the visual results were much richer and more varied (almost dreamlike in nature) than the results from fine-tuning with the contrastive features classifier (see Figures 4 and 5 in the Appendix for a detailed comparison). We speculate that the contrastive features classifier discards many important features of the ImageNet distribution, so when the generator is fine-tuned there are fewer combinations of features to draw on, and the resulting distribution has far less variety.

In future research, we hope to find ways of exerting more control over which characteristics from the different datasets get combined in the fine-tuning process, whether these relate to aesthetic qualities, to the structure and form of the images, or to the stylistic qualities of a given dataset. We also hope to apply these techniques to higher-resolution GAN models; however, without access to pre-trained discriminators, it is currently not possible to apply them to the publicly available higher-resolution generative models without retraining the models from scratch.

Acknowledgments

This work has been supported by the UK’s EPSRC Centre for Doctoral Training in Intelligent Games and Game Intelligence (IGGI; grant EP/L015846/1).

References

[1] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, 2014.

[2] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition, 2009.

[3] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition, 2016.

[4] D. Mahajan, R. Girshick, V. Ramanathan, K. He, M. Paluri, Y. Li, A. Bharambe, and L. van der Maaten. Exploring the limits of weakly supervised pretraining. In European Conference on Computer Vision, 2018.

[5] A. Brock, J. Donahue, and K. Simonyan. Large scale GAN training for high fidelity natural image synthesis. In International Conference on Learning Representations, 2019.

Appendix

Figure 2: Samples of images from the a e s t h e t i c dataset sourced from Pinterest.
Figure 3: Original BigGAN output (each row shows one interpolation between two classes). See Figures 4 and 5 for results of fine-tuning.
Figure 4: Output of BigGAN fine-tuned with contrastive features classifier for 300 iterations. See Figure 3 for reference of original BigGAN output.
Figure 5: Output of BigGAN fine-tuned with joint features classifier for 300 iterations. See Figure 3 for reference of original BigGAN output.