GANILLA: Generative Adversarial Networks for Image to Illustration Translation
In this paper, we explore illustrations in children’s books as a new domain in unpaired image-to-image translation. We show that although the current state-of-the-art image-to-image translation models successfully transfer either the style or the content, they fail to transfer both at the same time. We propose a new generator network to address this issue and show that the resulting network strikes a better balance between style and content.
There are no well-defined or agreed-upon evaluation metrics for unpaired image-to-image translation. So far, the success of image translation models has been based on subjective, qualitative visual comparison on a limited number of images. To address this problem, we propose a new framework for the quantitative evaluation of image-to-illustration models, where both content and style are taken into account using separate classifiers. In this new evaluation framework, our proposed model performs better than the current state-of-the-art models on the illustrations dataset. Our code and pretrained models can be found at https://github.com/giddyyupp/ganilla.
Keywords: Generative Adversarial Networks, Image-to-Image Translation, Illustrations, Style Transfer
1 Introduction

Image-to-image style transfer has received increasing attention since the pioneering work of Gatys et al. (2015). Researchers have developed various approaches to the problem, including paired Gatys et al. (2015, 2016) and unpaired Zhu et al. (2017); Yi et al. (2017); Chen et al. (2018) transfer, online optimization-based methods Gatys et al. (2015); Berger and Memisevic (2016); Li et al. (2017a); Risser et al. (2017), and offline methods based on convolutional neural networks (CNN) Ulyanov et al. (2016a); Johnson et al. (2016); Li and Wand (2016b); Li et al. (2017b) or generative adversarial networks (GAN) Zhu et al. (2017); Yi et al. (2017); Chen et al. (2018); Choi et al. (2018); Karacan et al. (2016); Isola et al. (2017). Style transfer has been applied in various domains: transforming natural images into art paintings Lee et al. (2018); Zhu et al. (2017); Cho et al. (2019); Amodio and Krishnaswamy (2019) and vice versa Tomei et al. (2019); Amodio and Krishnaswamy (2019), translating one animal species into another Zhu et al. (2017); Liu et al. (2019b, 2017), altering scene attributes such as weather or season Karacan et al. (2016); Lee et al. (2018); Liu et al. (2017); Huang et al. (2018b), converting aerial images to maps Isola et al. (2017), transferring Chen and Hays (2018); Alharbi et al. (2019) or completing Liu et al. (2019a); Ghosh et al. (2019) sketches, converting face images to sketches Yi et al. (2017), and cartoonization Chen et al. (2018).
This paper aims to introduce a new domain to the style transfer literature: illustrations in children’s books (just “illustrations” for short from here on) (Fig. 4). We claim that this is a new domain because illustrations are qualitatively different from art paintings and cartoons. Illustrations do contain objects (e.g. mountains, people, toys, trees), yet the level of abstraction can be much higher than in paintings and cartoons. Existing models fall short in dealing with the delicate balance between the abstract style and the content of illustrations. For example, despite the impressive results of CycleGAN in other domains Zhu et al. (2017); Wang et al. (2018), we observed that for illustrations its success at transferring the style of the artist is not matched by its success at transferring the content, due to the highly abstracted objects. On the other hand, DualGAN Yi et al. (2017) usually succeeds at preserving the content of the image but falls short in transferring the style (see Figure 1). As empirically confirmed by our experiments, other state-of-the-art unpaired style transfer methods also fail to provide satisfactory results for illustrations.
Our goal is to develop a model that can generate appealing yet content-preserving illustrations from a given natural image by transferring the style of a given illustration artist. To this end, we take the “unpaired” approach, which requires two unaligned image sets, one for the source domain (natural images) and one for the target (illustrations). We built upon our existing illustrations dataset Hicsonmez et al. (2017), originally used to classify illustrators, and almost doubled its size to create the most extensive illustrations dataset to date. Our dataset contains 9448 illustrations from 363 different books and 24 different artists.
To address the issue of balancing style versus content, we propose two changes to the current state-of-the-art models. First, we propose a new generator network that downsamples the feature map at each residual layer. Second, to better transfer the content, we merge low-level features with high-level features using skip connections and upsampling. Low-level features generally carry edge-like information, which helps the generated image preserve the structure of the input image. Our ablation experiments confirm the effectiveness of these proposals.
One major problem with the unpaired style transfer approach is evaluation. Typically, the success of image-to-image translation models is evaluated qualitatively on a limited number of images or via user studies. Although there are quantitative metrics Borji (2019) for other image generation tasks, they cannot be used directly in this domain since there is no paired ground truth for generated images. We argue that the evaluation must consider both the style and content aspects simultaneously. To this end, we propose a quantitative evaluation framework based on content and style classifiers, and experimentally show that it produces reasonable results.
In summary, our paper makes contributions at three different levels of the style transfer problem: dataset, architecture and evaluation measures. Specifically,
- We explore illustrations in children’s books as a new domain for image-to-image style and content transfer.
- We present the most extensive illustrations dataset, with almost 9500 illustrations from 24 artists.
- We propose a novel generator network which strikes a good balance between style and content.
- We propose a new framework to quantitatively evaluate image generation models in terms of content and style. Under this evaluation, our proposed model performs better than the current state-of-the-art models on the most extensive illustrations dataset available.
2 Related Work
It is possible to divide the current best practices for image style transfer into two groups. The first approach, Neural Style Transfer (NST), uses CNNs to stylize an input image with a given style image, which could be a painting or an illustration. The second approach utilizes GANs to synthesize stylized images. There are comprehensive surveys of both CNN-based Jing et al. (2017) and GAN-based Huang et al. (2018a) methods. Here we summarize the GAN-based unpaired image-to-image translation methods, the group of work most relevant to our model.
GANs Goodfellow et al. (2014a); Zhao et al. (2016) are extensively used to generate images, where the generated image could be a handwritten digit Goodfellow et al. (2014a), a bedroom image Radford et al. (2015), or an image conditioned on text Karacan et al. (2016). In the image-to-image translation domain, very successful methods have been proposed Isola et al. (2017); Chen et al. (2018); Zhu et al. (2017); Yi et al. (2017); Kim et al. (2017); Choi et al. (2018). There are two major groups of methods in this category: paired and unpaired.
The “paired” group takes the conventional supervised learning approach, where explicit input-target pairs are needed. A prominent example is the Pix2pix model Isola et al. (2017), which uses different image pairs for various tasks such as “semantic labels to street scene” or “day to night.” Collecting such datasets requires considerable effort. To overcome this disadvantage, the second group requires only unpaired image sets Zhu et al. (2017); Yi et al. (2017); Chen et al. (2018); Choi et al. (2018). These methods use two separate image collections, one for the source domain and one for the target domain, without explicitly pairing any two images.
CycleGAN Zhu et al. (2017) and DualGAN Yi et al. (2017) are pioneering works in the unpaired image-to-image translation approach. Both utilize a cyclic framework consisting of two generator-discriminator pairs: the first pair learns a mapping from the source to the target, while the second learns the reverse mapping. CycleGAN successfully tackles various tasks such as converting natural images to paintings, apples to oranges, and horses to zebras. DualGAN successfully converts sketches to faces and day photos to night photos.
CartoonGAN Chen et al. (2018), on the other hand, presents a new training framework that replaces the cyclic structure of CycleGAN and introduces new loss functions. In their framework, content information is extracted from high-level feature maps of the VGG network, which feed a loss function called the semantic content loss. As the target image set, they use both edge-smoothed fake cartoon images and unprocessed cartoon images. These images are very different in terms of both content and style from the illustrations in our dataset.
3 Method

In our preliminary experiments on the image-to-illustration problem, we observed that the current unpaired image-to-image translation models Zhu et al. (2017); Yi et al. (2017); Chen et al. (2018) fail to transfer style and content at the same time. We designed a new generator network that preserves the content while transferring the style. In the following, we first describe our model in detail and then compare its structure with three state-of-the-art models. We also present the two ablation models which guided our design.
3.1 Details of GANILLA
A high-level architectural description of our generator network, GANILLA, is presented in Figure 3 along with the current state-of-the-art models and our two ablation models. GANILLA utilizes low-level features to preserve content while transferring the style. Our model consists of two stages (Figure 2): a downsampling stage and an upsampling stage. The downsampling stage is a modified ResNet-18 He et al. (2016) network: to integrate low-level features, we concatenate features from previous layers at each layer of the downsampling stage. Since low-level layers carry information such as morphological features, edges and shapes, they ensure that the translated image preserves the substructure of the input content.
More specifically, the downsampling stage starts with one convolution layer with a 7×7 kernel, followed by instance normalization Ulyanov et al. (2016b), ReLU and max pooling layers. It continues with four layers, each containing two residual blocks. Each residual block starts with one convolution layer followed by instance normalization and ReLU layers; then one more convolution and instance normalization layer follow. We concatenate the output with the residual block input and feed this concatenated tensor to the final convolution and ReLU layers. We halve the feature map size in each layer except Layer-I using convolutions with a stride of 2. All convolution layers inside the residual layers have 3×3 kernels.
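To make the block structure concrete, the following is a minimal PyTorch sketch of such a concatenative residual block. The class name, channel arguments and the shortcut pooling for the strided case are our assumptions for illustration, not the official implementation:

```python
import torch
import torch.nn as nn

class ConcatResidualBlock(nn.Module):
    """Sketch of a GANILLA-style residual block: instead of ResNet's
    additive shortcut, the block input is concatenated with the block
    output and fused by a final 3x3 convolution + ReLU."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1)
        self.in1 = nn.InstanceNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.in2 = nn.InstanceNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        # shortcut matches the spatial size of the (possibly strided) output;
        # average pooling here is an assumption
        self.shortcut = nn.AvgPool2d(stride) if stride > 1 else nn.Identity()
        # final conv fuses the concatenated [output, input] tensor
        self.fuse = nn.Conv2d(out_ch + in_ch, out_ch, 3, padding=1)

    def forward(self, x):
        out = self.relu(self.in1(self.conv1(x)))
        out = self.in2(self.conv2(out))
        out = torch.cat([out, self.shortcut(x)], dim=1)  # concatenate, not add
        return self.relu(self.fuse(out))
```

The key difference from a standard ResNet block is the `torch.cat` followed by a fusing convolution, which lets low-level edge and shape information flow forward explicitly.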
In the upsampling stage, we feed lower-level features, i.e. the outputs of each layer in the downsampling stage, through long skip connections (blue arrows in Figure 2) to the summation layers placed before the upsampling (nearest neighbor) operations. These connections help preserve the content. In detail, the upsampling stage contains four consecutive convolution, upsample and summation layers. First, the output of Layer-IV is fed through convolution and upsample layers to increase the feature map size to match that of the previous layer; the same convolution and upsample operations are applied to each consecutive layer's output. All convolution filters in the upsampling stage have 1×1 kernels. Finally, one convolution layer with a 7×7 kernel outputs the 3-channel translated image.
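A hedged sketch of one such upsampling step is shown below; projecting the skip feature with its own 1×1 convolution (FPN-style) is our assumption to make the channel counts match:

```python
import torch.nn as nn
import torch.nn.functional as F

class UpsampleFusion(nn.Module):
    """Sketch of one GANILLA upsampling step: project the deeper feature
    map with a 1x1 conv, enlarge it with nearest-neighbor upsampling,
    and sum it with the projected skip feature from the downsampling stage."""
    def __init__(self, deep_ch, skip_ch, out_ch):
        super().__init__()
        self.proj_deep = nn.Conv2d(deep_ch, out_ch, 1)
        self.proj_skip = nn.Conv2d(skip_ch, out_ch, 1)

    def forward(self, deep, skip):
        deep = F.interpolate(self.proj_deep(deep), size=skip.shape[-2:],
                             mode='nearest')   # upsample to the skip's size
        return deep + self.proj_skip(skip)     # summation preserves content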
Our discriminator network is a PatchGAN, as used in successful image-to-image translation models Zhu et al. (2017); Isola et al. (2017); Ledig et al. (2017); Li and Wand (2016a). It consists of three blocks of convolutions, where each block contains two convolution layers. We start with 64 filters in the first block and double the number for each consecutive block.
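For reference, a minimal sketch of such a discriminator follows. The kernel sizes, strides and LeakyReLU slope are assumptions borrowed from common PatchGAN practice, not stated details of our network:

```python
import torch.nn as nn

def build_patch_discriminator(in_ch=3, base=64):
    """Sketch of the PatchGAN discriminator described above: three blocks
    of two convolutions each, starting from 64 filters and doubling per
    block (64 -> 128 -> 256)."""
    layers, ch = [], in_ch
    for i in range(3):
        out = base * 2 ** i
        layers += [
            nn.Conv2d(ch, out, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(out, out, kernel_size=4, stride=1, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
        ]
        ch = out
    # final 1-channel map: each output "pixel" classifies one image patch
    layers.append(nn.Conv2d(ch, 1, kernel_size=4, stride=1, padding=1))
    return nn.Sequential(*layers)
```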
We follow the idea of cycle consistency Zhu et al. (2017); Yi et al. (2017) to train our GANILLA model. Specifically, there are two generator-discriminator pairs: the first generator (G) maps source images to the target domain, while the second (F) takes target domain images as input and generates source images in a cyclic fashion.
Our loss function consists of two minimax losses Goodfellow et al. (2014b), one for each generator-discriminator pair, and one cycle consistency loss Zhu et al. (2017); Yi et al. (2017). The cycle consistency loss ensures that a generated sample can be mapped back to the source domain; we use the L1 distance for it. When we feed source domain images to generator F, we expect no change, since they already belong to the source domain; a similar situation applies when we feed generator G with target domain images. This technique was first introduced by Taigman et al. (2017) and used by CycleGAN Zhu et al. (2017) for image-to-painting translation. We also use this identity loss, with the L1 distance, in our experiments. Our full objective is to minimize the sum of these four loss functions; for details refer to Zhu et al. (2017).
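Putting the pieces together, the full objective can be written compactly as follows (our notation; the weighting factors $\lambda_{cyc}$ and $\lambda_{idt}$ are assumptions following CycleGAN):

$$
\begin{aligned}
\mathcal{L}(G, F, D_X, D_Y) ={}& \mathcal{L}_{GAN}(G, D_Y) + \mathcal{L}_{GAN}(F, D_X) \\
&+ \lambda_{cyc}\big(\mathbb{E}_x\,\|F(G(x)) - x\|_1 + \mathbb{E}_y\,\|G(F(y)) - y\|_1\big) \\
&+ \lambda_{idt}\big(\mathbb{E}_x\,\|F(x) - x\|_1 + \mathbb{E}_y\,\|G(y) - y\|_1\big),
\end{aligned}
$$

where $G$ maps the source domain $X$ to the target domain $Y$, and $F$ maps $Y$ back to $X$.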
We used PyTorch Paszke et al. (2017) to implement our models. As in other unpaired image-to-image translation settings, GANILLA does not need paired images but two different image datasets, one for the source and one for the target. We use natural images as the source domain and illustrations as the target domain; the details of the datasets are provided in Section 4. All training images (i.e. natural images and illustrations) are resized to 256×256 pixels. We train our models for 200 epochs using the Adam solver with a learning rate of 0.0002. All networks were trained from scratch (i.e. not initialized with ImageNet weights). We conducted all our experiments on Nvidia Tesla V100 GPUs.
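A minimal sketch of this optimizer setup is given below; `G`, `F`, `D_X`, `D_Y` are placeholders for the two generator-discriminator pairs described above, and the beta values are an assumption following common CycleGAN-style training rather than a stated detail:

```python
import itertools
from torch import optim

def make_optimizers(G, F, D_X, D_Y, lr=2e-4, betas=(0.5, 0.999)):
    """Adam optimizers matching the reported learning rate of 0.0002:
    one shared optimizer for both generators, one for both discriminators."""
    opt_G = optim.Adam(itertools.chain(G.parameters(), F.parameters()),
                       lr=lr, betas=betas)
    opt_D = optim.Adam(itertools.chain(D_X.parameters(), D_Y.parameters()),
                       lr=lr, betas=betas)
    return opt_G, opt_D
```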
3.2 Comparison with Other Models
Figure 3 summarizes, at a high architectural level, the generator networks of the current state-of-the-art single-target image-to-image translation GANs, our model, and our ablation models. As can be seen from the figure, the common approach is to use convolutional layers for downsampling and deconvolutional layers for upsampling. In addition to these downsampling and upsampling layers, DualGAN, CycleGAN and CartoonGAN have flat residual layers with additive connections between the downsampling and upsampling layers. In our generator, we instead use residual layers with concatenative connections, and upsampling operations in place of deconvolutional layers. Our ablation models are designed to isolate the effect of low-level features in the downsampling and upsampling parts. The first ablation model measures how important the low-level features are for the downsampling part: we replaced the concatenative connections with additive connections. The second ablation model tests the effectiveness of low-level features for the upsampling part: we replaced GANILLA's upsampling layers with the upsampling block of CycleGAN.
4 Datasets

For training, we use 5402 natural images from the CycleGAN training dataset Zhu et al. (2017) as the source domain, and we present a new illustration dataset as the target domain. For testing, we use 751 images from the CycleGAN test set.
For the illustration dataset, we extended the dataset of Hicsonmez et al. (2017) with new images, almost doubling its original size. The extended dataset contains almost 9500 illustrations from 363 different books and 24 different artists. To train GAN models better, we increased the number of images for almost all illustrators, collecting new images by scraping the web and scanning books from public libraries. Our illustration dataset can be reproduced by scraping web-based open libraries. In this study, we use a subset of the dataset consisting of illustrations from 10 artists who draw full-page, complex scenes (see Table 1 for counts and Figure 4 for sample illustrations). We use the remaining 14 illustrators in our proposed quantitative evaluation framework in Section 5.2. From now on, illustrators are referred to by their initials.
Our code, pretrained models and the scripts that reproduce the dataset can be found at https://github.com/giddyyupp/ganilla.
5 Experiments

We compared our method with three state-of-the-art GAN methods that use unpaired data: CartoonGAN Chen et al. (2018), CycleGAN Zhu et al. (2017) and DualGAN Yi et al. (2017), using their publicly available official implementations. In the following, we first provide qualitative results and the results of a user study (Section 5.1). The main difficulty in comparing GAN methods is the lack of quantitative evaluations; in this study, we present new measures to handle this issue. Two main factors determine the quality of GAN-generated illustrations: 1) having the target style and 2) preserving the content. We therefore propose a new framework to evaluate GANs quantitatively, based on two CNNs: the Style-CNN measures how well the results transfer the style, while the Content-CNN detects whether the input content is preserved. These two networks are described in Section 5.2.
In Table 2 we compare GANILLA with the other methods in terms of the number of parameters and training time. We run CartoonGAN with a batch size of 4 and the other three methods with a batch size of 1. All models are trained for 200 epochs.
| | CartoonGAN | CycleGAN | DualGAN | GANILLA |
| --- | --- | --- | --- | --- |
| No. of Params (M) | 11.1 | 11.4 | 54.1 | 7.2 |
| Train Time (sec) | 1400 | 1347 | 710 | 887 |
(Figure 5 layout: Input, followed by GANILLA outputs in the styles AS / PP, DM / RC, KH / SC, KP / SD, MB / TR.)
5.1 Qualitative Analysis and User Study
We present the style transfer results of GANILLA in Figure 5. Using the same input test image, GANILLA generates images in the style of each individual artist. Although the generated images contain unobtrusive defects for some illustrators, most of them capture the target style successfully.
Figure 6 presents sample outputs from the four methods for different styles. Although CycleGAN captures the style of the illustrators well, it is hard to tell what the content is. It also hallucinates elements from the source illustrations, such as faces and objects, onto the generated images. On the other hand, CartoonGAN and DualGAN preserve content, but in many examples they fall short in transferring the style. GANILLA successfully generates stylized images while preserving content.
Qualitative analysis based on visual inspection is subjective. To reduce the effect of this subjectivity, we conducted a user study.
| | CartoonGAN | CycleGAN | DualGAN | GANILLA |
| --- | --- | --- | --- | --- |
| Content & Style | 54.0 | 56.9 | 61.3 | 65.0 |
We collected a total of 66 survey responses from 48 different users. In Table 3, we present the user study results in terms of average accuracies for the style and content detection tasks, and the average ranking for the visual appeal task.
In terms of style and content identifiability, GANILLA is the best method. Content evaluation scores (second row in Table 3) are similar across models, which shows that users were able to identify the content successfully.
In the visual appeal task, DualGAN exceeds GANILLA by a slight margin (last row in Table 3). The main reason is that DualGAN fails to transfer the style for some illustrators, and users generally pick these images as visually appealing since they look more natural.
(Table 4 layout: per-style Content-CNN (%), Style-CNN (%) and Final scores (%) for CartoonGAN, CycleGAN, DualGAN and GANILLA.)
5.2 Quantitative Analysis
To quantitatively evaluate the quality of style transfer, we propose to use a style classifier. In order to train a style-specific classifier, we must detach the training images from their visual content while keeping their style. For this purpose, we randomly cropped small patches from the illustration images and used these patches to train our style classifier, the Style-CNN. Our training set has 11 classes: 10 for the illustration artists and one for natural images. Our intuition in adding the 11th class is that if a generated image lacks style, it is more likely to be classified as a natural image. We used only generated images to test the classifier. Figure 8 presents some translation examples from CycleGAN and GANILLA showing that the Style-CNN decisions are justifiable.
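A hedged sketch of this patch-extraction step is shown below; the patch size and count are illustrative assumptions, since the text only specifies "small" square patches:

```python
import random
from PIL import Image

def random_style_patches(image_path, n_patches=10, patch_size=100):
    """Prepare Style-CNN training data: random square crops detach an
    illustration's style (texture, strokes, palette) from its content."""
    img = Image.open(image_path).convert('RGB')
    w, h = img.size
    assert w >= patch_size and h >= patch_size, 'image smaller than patch'
    patches = []
    for _ in range(n_patches):
        x = random.randint(0, w - patch_size)
        y = random.randint(0, h - patch_size)
        patches.append(img.crop((x, y, x + patch_size, y + patch_size)))
    return patches
```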
(Figure 8 layout: examples with high style error, low style error, and correct classifications with low and high confidence; target styles PP and AS.)
(Figure 9 layout: examples with high content error, low content error, and correct classifications with low and high confidence; target styles KP and PP.)
We argue that capturing the style is not sufficient if the content of the source is not preserved. To evaluate content preservation, we propose a content classifier, the Content-CNN. Here we define content as belonging to a specific scene category (e.g. forest, street, etc.). We select ten outdoor classes from the SUN dataset Xiao et al. (2010) that are close to the content of the natural image dataset, using 4150 training and 500 test images. As the negative class, we use the full illustration dataset, excluding the images of the ten illustrators used in training. Our intuition is that if the content is preserved, a natural image stylized in an illustrator's style should still have the same content. For example, if we generate a mountain image in the style of Korky Paul, we should still be able to classify it as a mountain. If the generated image has lost its connection to the content, it will be classified as an illustration, i.e. into the negative class: it is then just an illustration with no specific definition of the scene. Figure 9 presents some translation examples from CycleGAN and GANILLA showing that the Content-CNN decisions are justifiable.
(Table 5 layout: per-style Content-CNN (%), Style-CNN (%) and Final scores (%) for Ablation Model 1 and Ablation Model 2.)
The Style-CNN and Content-CNN results are given in Table 4. We present classification accuracies for each illustrator separately and also provide the average over all styles. To compute a final score for each method, we average its content and style scores (last row). GANILLA achieves the best overall score. CycleGAN outperforms the others in style, while DualGAN obtains the best content score. Since CycleGAN “hallucinates” random crops from the input illustrations, such as animal or person faces, its style accuracy is higher than that of the other methods. On the other hand, CartoonGAN and DualGAN fail to learn certain styles (KP, MB for the former; DM, KH, MB, RC, SC for the latter), so their style scores are lower. When the style is not transferred, the content is likely to be preserved, hence their higher content scores. Except for SD, GANILLA gives consistently high scores for both content and style.
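For clarity, the final score is the plain average of the two accuracies:

```python
def final_score(content_acc, style_acc):
    """Final score as described above: the average of the Content-CNN
    and Style-CNN accuracies (both in percent) for a given method."""
    return (content_acc + style_acc) / 2.0

# Example: content accuracy 70%, style accuracy 50% -> final score 60%.
```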
5.3 Ablation Experiments
To evaluate the effects of the different parts of our model in detail, we conducted two ablation experiments. Our model can be partitioned into two parts: the downsampling part and the upsampling part. In our first ablation experiment, we replaced our downsampling CNN with the original ResNet-18 to see the effect of our modifications.
Our second ablation model combines our downsampling CNN with deconvolution layers in the upsampling part, similar to CycleGAN and CartoonGAN. In this model, only the last-layer output of our downsampling CNN is fed to a series of deconvolution layers. This model measures the effect of using multiple feature layers in the upsampling part.
In our ablation experiments, we trained the models with the same ten illustrator styles, but present results for five randomly selected ones, since the remaining results are in line with them. Visual results of the ablation experiments are given in Figure 10, and the style and content classifier results in Table 5.
Ablation Model 1 gives a content score similar to GANILLA's, but its style score is lower. This shows that our modifications to the original ResNet-18 architecture enable GANILLA to successfully stylize input images. On the other hand, Ablation Model 2 achieves a better style score than GANILLA, but its content score is much lower. This demonstrates that using low-level features in the upsampling part helps preserve content.
5.4 Cartoonization Using Hayao Style
In Chen et al. (2018), the dataset is constructed by sampling frames from videos of Miyazaki Hayao's stories. Since that dataset is not publicly available, we tried repeating the same procedure, but we were not able to replicate the CartoonGAN results on our collection due to the low quality of our samples. To still be able to experiment on the same illustrator as CartoonGAN, we instead collected images of Miyazaki Hayao's work using Google Image Search as our target training set. Note that this is a more challenging dataset than the Hayao dataset used in Chen et al. (2018), which was composed of a single style from a single story (Spirited Away); our images are sampled from Hayao's entire body of work and therefore mix several styles from a variety of stories.
Figure 11 shows samples from our Hayao collection. Note that, compared to the illustration dataset presented in our study, abstraction in the Hayao dataset is very limited. Hayao's illustrations mostly depict natural scenery and are therefore more similar to the input images used in the tests.
We present only visual results for cartoonization; the outputs of our method, GANILLA, are shown in Figure 12. The results show the effectiveness of GANILLA in the cartoonization domain. In particular, since green is the dominant color in the source images, the translated images also have greenish backgrounds.
5.5 Comparisons with Neural Style Transfer
We compare our method with the NST method of Gatys et al. (2016) in Figure 13. For the Gatys et al. NST model, we use one content and one style image. As can be seen from the figure, the NST method is not successful at stylization on the illustration dataset. Since NST uses only one content and one style image, its result highly depends on the selected inputs. Although NST transfers color information correctly, it falls short in dealing with the style.
(Figure 13 layout: Content Image, Style Image, Gatys et al., GANILLA.)
5.6 Limitations and Discussion
Our model fails to stylize the images of some illustrators. Figure 14 shows an example case where the style illustrator is Dr. Seuss. The main reason for this failure is that Dr. Seuss' illustrations are mostly charcoal drawings or contain only simple coloring.
Here, we would like to note that the illustration dataset presented in this study poses several challenges compared to the datasets used in other style transfer studies. For example, in the dataset used in CartoonGAN, each style is obtained from the images of a single video of a story, whereas each illustrator in our dataset is represented by illustrations from many different books.
6 Conclusion

In this paper, we presented the most extensive children's book illustration dataset and a new generator network for image-to-illustration translation. Since children's book illustrations contain highly abstract objects and shapes, current state-of-the-art generator networks fail to transfer the content and the style at the same time. To overcome this issue, our model uses low-level features in the downsampling stage as well as in the upsampling part.
One major problem in the image-to-image translation domain is that there are no well-defined or agreed-upon metrics to evaluate a generator model. To address this problem, we proposed a new evaluation framework that quantitatively evaluates image-to-image translation models in terms of both content and style. Our framework is based on two CNNs that measure the style and content aspects separately. Using this framework, our proposed model, GANILLA, achieves the best overall performance compared to the current state-of-the-art models.
The numerical calculations reported in this paper were fully performed at TUBITAK ULAKBIM, High Performance and Grid Computing Center (TRUBA resources).
- We were not able to use the dataset in CartoonGAN since it was not publicly available.
- Latent filter scaling for multimodal unsupervised image-to-image translation. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 1458–1466.
- TraVeLGAN: image-to-image translation by transformation vector learning. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 8983–8992.
- Incorporating long-range consistency in CNN-based texture generation. arXiv preprint arXiv:1606.01286.
- Pros and cons of GAN evaluation measures. Computer Vision and Image Understanding 179, pp. 41–65.
- SketchyGAN: towards diverse and realistic sketch to image synthesis. In IEEE Conference on Computer Vision and Pattern Recognition.
- CartoonGAN: generative adversarial networks for photo cartoonization. In IEEE Conference on Computer Vision and Pattern Recognition.
- Image-to-image translation via group-wise deep whitening-and-coloring transformation. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 10639–10647.
- StarGAN: unified generative adversarial networks for multi-domain image-to-image translation. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 8789–8797.
- A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576.
- Image style transfer using convolutional neural networks. In IEEE Conference on Computer Vision and Pattern Recognition.
- Interactive sketch & fill: multiclass sketch-to-image translation. In IEEE International Conference on Computer Vision.
- Generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2672–2680.
- Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
- DRAW: deep networks for recognizing styles of artists who illustrate children's books. In ACM International Conference on Multimedia Retrieval, pp. 338–346.
- An introduction to image synthesis with generative adversarial nets. arXiv preprint arXiv:1803.04469.
- Multimodal unsupervised image-to-image translation. In European Conference on Computer Vision, pp. 172–189.
- Image-to-image translation with conditional adversarial networks. In IEEE Conference on Computer Vision and Pattern Recognition.
- Neural style transfer: a review. arXiv preprint arXiv:1705.04058.
- Perceptual losses for real-time style transfer and super-resolution. In European Conference on Computer Vision, pp. 694–711.
- Learning to generate images of outdoor scenes from attributes and semantic layouts.
- Learning to discover cross-domain relations with generative adversarial networks. CoRR abs/1703.05192.
- Photo-realistic single image super-resolution using a generative adversarial network. In IEEE Conference on Computer Vision and Pattern Recognition.
- Diverse image-to-image translation via disentangled representations. In European Conference on Computer Vision, pp. 35–51.
- Precomputed real-time texture synthesis with Markovian generative adversarial networks. In European Conference on Computer Vision, pp. 702–716.
- Laplacian-steered neural style transfer. In ACM International Conference on Multimedia, pp. 1716–1724.
- Diversified texture synthesis with feed-forward networks. In IEEE Conference on Computer Vision and Pattern Recognition.
- SketchGAN: joint sketch completion and recognition with generative adversarial network. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 5830–5839.
- Unsupervised image-to-image translation networks. In Advances in Neural Information Processing Systems, pp. 700–708.
- Few-shot unsupervised image-to-image translation. In IEEE International Conference on Computer Vision.
- Automatic differentiation in PyTorch.
- Unsupervised representation learning with deep convolutional generative adversarial networks. CoRR abs/1511.06434.
- Stable and controllable neural texture synthesis and style transfer using histogram losses. arXiv preprint arXiv:1701.08893.
- Unsupervised cross-domain image generation. In International Conference on Learning Representations.
- Art2Real: unfolding the reality of artworks via semantically-aware image-to-image translation. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 5849–5859.
- Texture networks: feed-forward synthesis of textures and stylized images. In International Conference on Machine Learning, pp. 1349–1357.
- Instance normalization: the missing ingredient for fast stylization. CoRR abs/1607.08022.
- High-resolution image synthesis and semantic manipulation with conditional GANs. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 8798–8807.
- SUN database: large-scale scene recognition from abbey to zoo. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 3485–3492.
- DualGAN: unsupervised dual learning for image-to-image translation. In IEEE International Conference on Computer Vision.
- Energy-based generative adversarial network. arXiv preprint arXiv:1609.03126.
- Unpaired image-to-image translation using cycle-consistent adversarial networks. In IEEE International Conference on Computer Vision.