GANHopper: Multi-Hop GAN for Unsupervised Image-to-Image Translation

Abstract

We introduce GANHopper, an unsupervised image-to-image translation network that transforms images gradually between two domains, through multiple hops. Instead of executing translation directly, we steer the translation by requiring the network to produce in-between images which resemble weighted hybrids between images from the two input domains. Our network is trained on unpaired images from the two domains only, without any in-between images. All hops are produced using a single generator along each direction. In addition to the standard cycle-consistency and adversarial losses, we introduce a new hybrid discriminator, which is trained to classify the intermediate images produced by the generator as weighted hybrids, with weights based on a predetermined hop count. We also introduce a smoothness term to constrain the magnitude of each hop, further regularizing the translation. Compared to previous methods, GANHopper excels at image translations involving domain-specific image features and geometric variations while also preserving non-domain-specific features such as backgrounds and general color schemes.

1 Introduction

Unsupervised image-to-image translation has been one of the most intensively studied problems in computer vision since the introduction of the domain transfer network (DTN) [19], CycleGAN [25], DualGAN [22], and UNIT [14] in 2017. While these networks and many follow-ups were designed to perform general-purpose translations, it is challenging for the translator to learn transformations beyond local and stylistic adjustments, such as geometry and shape variations. For example, typical dog-cat translations learned by CycleGAN do not transform the animals in terms of geometric facial features; only pixel-scale color or texture alterations take place.

When the source and target domains exhibit sufficiently large discrepancies, any proper translation function is expected to be complex and difficult to learn. Without any paired images to supervise the learning process, the search space for the translation functions can be immense. With large image changes, there are even more degrees of freedom to account for. In such cases, a more constrained and steerable search would be desirable.

Figure 1: What dog would look most similar to a given cat? Our multi-hop image translation network, GANHopper, produces such transformations, trained only on unpaired image domains. The key idea is to force the network to make gradual transitions, by generating multiple in-between images (i.e. “hops”) resembling weighted hybrids between the two domains. Direct translation methods can “undershoot the target” by failing to produce the necessary geometry variations [25] or “overshoot the target” by significantly altering non-domain-specific features such as backgrounds and general color schemes [10].

In this paper, we introduce an unsupervised image-to-image translator that is constrained to transform images gradually between two domains, e.g., cats and dogs. Instead of performing the transformation directly, our translator executes the task in steps, called hops. Our multi-hop network is built on CycleGAN [25]. However, we steer the translation paths by forcing the network to produce in-between images which resemble weighted hybrids between images from the two input domains. For example, a four-hop network for dog-to-cat translation produces three in-between images: the first is 25% cat-like and 75% dog-like, the second is 50/50, and the third is 75% cat-like and 25% dog-like. The fourth and final hop is a 100% translated cat.

Our network, GANHopper, is unsupervised and trained on unpaired images from two input domains, without any in-between hybrid images in its training set. Equally important, all hops are produced using a single generator along each direction, so the network has no more capacity than CycleGAN. To make training possible, we introduce a new hybrid discriminator, which is trained exclusively on real images (e.g., dogs or cats) to evaluate the in-between images by classifying them as weighted hybrids, depending on the prescribed hop count. In addition to the original cycle-consistency and adversarial losses from CycleGAN, we introduce two new losses: a hybrid loss to assess the degree to which an image belongs to one of the input domains, and a smoothness loss which further regulates the image transitions to ensure that a generated image in the hop sequence does not deviate much from the preceding image.

GANHopper does not merely transform an input cat into a dog — many dogs can fool the discriminator. Rather, it aims to generate the dog which looks most similar to the given cat; see Figure 1. Compared to previous unsupervised image-to-image translation networks, our network excels at image translations involving domain-specific image features and geometric variations (i.e., “what makes a dog a dog?”) while preserving non-domain-specific image features such as background and general color schemes, e.g., the fur color of the input cat in Figure 1.

The ability to produce large changes (in particular, geometry transformations) via unsupervised domain translation has been a hotly-pursued problem. There appears to be a common belief that the original CycleGAN/DualGAN architecture cannot learn geometry variations and must be modified at the feature representation or training-approach level. As a result, many approaches resort to latent space translations, e.g., with style-content [6] or scale [23] separation and feature disentanglement [21]. Our work challenges this assumption, as GANHopper follows fundamentally the same architecture as CycleGAN, working directly in image space; it merely enforces a gradual, multi-hop translation to steer and regulate the image transitions.

2 Related Work

The foundation of modern image-to-image translation is the UNet architecture, first developed for semantic image segmentation [18]. This architecture was later extended with conditional adversarial training to a variety of image-to-image translation tasks [7]. Further improvements led to the generation of higher-resolution outputs [20] and multiple possible outputs for the same image in “one-to-many” translation tasks, e.g. grayscale image colorization [26].

The above methods require paired input and output images as training data. A more recent class of image-to-image translation networks is capable of learning from only unpaired data in the form of two sets $X$ and $Y$ of input and output images, respectively [25, 22, 10]. These methods jointly train a network $G$ to map from $X$ to $Y$ and a network $F$ to map from $Y$ to $X$, enforcing at training time that $F(G(x)) \approx x$ and $G(F(y)) \approx y$. Such cycle consistency is thought to regularize the learned mappings to be semantically meaningful, rather than arbitrary translations.

While the above approaches succeed at domain translations involving low-level appearance shifts (e.g. summer to winter, day to night), they often fail when the translation requires a significant shape deformation (e.g. cat to dog). Cycle-consistent translators have been shown to perform larger shape changes when trained with a discriminator and perceptual loss function that consider more global image context [5]. An alternative approach is to interpose a shared latent code from which images in both domains are generated [14]. This method can also be extended to enable translation into multiple output images [6]. Another tactic is to explicitly and separately model geometry vs. appearance in the translation process. A domain-specific method for translating human faces to caricature sketches accomplishes this by detecting facial landmarks, deforming them, and then using them to warp the input face [2]. More recent work has proposed a related technique that is not specific to faces [21]. Finally, it is also possible to perform domain translation via the feature hierarchy of a pre-trained image classification network [9]. This method can also produce large shape changes.

In contrast to the above, we show that direct image-to-image translation can produce large shape changes, while also preserving appearance details, if translation is performed in a sequence of smooth hops. This process can be viewed as producing an interpolation sequence between two domains. Many GANs can produce interpolations between images via linear interpolation in their latent space. These interpolations can even be along interpretable directions which are either specified in the dataset [11] or automatically inferred [4]. However, GAN latent space interpolation does not perform cross-domain interpolation. We are aware of one other work which performs cross-domain interpolation [1] by identifying corresponding points on images from two domains and using these points as input to standard image morphing approaches [12]. However, this approach requires images in both the source and target domain to interpolate between, whereas our method takes just a source image and produces the interpolation to the best-matching image in the target domain.

3 Method

Figure 2: Let $X$ and $Y$ represent the two domains that we wish to translate between (dogs and cats, respectively, in this figure). Our approach warps images from $X$ to $Y$ using the generator $G_Y$ and from $Y$ to $X$ using the generator $G_X$, applying each generator $n$ times. The generators are trained by combining: (a) the adversarial loss, obtained by feeding the generated images, including the hybrid images, to either $D_Y$ (when translating from $X$ to $Y$) or $D_X$ (when translating from $Y$ to $X$); (b) the reconstruction loss, obtained by comparing each generated image, including hybrid images, mapped back by $G_X$ (when translating from $X$ to $Y$) or by $G_Y$ (when translating from $Y$ to $X$) against the image it was generated from; (c) the hybrid loss, a domain-membership score obtained by evaluating every generated image with the hybrid discriminator $D_{hybrid}$, which is trained exclusively on real images to classify its input as a member of either $X$ or $Y$.

Let $X$ and $Y$ denote our source and target image domains, respectively. Our goal is to learn a transformation that, given an image $x \in X$, outputs another image $y$ such that $y$ is perceived to be the counterpart of $x$ in the domain $Y$. The same must be achieved by the analogous transformation from $Y$ to $X$. This task is identical to that performed by CycleGAN [25]. However, we do not translate the input image in one pass through the network. Rather, we facilitate the translation process via a sequence of intermediate images. We introduce the concept of a hop, which we define as the process of warping one image toward the target domain by a limited amount using a generator network. Repeated hops produce hybrid images as byproducts of the translation process.

Since we do not translate images in a single pass through a network, our training process must be modified from the traditional cycle-consistent learning framework. In particular, the generation of hybrid images during the translation is a challenge, because the training data does not include such images. Therefore, the hybrid-ness of these generated images must be estimated on the fly during training. To this end, we introduce a new discriminator, which we call the hybrid discriminator, whose objective is to evaluate how similar an image is to both input domains, generating a membership score. We also add a new smoothness term to the loss, whose purpose is to encourage a gradual warping of the images through the hops so that the generator does not overshoot the translation. The following subsections present our multi-hop framework and expand on these two key new elements.

3.1 Multi-hop framework

Our model consists of the original two generators from CycleGAN, denoted by $G_X$ and $G_Y$, and three discriminators, two of which are CycleGAN’s original adversarial discriminators $D_X$ and $D_Y$. The third discriminator is the new hybrid discriminator $D_{hybrid}$. Figure 2 depicts how these generators and discriminators work together at training time to translate images via multiple hops.

Hop nomenclature.

A hop is defined as using either $G_X$ or $G_Y$ to warp an image towards the domain $X$ or $Y$, respectively. A full translation is achieved by performing $n$ hops using the same generator, where $n$ is a user-defined value. For instance, if $n = 4$, a full translation of $x \in X$ is $G_Y^4(x) = G_Y(G_Y(G_Y(G_Y(x)))) \in Y$. Similarly, $G_X^4(y)$ translates $y \in Y$ into $X$. Given an image $x_0 = x \in X$, the translation hops are defined via the following recurrence relation (and symmetrically for the $Y$-to-$X$ direction):

$x_i = G_Y(x_{i-1}), \quad i = 1, \dots, n.$
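
For concreteness, the forward direction of this recurrence can be written as a short loop. The sketch below is a minimal illustration, assuming `G_Y` is available as a callable image-to-image generator; the function name and tensor shapes are ours, not taken from the authors' released code.

```python
import torch

def multi_hop_translate(G_Y, x, n_hops=4):
    """Apply the X-to-Y generator n_hops times, keeping the hybrids.

    G_Y    -- generator warping images toward domain Y (hypothetical handle)
    x      -- batch of source-domain images, e.g. shape (B, 3, 128, 128)
    n_hops -- number of hops n; hop n is the full translation
    Returns the list [x_1, ..., x_n] of intermediate and final images.
    """
    hops = []
    current = x                      # hop 0 is the raw input
    with torch.no_grad():
        for _ in range(n_hops):
            current = G_Y(current)   # one hop toward domain Y
            hops.append(current)
    return hops
```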

Generator architecture

We adopt the architecture and layer nomenclature originally proposed by Johnson et al. [8] and used in CycleGAN. Let c7s1-k denote a 7×7 Convolution-InstanceNorm-ReLU layer with k filters and stride 1. dk denotes a 3×3 Convolution-InstanceNorm-ReLU layer with k filters and stride 2. Reflection padding was used to reduce artifacts. Rk denotes a residual block containing two 3×3 convolutional layers, each with k filters. uk denotes a 3×3 TransposeConvolution-InstanceNorm-ReLU layer with k filters and stride 1/2. The network takes 128×128 images as input and consists of the following layers: c7s1-64, d128, d256, R256 (×12), u128, u64, c7s1-3.
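
Following the layer nomenclature above, the generator can be assembled roughly as follows. This is a minimal PyTorch sketch under our own assumptions (helper names, reflection padding inside the residual blocks, and a final tanh as in CycleGAN); it is not the authors' implementation.

```python
import torch.nn as nn

def c7s1(in_ch, out_ch, relu=True):
    # 7x7 Convolution-InstanceNorm-ReLU with stride 1 and reflection padding
    layers = [nn.ReflectionPad2d(3),
              nn.Conv2d(in_ch, out_ch, kernel_size=7, stride=1),
              nn.InstanceNorm2d(out_ch)]
    if relu:
        layers.append(nn.ReLU(inplace=True))
    return layers

def d(in_ch, out_ch):
    # 3x3 Convolution-InstanceNorm-ReLU with stride 2 (downsampling)
    return [nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
            nn.InstanceNorm2d(out_ch),
            nn.ReLU(inplace=True)]

class ResBlock(nn.Module):
    # Residual block with two 3x3 convolutional layers (reflection padding)
    def __init__(self, ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReflectionPad2d(1), nn.Conv2d(ch, ch, 3), nn.InstanceNorm2d(ch),
            nn.ReLU(inplace=True),
            nn.ReflectionPad2d(1), nn.Conv2d(ch, ch, 3), nn.InstanceNorm2d(ch))

    def forward(self, x):
        return x + self.block(x)

def u(in_ch, out_ch):
    # 3x3 TransposeConvolution-InstanceNorm-ReLU with stride 1/2 (upsampling)
    return [nn.ConvTranspose2d(in_ch, out_ch, kernel_size=3, stride=2,
                               padding=1, output_padding=1),
            nn.InstanceNorm2d(out_ch),
            nn.ReLU(inplace=True)]

def build_generator(n_res_blocks=12):
    # c7s1-64, d128, d256, R256 (x n_res_blocks), u128, u64, c7s1-3
    layers = c7s1(3, 64) + d(64, 128) + d(128, 256)
    layers += [ResBlock(256) for _ in range(n_res_blocks)]
    layers += u(256, 128) + u(128, 64) + c7s1(64, 3, relu=False)
    layers += [nn.Tanh()]  # tanh output as in CycleGAN (our assumption)
    return nn.Sequential(*layers)
```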

Discriminator architecture

For the discriminator networks $D_X$, $D_Y$, and $D_{hybrid}$, we use the same 70×70 PatchGAN [7] used in CycleGAN. Let Ck denote a 4×4 Convolution-InstanceNorm-LeakyReLU layer with k filters and stride 2. Unlike CycleGAN, we do not apply a final convolution to produce a 1-dimensional output. Instead, given the 128×128 input images, we produce a 16×16 feature matrix. Each of its elements is associated with one of the 70×70 patches from the input image. The discriminator consists of the following layers: C64, C128, C256, C512.
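
A corresponding sketch of the PatchGAN-style discriminator is given below, again under stated assumptions: the stride of the final C512 block is set to 1 so that a 128×128 input yields a roughly 16×16 spatial output, and the first block omits normalization, both following common CycleGAN practice rather than anything stated explicitly above.

```python
import torch.nn as nn

def C(in_ch, out_ch, stride=2, norm=True):
    # 4x4 Convolution-InstanceNorm-LeakyReLU block
    layers = [nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=stride, padding=1)]
    if norm:
        layers.append(nn.InstanceNorm2d(out_ch))
    layers.append(nn.LeakyReLU(0.2, inplace=True))
    return layers

def build_patch_discriminator():
    # C64-C128-C256-C512; no final 1-channel convolution, so the output is a
    # spatial feature map rather than a single score per image.  Strides of
    # (2, 2, 2, 1) are our assumption to keep the output near 16x16 for
    # 128x128 inputs.
    return nn.Sequential(
        *C(3, 64, stride=2, norm=False),
        *C(64, 128, stride=2),
        *C(128, 256, stride=2),
        *C(256, 512, stride=1))
```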

3.2 Training

Loss function

The full loss function combines the cycle-consistency (reconstruction) loss, the adversarial loss, the hybrid (domain) loss, and the smoothness loss, denoted respectively as $\mathcal{L}_{cyc}$, $\mathcal{L}_{adv}$, $\mathcal{L}_{hybrid}$, and $\mathcal{L}_{smooth}$:

$\mathcal{L} = \lambda_{cyc}\mathcal{L}_{cyc} + \lambda_{adv}\mathcal{L}_{adv} + \lambda_{hybrid}\mathcal{L}_{hybrid} + \lambda_{smooth}\mathcal{L}_{smooth}.$

We empirically set the values of the weights $\lambda_{cyc}$, $\lambda_{adv}$, $\lambda_{hybrid}$, and $\lambda_{smooth}$ in the objective function.
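
The combination itself is a plain weighted sum; the snippet below illustrates it with placeholder weight values only, since the specific values used in the paper are not reproduced here.

```python
# Hypothetical weight values -- placeholders for illustration only; the paper
# sets these empirically and the exact numbers are not reproduced here.
weights = {"cyc": 10.0, "adv": 1.0, "hybrid": 1.0, "smooth": 1.0}

def total_loss(l_cyc, l_adv, l_hybrid, l_smooth, w=weights):
    # Weighted sum of the four loss terms for a single hop.
    return (w["cyc"] * l_cyc + w["adv"] * l_adv
            + w["hybrid"] * l_hybrid + w["smooth"] * l_smooth)
```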

Cycle-consistency loss

Rather than enforcing cycle consistency between the input and output images, as in CycleGAN, we enforce it locally along every hop of our multi-hop translation. That is, $G_X$ should undo a single hop of $G_Y$ and vice versa. We enforce this property via a loss proportional to the difference between $x_{i-1}$ and $G_X(G_Y(x_{i-1}))$ for all hops (and symmetrically between $y_{i-1}$ and $G_Y(G_X(y_{i-1}))$):

$\mathcal{L}_{cyc} = \sum_{i=1}^{n} \left\| G_X(G_Y(x_{i-1})) - x_{i-1} \right\|_1 + \left\| G_Y(G_X(y_{i-1})) - y_{i-1} \right\|_1.$
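
A per-hop version of this term could be computed as below; the choice of an L1 penalty (matching CycleGAN) is our assumption.

```python
import torch.nn.functional as F

def cycle_loss(G_X, G_Y, x_prev, y_prev):
    # A single hop of G_Y should be undone by G_X, and vice versa.
    loss_x = F.l1_loss(G_X(G_Y(x_prev)), x_prev)
    loss_y = F.l1_loss(G_Y(G_X(y_prev)), y_prev)
    return loss_x + loss_y
```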

Adversarial loss

The generator $G_Y$ tries to generate images that look similar to images from domain $Y$, while $D_Y$ aims to distinguish between these generated images and real images $y \in Y$ (and symmetrically for $G_X$ and $D_X$). Note that “generated images” includes both final output images and in-between images. The discriminators use a least squares formulation [16]; the generator term for the $n$ hops is:

$\mathcal{L}_{adv} = \sum_{i=1}^{n} \left( D_Y(G_Y(x_{i-1})) - 1 \right)^2 + \left( D_X(G_X(y_{i-1})) - 1 \right)^2.$
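
A least-squares adversarial term for a single hop might look as follows; the real/fake targets of 1 and 0 follow the LSGAN formulation [16].

```python
import torch
import torch.nn.functional as F

def lsgan_generator_loss(D_Y, fake_y):
    # Generator side: generated images (any hop) should be scored as real (1).
    pred = D_Y(fake_y)
    return F.mse_loss(pred, torch.ones_like(pred))

def lsgan_discriminator_loss(D_Y, real_y, fake_y):
    # Discriminator side: real images -> 1, generated images (any hop) -> 0.
    pred_real = D_Y(real_y)
    pred_fake = D_Y(fake_y.detach())
    return (F.mse_loss(pred_real, torch.ones_like(pred_real))
            + F.mse_loss(pred_fake, torch.zeros_like(pred_fake)))
```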

Hybrid loss

The hybrid term assesses the degree to which an image belongs to each of the two domains. For instance, if GANHopper is trained with $n = 4$ hops, we desire that the first hop from $X$ be judged as belonging 25% to domain $Y$ and 75% to domain $X$. Thus, we define the target hybridness score of hop $i$ to be $i/n$ in the $X$-to-$Y$ direction; conversely, it is defined as $1 - i/n$ for the reverse hops. To encourage each hop to achieve its target hybridness, we penalize the distance between the target hybridness and the output of the hybrid discriminator $D_{hybrid}$ on that hop. Since $D_{hybrid}$ is also trained to output 0 for ground-truth images in $X$ and 1 for ground-truth images in $Y$ (i.e. it is a binary domain classifier), an image for which $D_{hybrid}$ produces an output of 0.25 can be interpreted as an image which the classifier is 25% confident belongs to domain $Y$, which is precisely the behavior we desire:

$\mathcal{L}_{hybrid} = \sum_{i=1}^{n} \left( D_{hybrid}(G_Y(x_{i-1})) - \tfrac{i}{n} \right)^2 + \left( D_{hybrid}(G_X(y_{i-1})) - \left(1 - \tfrac{i}{n}\right) \right)^2.$
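
As a sketch, the per-hop hybrid penalty could be implemented as below; the use of a squared distance is our assumption, since the text only specifies a distance to the target hybridness.

```python
import torch
import torch.nn.functional as F

def hybrid_loss(D_hybrid, hop_image, i, n, direction_xy=True):
    # Target hybridness of hop i (out of n): i/n when translating X -> Y,
    # 1 - i/n when translating Y -> X.  D_hybrid outputs 0 for X and 1 for Y.
    target = i / n if direction_xy else 1.0 - i / n
    pred = D_hybrid(hop_image)
    return F.mse_loss(pred, torch.full_like(pred, target))
```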

Smoothness loss

The smoothness term penalizes the image-space difference between hop $i$ and hop $i-1$. This term encourages the hops to be individually as small as possible while still leading to a full translation when combined, which has a regularizing effect on the training:

$\mathcal{L}_{smooth} = \sum_{i=1}^{n} \left\| G_Y(x_{i-1}) - x_{i-1} \right\|_1 + \left\| G_X(y_{i-1}) - y_{i-1} \right\|_1.$
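
A minimal per-hop smoothness penalty, again assuming an L1 image-space distance, could look like this:

```python
import torch.nn.functional as F

def smoothness_loss(hop_prev, hop_curr):
    # Penalize the image-space change introduced by a single hop
    # (L1 is our choice of distance; the text leaves the norm implicit).
    return F.l1_loss(hop_curr, hop_prev)
```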

Training procedure

We train each network in GANHopper one hop at a time, i.e. for each image to be translated, we perform a single hop, update the weights of the generator and discriminator networks, perform the next hop, etc. Training the network this way, rather than performing all hops and then doing a single weight update, has the advantage of requiring significantly less memory. The generator_update and discriminator_update procedures use a single term of the sums which define the loss (i.e. the term for hop $i$) to compute parameter gradients.
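
The hop-at-a-time schedule can be sketched as follows. The generator_update and discriminator_update callables stand in for the per-hop update procedures referred to above (their signatures are hypothetical); detaching between hops, so that gradients do not propagate through earlier hops, is our assumption, consistent with updating the weights after every hop.

```python
def train_step(G_Y, G_X, x, y, n_hops, generator_update, discriminator_update):
    """One hop-at-a-time training step (schematic).

    G_Y / G_X maps toward domain Y / X; x and y are batches from X and Y.
    generator_update and discriminator_update are hypothetical callbacks that
    compute the hop-i loss terms and apply optimizer steps.
    """
    x_prev, y_prev = x, y
    for i in range(1, n_hops + 1):
        x_hop = G_Y(x_prev)                       # hop i toward domain Y
        y_hop = G_X(y_prev)                       # hop i toward domain X
        generator_update(i, x_prev, y_prev, x_hop, y_hop)
        discriminator_update(i, x, y, x_hop, y_hop)
        # Detach so the next hop's gradients do not flow through earlier hops.
        x_prev, y_prev = x_hop.detach(), y_hop.detach()
```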

4 Results and Evaluation

Our network takes 128×128 images as input and outputs images of the same resolution. Experiments were performed on an NVIDIA GTX 1080 Ti (using batch size 6) and an NVIDIA Titan X (batch size 24). We trained GANHopper using the Adam optimizer. With the exception of the cat/human faces experiment, all models were trained for 100 epochs (the cat/human model mode-collapsed after 25 epochs, so we report the results from epoch 22).

In our experiments, we used combinations of seven datasets, translating between pairs of domains. Some translation pairs demand both geometric and texture changes:

  • 8,223 dog faces from the Columbia dataset [13]

  • 47,906 cat faces from Flickr100m [17]

  • 202,599 human faces from aligned CelebA [15]

  • The zebra, horse, summer, and winter datasets originally used to evaluate CycleGAN [25]

We compare GANHopper with three prior approaches: CycleGAN [25], DiscoGAN [10], and GANimorph [5]. All three are “unsupervised direct image-to-image translation” methods, in that they transform the input image from one domain into an output image from another domain without mediation by any shared latent variables and without any prior pairing of samples between the two domains. We trained these baselines on the aforementioned datasets using their public implementations with default settings.

Quantitative evaluation of translation accuracy

We quantitatively evaluate dog/cat translation using two metrics (Figure 3). First, we compute the percentage of output pixels that are classified as belonging to the target domain by a pre-trained semantic segmentation network (DeepLabV3 [3], trained on PASCAL VOC 2012). Second, we measure how well the output preserves salient features from the input using a perceptual similarity metric [24]. CycleGAN produces outputs that best resemble the input but fails to perform domain translation. Our approach outperforms both GANimorph and DiscoGAN on both metrics: it is slightly better at domain translation and considerably better at preserving input features. This result indicates that one need not sacrifice domain translation ability to preserve salient features of the input. Figure 4 shows how the percentage of pixels translated varies as a function of the number of hops performed. While not strictly linearly increasing, it is a smooth monotonic function, suggesting that our hybrid loss term successfully encourages in-between images which can be interpreted as domain hybrids. As shown in the supplementary material, our method also quantitatively outperforms the other methods on the human-to-cat dataset when both of the metrics described above are considered.
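
A rough sketch of this evaluation protocol is given below, assuming torchvision's DeepLabV3 model and the lpips package as stand-ins; the authors' exact weights, preprocessing, and class handling may differ.

```python
import torch
import torchvision
import lpips  # pip install lpips

# Assumed tooling: torchvision's DeepLabV3 with VOC-style labels (we take
# cat = 8 and dog = 12) and the LPIPS perceptual metric.  Inputs are assumed
# to be preprocessed by the caller (ImageNet normalization for DeepLab,
# [-1, 1] range for LPIPS).
seg_model = torchvision.models.segmentation.deeplabv3_resnet101(
    pretrained=True).eval()
lpips_fn = lpips.LPIPS(net='alex')

def percent_target_pixels(image, target_class=12):
    # Fraction of pixels labeled as the target domain class (e.g. dog).
    with torch.no_grad():
        labels = seg_model(image)['out'].argmax(dim=1)
    return (labels == target_class).float().mean().item()

def perceptual_similarity(input_image, output_image):
    # LPIPS distance between input and output (lower = more similar).
    with torch.no_grad():
        return lpips_fn(input_image, output_image).item()
```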

Figure 3: Quantitative analysis of dog/cat translation. GANHopper was trained using four hops. The horizontal axis is the average perceptual similarity [24] of input to output. The vertical axis is the percentage of output pixels correctly labeled as the output class (e.g. dog or cat) by DeepLabV3 [3] trained on PASCAL VOC 2012. Higher and to the right is better.
Figure 4: The average percentage of pixels classified as cat or dog (vertical axis) as a function of the number of hops performed (horizontal axis). GANHopper was trained to translate cats to dogs (and vice versa) using four hops. Pixels classified with any label other than cat or dog are omitted. The zeroth hop corresponds to the raw inputs. The classification was performed using DeepLabV3 [3] trained on the PASCAL VOC 2012 dataset.
Figure 5: Comparing different translation methods on the challenging dog/cat faces dataset. We trained GANHopper  with four hops; (a) shows the result of hopping 1 to 4 times from the input and (b) shows the result of 8 hops from the input. We compare our results to (c) CycleGAN, (d) DiscoGAN, and (e) GANimorph.
Figure 6: Examples of human to cat face translation. The approaches compared are (a) our approach, (b) our approach with extra hops after the full translation, (c) CycleGAN, and (d) GANimorph. We trained GANHopper with four hops.

Qualitative Results

Figure 5 compares our method to the baselines on cat to dog and dog to cat translation. Our multi-hop procedure translates the input via a sequence of hybrid images (Figure 5(a)), allowing it to preserve key visual characteristics of the input if changing them is not necessary to achieve domain translation. For instance, fur colors and background textures are preserved in most cases (e.g. white cats map to white dogs) as is head orientation, while domain-specific features such as eyes, noses, and ears are appropriately deformed. The multi-hop procedure also allows control over how much translation to perform. The user can control the degree of “dogness” or “catness” introduced by the translation, including performing more hops than the network was trained on in order to exaggerate the characteristics of the target domain. Figure 5(b) shows the result of performing 8 hops using a network trained to perform only four. In the fifth row, the additional hops help to clarify the shape of the output dog’s tongue.

By contrast, the baselines produce less desirable results. CycleGAN preserves the input features too much, leading to incomplete translations (Figure 5(c)). Note that CycleGAN’s outputs often look similar to the first hop of our network; this makes sense, since each hop uses a CycleGAN-like generator network. Our network uses multiple hops of that same architecture to overcome CycleGAN’s original limitations. DiscoGAN (Figure 5(d)) can properly translate high-level properties such as head pose and eye placement but fails to preserve lower-level appearance details such as fur patterns and color. Its results are also often geometrically malformed (rows 2, 4, 5, 7, and 8). GANimorph (Figure 5(e)) produces images that are convincingly part of the target domain but preserve little of the input image’s features (typically only head pose). Note that all baselines produce outputs with noticeably decreased saturation and contrast, whereas our method preserves these properties.

Figure 6 shows a similar comparison on human to cat translation. Again, our method preserves input features well: facial structures stay roughly the same, and cats with light fur tend to generate blonde-haired people. Our method also preserves background details better than the baselines.

Figure 7: Impact of training hop count. Using four hops (b) better preserves input features, but using two hops (a) allows more drastic changes. Red squares denote the hops that correspond to a full translation in each setting; images further to the right are extrapolations obtained by applying additional hops.

Impact of training hop count

We also examine the impact of the number of hops used during training. A network using too few hops must change the domain of the image more quickly; this causes the generator to “force” the translation and produce undesirable outputs. In the summer to winter translation of Figure 7 (top), the hiker’s jacket quickly loses its blue color in the two-hop setting (first row) compared with the four-hop setting (second row). In the winter to summer translation of Figure 7 (bottom), the lake incorrectly becomes green when using a two-hop network but is preserved with four hops (while vegetation is still converted to green). These results suggest that increasing the number of hops has the added benefit of increasing image diversity and also allows for a smoother transition from one domain to the other.

Figure 8: Evaluation of the impact of the smoothness term weight $\lambda_{smooth}$ on the dog/cat dataset trained with four hops. The figure shows fully translated dog-to-cat and cat-to-dog samples generated by GANHopper trained with different values of $\lambda_{smooth}$.

Impact of the smoothness term

Figure 8 demonstrates the impact of the smoothness weight $\lambda_{smooth}$ on dog/cat translation trained with four hops. A larger $\lambda_{smooth}$ preserves the original fur patterns in the cat-to-dog translation and the sharpness of the image in the dog-to-cat translation. With too small a weight, the network collapses to producing cats with gray and white fur and noticeably blurry dogs. Higher values also help preserve the input background textures.

Figure 9: As with CycleGAN and GANimorph, our method occasionally “erases” part of an object and replaces it with background, rather than correctly translating it (e.g. the zebra legs disappear). This can be ameliorated, but not completely resolved, by increasing the smoothness loss weight $\lambda_{smooth}$.

Failure cases

As our method uses CycleGAN as a sub-component, it inherits some of the problems faced by that method, as well as other direct unpaired image translators. Figure 9 shows one prominent failure mode, in which the network “cheats” by erasing part of the object to be translated and replacing it with background (e.g. zebra legs). The smoothness term in our loss function penalizes differences between hops, so increasing its weight can help with this problem, but this issue remains unsolved in general.

5 Conclusion and Future Work

Unsupervised image-to-image translation is an ill-posed problem, and different methods have chosen different regularizing assumptions to define their solutions to it [21, 14, 6]. In this paper, we follow the cycle-consistency assumption of CycleGAN [25] and DualGAN [22], while introducing the multi-hop paradigm to exert fine-grained control over the translation using a new hybrid discriminator. Compared to other approaches, our GANHopper network better preserves features of the input image while still applying the necessary transformations to create an output that clearly belongs to the target domain.

The meta idea of “transforming images in small steps” raises new questions worth exploring. For example, how many steps are ideal? The results in this paper used 2-4 hops, as more hops did not noticeably improve performance but did increase training time. However, some images in a domain are clearly harder than others to translate into a different domain (e.g. translating dogs with long vs. short snouts into cats). Can we automatically learn the ideal number of hops for each input image? Taken to an extreme, can we use a very large number of tiny hops to produce a smooth interpolation sequence from source to target domain? We also want to identify domains where GANHopper systematically fails and explore the design space of multi-hop translation architectures in response. For instance, while GANHopper uses the same network for all hops, it may be better to use different networks per hop (i.e. the optimal function for translating a 25% dog to a 50% dog may not be the same as the function for translating a 75% dog to a 100% dog). Another interesting direction is to combine GANHopper with ideas from MUNIT [6] or BicycleGAN [26], so that the user can control the output of the translation via a “style” code while still preserving important input features (e.g. translating a white cat into different white-furred dog breeds). Finally, we would like to further investigate the idea that initially spurred the development of GANHopper: generating meaningful extrapolation sequences beyond the boundaries of a given image domain, to produce creative and novel outputs.

References

  1. K. Aberman, J. Liao, M. Shi, D. Lischinski, B. Chen and D. Cohen-Or (2018-07) Neural best-buddies: sparse cross-domain correspondence. ACM Trans. Graph. 37 (4). Cited by: §2.
  2. K. Cao, J. Liao and L. Yuan (2018) CariGANs: unpaired photo-to-caricature translation. Cited by: §2.
  3. L. Chen, G. Papandreou, F. Schroff and H. Adam (2017) Rethinking atrous convolution for semantic image segmentation. CoRR abs/1706.05587. Cited by: Figure 3, Figure 4, §4.
  4. X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever and P. Abbeel (2016) InfoGAN: interpretable representation learning by information maximizing generative adversarial nets. In Proceedings of the 30th International Conference on Neural Information Processing Systems, Cited by: §2.
  5. A. Gokaslan, V. Ramanujan, D. Ritchie, K. I. Kim and J. Tompkin (2018) Improving shape deformation in unsupervised image-to-image translation. CoRR abs/1808.04325. External Links: Link, 1808.04325 Cited by: §2, §4.
  6. X. Huang, M. Liu, S. Belongie and J. Kautz (2018) Multimodal unsupervised image-to-image translation. In ECCV, Cited by: §1, §2, §5, §5.
  7. P. Isola, J. Zhu, T. Zhou and A. A. Efros (2016) Image-to-image translation with conditional adversarial networks. CoRR abs/1611.07004. External Links: Link, 1611.07004 Cited by: §2, §3.1.
  8. J. Johnson, A. Alahi and F. Li (2016) Perceptual losses for real-time style transfer and super-resolution. CoRR abs/1603.08155. External Links: Link, 1603.08155 Cited by: §3.1.
  9. O. Katzir, D. Lischinski and D. Cohen-Or (2019) Cross-domain cascaded deep feature translation. CoRR abs/1906.01526. Cited by: §2.
  10. T. Kim, M. Cha, H. Kim, J. K. Lee and J. Kim (2017) Learning to discover cross-domain relations with generative adversarial networks. In ICML, Cited by: Figure 1, §2, §4.
  11. G. Lample, N. Zeghidour, N. Usunier, A. Bordes and L. Denoyer (2017) Fader networks: manipulating images by sliding attributes. In Advances in Neural Information Processing Systems, Cited by: §2.
  12. J. Liao, R. S. Lima, D. Nehab, H. Hoppe, P. V. Sander and J. Yu (2014-09) Automating image morphing using structural similarity on a halfway domain. ACM Trans. Graph. 33 (5). Cited by: §2.
  13. J. Liu, A. Kanazawa, D. Jacobs and P. Belhumeur (2012) Dog breed classification using part localization. In Proceedings of the 12th European Conference on Computer Vision - Volume Part I, ECCV’12, Berlin, Heidelberg, pp. 172–185. External Links: ISBN 978-3-642-33717-8, Link, Document Cited by: 1st item.
  14. M. Liu, T. Breuel and J. Kautz (2017) Unsupervised image-to-image translation networks. CoRR abs/1703.00848. External Links: Link Cited by: §1, §2, §5.
  15. Z. Liu, P. Luo, X. Wang and X. Tang (2014) Deep learning face attributes in the wild. CoRR abs/1411.7766. External Links: Link, 1411.7766 Cited by: 3rd item.
  16. X. Mao, Q. Li, H. Xie, R. Y. K. Lau and Z. Wang (2017) Least squares generative adversarial networks. In ICCV, Cited by: §3.2.
  17. K. Ni, R. A. Pearce, K. Boakye, B. V. Essen, D. Borth, B. Chen and E. X. Wang (2015) Large-scale deep learning on the YFCC100M dataset. CoRR abs/1502.03409. External Links: Link, 1502.03409 Cited by: 2nd item.
  18. O. Ronneberger, P. Fischer and T. Brox (2015) U-net: convolutional networks for biomedical image segmentation. In MICCAI, Cited by: §2.
  19. Y. Taigman, A. Polyak and L. Wolf (2017) Unsupervised cross-domain image generation. In Proc. of ICLR, Cited by: §1.
  20. T. Wang, M. Liu, J. Zhu, A. Tao, J. Kautz and B. Catanzaro (2018) High-resolution image synthesis and semantic manipulation with conditional gans. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Cited by: §2.
  21. W. Wu, K. Cao, C. Li, C. Qian and C. C. Loy (2019) TransGaGa: geometry-aware unsupervised image-to-image translation. In Proc. of CVPR, Cited by: §1, §2, §5.
  22. Z. Yi, H. Zhang, P. Tan and M. Gong (2017) DualGAN: unsupervised dual learning for image-to-image translation. In Proc. of ICCV, Cited by: §1, §2, §5.
  23. K. Yin, Z. Chen, H. Huang, D. Cohen-Or and H. Zhang (2019) LOGAN: unpaired shape transform in latent overcomplete space. ACM Trans. on Graphics 38 (6). Cited by: §1.
  24. R. Zhang, P. Isola, A. A. Efros, E. Shechtman and O. Wang (2018-06) The unreasonable effectiveness of deep features as a perceptual metric. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vol. , pp. 586–595. External Links: Document, ISSN Cited by: Figure 3, §4.
  25. J. Zhu, T. Park, P. Isola and A. A. Efros (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In International Conference on Computer Vision (ICCV), to appear, Cited by: Figure 1, §1, §1, §2, §3, 4th item, §4, §5.
  26. J. Zhu, R. Zhang, D. Pathak, T. Darrell, A. A. Efros, O. Wang and E. Shechtman (2017) Toward multimodal image-to-image translation. In Advances in Neural Information Processing Systems, Cited by: §2, §5.