Learning a Self-inverse Network for Unpaired Bidirectional Image-to-image Translation


Zengming Shen
University of Illinois at Urbana-Champaign
zshen5@illinois.edu
   S.Kevin Zhou
Medical Imaging, Robotics, Analytic Computing Laboratory & Engineering (MIRACLE)
Institute of Computing Technology, Chinese Academy of Sciences
zhoushaohua@ict.ac.cn
   Yifan Chen
University of Illinois at Urbana-Champaign
yifanc3@illinois.edu
   Bogdan Georgescu
Siemens Healthineers
bogdan.georgescu@siemens-healthineers.com
   Xuqi Liu
Rutgers University
xl325@scarletmail.rutgers.edu
   Thomas S. Huang
University of Illinois at Urbana-Champaign
t-huang1@illinois.edu
Abstract

Recently, image-to-image translation has attracted significant interest in the literature, starting from the successful use of the generative adversarial network (GAN), to the introduction of the cyclic constraint, to extensions to multiple domains. However, in existing approaches, there is no guarantee that the mapping between two image domains is unique or one-to-one. Here we propose a self-inverse network learning approach for unpaired image-to-image translation. Building on top of CycleGAN, we learn a self-inverse function by simply augmenting the training samples by switching inputs and outputs during training. The outcome of such learning is a provably one-to-one mapping function. Our extensive experiments on a variety of datasets, including cross-modal medical image synthesis, object transfiguration, and semantic labeling, consistently demonstrate clear improvement over the CycleGAN method both qualitatively and quantitatively. In particular, our proposed method achieves the state-of-the-art result in the label-to-photo direction on the Cityscapes benchmark dataset.


1 Introduction

Image-to-image translation (or cross-domain image synthesis) is essentially a mapping function from an input image to an output image, or vice versa. Recently, image-to-image translation has attracted significant interest from researchers, and extensive works have been proposed, which can be grouped into two categories: supervised [12] vs. unsupervised (or unpaired) [34].

Figure 1: A comparison of our one2one CycleGAN and other methods for image-to-image translation. We define task $A$ as the mapping from domain $X$ to domain $Y$ and task $B$ as the mapping from domain $Y$ to domain $X$. $G_A$ and $G_B$ are the two generator networks for tasks $A$ and $B$, respectively, and $D_Y$ and $D_X$ are the associated adversarial discriminators. (a) Pix2pix [12]: two separate generator networks $G_A$ and $G_B$ for tasks $A$ and $B$, respectively, for paired image-to-image translation. (b) CycleGAN [35]: two jointly trained but mutually inverse generator networks $G_A$ and $G_B$ for tasks $A$ and $B$, respectively, for unpaired image-to-image translation. (c) One2one CycleGAN: only one generator network $G$ for both tasks, for bidirectional unpaired or paired image-to-image translation.

Isola et al. [12] present the seminal work on image-to-image translation that offers a general-purpose solution, building on the generative adversarial network (GAN) first proposed by Goodfellow et al. [10]. While paired data are assumed in [12], Zhu et al. [34] later propose the CycleGAN approach for addressing the unpaired setting using the so-called cyclic constraint. There are many recent advances that use guidance information [30, 28], impose different constraints [9, 22, 33], or deal with multiple domains [36, 6, 11, 18]. In this paper, we study unpaired image-to-image translation.

Specifically, we study the image-to-image translation problem from the perspective of learning a one-to-one mapping between two image domains. Such a mapping is desirable for many applications. For example, in medical image synthesis, a patient has a unique image for each imaging modality or for each sequence/configuration within a single modality; therefore, having a one-to-one mapping is crucial. Furthermore, we study how to ensure a one-to-one mapping under an unpaired setting.

What contrasts with a one-to-one mapping function are one-to-many, many-to-one, and many-to-many [1] mapping functions. (It is worth noting that recently quite a few works focus on image-to-image translation among many domains, also called one-to-many.) In [12], the well-studied scenarios of labels-to-scenes and edge-to-photo are more likely one-to-many mappings, as multiple photos (scenes) can share the same edge (label) information. The colorization example is also one-to-many. From an information-theoretic perspective, the entropy of an edge map (label) is low while that of a photo is high. When an image translation goes in an information-gaining direction, that is, from low to high entropy, its mapping leans toward one-to-many. Similarly, if it goes in an information-losing direction, then its mapping leans toward many-to-one. If the information levels of both domains are close (or information-similar), then the mapping is close to one-to-one. In [12], the examples of Monet-to-photo and summer-to-winter are closer to one-to-one mappings, as the underlying contents of the images before and after translation are regarded as the same while only the styles differ, which does not change the image entropy significantly.

Our main contribution lies in proposing a self-inverse GAN network. When a function $F$ is self-inverse, meaning

$$F = F^{-1}, \quad \text{i.e., } F(F(x)) = x \text{ for all } x, \tag{1}$$

it guarantees a one-to-one mapping. We use the CycleGAN [34] as the baseline framework for image-to-image translation. To impose the self-inverse property, we implement a simple idea of augmenting the training samples by switching inputs and outputs during training. However, as we will demonstrate empirically, this seemingly simple idea makes a genuinely big difference!

The distinct feature of our self-inverse network is that it learns one network to perform both the forward (from domain A to B) and backward (from B to A) translation tasks. This contrasts with the state-of-the-art approaches, which typically learn two separate networks, one for the forward translation and the other for the backward translation. As a result, it enjoys several benefits. First, it halves the necessary parameters, assuming that the self-inverse network and the two separate networks share the same network architecture. Second, it automatically doubles the sample size, a great feature for any data-driven model, making it less likely to over-fit.

One key question arises: is it feasible to learn such a self-inverse network for image-to-image translation? We cannot prove this existence theoretically; however, we demonstrate it experimentally. Intuitively, such existence is related to the redundancy in the expressive power of deep neural networks. Even given a fixed network architecture, the function space for a network that translates an image from one domain to the other is large; that is, there are many neural networks with different parameters capable of doing the same translation job. The same holds for the inverse network. Therefore, the overlap between these two spaces, in which the self-inverse network resides, does exist.

2 Literature review

As mentioned earlier, the approaches for image-to-image translation can be divided into two categories: supervised [12] and unsupervised [34, 21]. The former uses paired images in training; the latter handles unpaired images. The generative adversarial network (GAN) is widely used in both types of approaches.

In addition to using a GAN, which essentially enforces similarity in image distribution, other guidance information is used, such as landmark points [30], contours [8], sketches [23], and anatomical information [28]. In addition to the cyclic constraint [34], other constraints are also used, such as a ternary discriminative function [9], an optimal transport function [22], and smoothness over the sample graph [33].

Also, extensions have been proposed to deal with video inputs [31, 3], to synthesize images at high resolution [32], to seek diversity [25], and to handle more than two image domains [36, 6, 11, 18]. Furthermore, there are methods that leverage attention mechanisms [24, 5, 26] and mask guidance [20]. Finally, disentanglement is an emerging direction [11, 18].

In terms of work on inverse problems with neural networks, Jacobsen et al. [13] make a CNN architecture invertible by providing an explicit inverse. Ardizzone et al. [2] demonstrate invertibility theoretically and experimentally for inverse problems using invertible neural networks. Kingma and Dhariwal [17] show the benefit of an invertible 1x1 convolution for generative flow. Different from these works, our self-inverse network realizes the invertibility of a neural network by switching inputs and outputs.

For image-to-image translation, much work has been done to diversify the output [1, 21, 18, 11, 36, 19], while little work has been done to make the output unique [29]. Our work goes in the latter direction.

Despite the large body of research on image-to-image translation, the perspective of learning a one-to-one mapping network has not been fully investigated, with the exception of [22]. In [22], Lu et al. show that CycleGAN cannot theoretically guarantee the one-to-one mapping property and propose an optimal transport mechanism to mitigate this issue. However, like the GAN, the optimal transport method also measures similarity in image distribution; hence the one-to-one issue is not fully resolved. By contrast, our self-inverse learning comes with a guarantee that the learned network realizes a one-to-one mapping.

3 Self-inverse learning for unpaired image-to-image translation

In this section, we first show that a self-inverse function guarantees a one-to-one (one2one) mapping. Then we discuss how to train a self-inverse CycleGAN network for image-to-image translation.

3.1 One-to-one property

In image-to-image translation, we define a forward function $F_{XY}: X \to Y$ that maps an image $x$ in domain $X$ to an image $y$ in domain $Y$ and, similarly, an inverse function $F_{YX}: Y \to X$. When there is no confusion, we will skip the subscript (e.g., $F$).

Property: If a function $F$ is self-inverse, that is, $F = F^{-1}$, then $F$ defines a one-to-one mapping; that is, $F(x_1) = F(x_2)$ if and only if $x_1 = x_2$.

Proof:

[$\Leftarrow$] If $x_1 = x_2$, then $F(x_1) = F(x_2)$.

[$\Rightarrow$] If $F(x_1) = F(x_2)$, then $x_1 = F^{-1}(F(x_1)) = F^{-1}(F(x_2)) = x_2$, as long as the inverse function exists, which is the case for a self-inverse function since $F^{-1} = F$.
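For readers who prefer a machine-checked statement, the injectivity direction of this property can be formalized in a few lines. Below is a minimal Lean 4 sketch, assuming Mathlib is available; it is offered only as an illustration of the argument above.

```lean
import Mathlib.Logic.Function.Basic

-- A self-inverse function F (with F (F x) = x for all x) is injective, i.e. one-to-one.
theorem selfInverse_injective {α : Type} (F : α → α)
    (h : ∀ x, F (F x) = x) : Function.Injective F := by
  intro x₁ x₂ hFx
  calc x₁ = F (F x₁) := (h x₁).symm
    _ = F (F x₂) := by rw [hFx]
    _ = x₂ := h x₂
```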

3.2 One-to-one Benefits

There are several advantages in learning a self-inverse network with the one-to-one mapping property.

(1) From the application perspective, a single self-inverse function can model both tasks $A$ and $B$, which is a novel way of multi-task learning. As shown in Figure 1, the self-inverse network generates an output given an input, and vice versa, with only one CNN and without knowing the mapping direction. It is capable of performing both tasks within the same network simultaneously. Compared to separately assigning two CNNs to tasks $A$ and $B$, the self-inverse network halves the necessary parameters, assuming that the self-inverse network and the two CNNs share the same network architecture, as shown in Figure 1.

(2) It automatically doubles the sample size, a great feature for any data-driven model, making it less likely to over-fit. The self-inverse function has domain and co-domain $X \cup Y$. If the sample size of either domain $X$ or $Y$ is $N$, then the sample size for the domain $X \cup Y$ is $2N$. As a result, the sample size for both tasks $A$ and $B$ is doubled, providing a novel form of data augmentation that mitigates over-fitting (a data-level sketch is given at the end of this subsection).

(3) In the unpaired image-to-image translation setting, the goal is to minimize the distribution gap between the two domains. State-of-the-art methods can realize this but cannot guarantee an ordered mapping or bijection between the two domains, which results in variations in the generated images.

(4) The one-to-one mapping is a strict constraint. Therefore, forcing a CNN model to be a self-inverse function shrinks the target function space.
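To make the data-doubling argument in (2) concrete, below is a minimal PyTorch-style sketch of one way to present every unpaired sample in both directions at the data level. The class and key names (`SwapAugmentedDataset`, 'A', 'B') are hypothetical; the actual training procedure (Sec. 3.5) instead alternates directions within each iteration rather than duplicating the dataset.

```python
from torch.utils.data import Dataset

class SwapAugmentedDataset(Dataset):
    """Doubles an unpaired two-domain dataset by direction swapping.

    `base` is assumed to return dicts {'A': image_from_X, 'B': image_from_Y};
    the second half of the indices returns the same items with A and B swapped,
    so a single shared generator is trained on both mapping directions.
    """
    def __init__(self, base):
        self.base = base

    def __len__(self):
        return 2 * len(self.base)

    def __getitem__(self, idx):
        item = self.base[idx % len(self.base)]
        if idx >= len(self.base):  # swapped copy: the B -> A direction
            return {'A': item['B'], 'B': item['A']}
        return item
```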

3.3 One-to-one CycleGAN

We are inspired by the basic formulation of CycleGAN [34]. In CycleGAN, there are two generators $G_A$ and $G_B$, two discriminators $D_Y$ and $D_X$, and one joint objective function. In our one2one CycleGAN, we have one shared generator $G$ and still two discriminators $D_Y$ and $D_X$. Instead of having a joint objective for the dual mappings, our proposed method has two separate objective functions, one for each of the two mapping directions.

3.3.1 Separate loss functions

Compared to CycleGAN, which uses a joint loss for both image transfer directions, our method has two separate losses, one for each image transfer direction. For the mapping $G: X \to Y$ and its discriminator $D_Y$, the adversarial loss is

$$\mathcal{L}_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{\text{data}}(y)}[\log D_Y(y)] + \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log(1 - D_Y(G(x)))]. \tag{2}$$

The cycle consistency loss is

$$\mathcal{L}_{cyc}^{X}(G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\lVert G(G(x)) - x \rVert_1\big]. \tag{3}$$

For the mapping $G: Y \to X$ (realized by the same shared generator) and its discriminator $D_X$, the adversarial loss is

$$\mathcal{L}_{GAN}(G, D_X, Y, X) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D_X(x)] + \mathbb{E}_{y \sim p_{\text{data}}(y)}[\log(1 - D_X(G(y)))]. \tag{4}$$

The cycle consistency loss is

$$\mathcal{L}_{cyc}^{Y}(G) = \mathbb{E}_{y \sim p_{\text{data}}(y)}\big[\lVert G(G(y)) - y \rVert_1\big]. \tag{5}$$

So, the final objective for the mapping direction $X \to Y$ is

$$\mathcal{L}_{XY}(G, D_Y) = \mathcal{L}_{GAN}(G, D_Y, X, Y) + \lambda\, \mathcal{L}_{cyc}^{X}(G), \tag{6}$$

and the minimax optimization solves

$$G^{*} = \arg\min_{G} \max_{D_Y} \mathcal{L}_{XY}(G, D_Y). \tag{7}$$

Similarly, the final objective for the mapping direction $Y \to X$ is

$$\mathcal{L}_{YX}(G, D_X) = \mathcal{L}_{GAN}(G, D_X, Y, X) + \lambda\, \mathcal{L}_{cyc}^{Y}(G), \tag{8}$$

and the minimax optimization solves

$$G^{*} = \arg\min_{G} \max_{D_X} \mathcal{L}_{YX}(G, D_X). \tag{9}$$
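To make the two separate objectives concrete, the following is a minimal PyTorch-style sketch of how Eqs. (2)-(9) could be computed with a single shared generator `G` and two discriminators `D_X` and `D_Y`. The function names and the default weight `lam` are illustrative assumptions; the actual implementation (Sec. 3.4) replaces the log-likelihood terms with a least-squares loss.

```python
import torch
import torch.nn.functional as F

def adversarial_loss_G(D, fake):
    # Generator side of Eq. (2)/(4): push D to score the translated image as real.
    # (Log-likelihood form; Sec. 3.4 notes that least squares is used in practice.)
    pred = D(fake)
    return F.binary_cross_entropy_with_logits(pred, torch.ones_like(pred))

def cycle_loss(G, real):
    # Eq. (3)/(5): applying the self-inverse generator twice should recover the input.
    return torch.mean(torch.abs(G(G(real)) - real))

def objective_XY(G, D_Y, x, lam=10.0):
    # Eq. (6): adversarial term for X -> Y plus cycle consistency on x.
    # lam is an assumed cycle-consistency weight, not a value stated in the paper.
    return adversarial_loss_G(D_Y, G(x)) + lam * cycle_loss(G, x)

def objective_YX(G, D_X, y, lam=10.0):
    # Eq. (8): adversarial term for Y -> X plus cycle consistency on y.
    return adversarial_loss_G(D_X, G(y)) + lam * cycle_loss(G, y)
```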

3.4 Self-inverse implementation

We apply the proposed method within the framework of CycleGAN [34]. For a fair comparison with CycleGAN, we adopt the architecture of Johnson et al. [14] as the generator and the PatchGAN [12] as the discriminator. The log-likelihood objective in the original GAN is replaced with a least-squares loss for more stable training. We resize the input images to 256 x 256, and the loss weights are set as in CycleGAN. Following CycleGAN, we adopt the Adam optimizer [16] with a learning rate of 0.0002 and an image pool of size 50. The learning rate is fixed for the first 100 epochs and linearly decayed to zero over the next 100 epochs on the Yosemite and apple2orange datasets; it is fixed for the first 4 epochs and decayed to zero over the next 3 epochs on the BRATS dataset; and it is fixed for the first 90 epochs and decayed to zero over the next 30 epochs on the Cityscapes dataset.
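As a concrete illustration of the learning-rate schedule described above, here is a small sketch of how an Adam optimizer with a fixed-then-linearly-decayed learning rate could be set up in PyTorch; the momentum terms (betas) are a common GAN choice and an assumption, not a value stated in the paper.

```python
import torch

def make_optimizer_and_scheduler(params, n_fixed, n_decay, lr=2e-4):
    # Adam with lr fixed for the first n_fixed epochs, then decayed linearly
    # to zero over the following n_decay epochs (e.g., 100/100 for Yosemite).
    opt = torch.optim.Adam(params, lr=lr, betas=(0.5, 0.999))  # betas assumed
    def lr_lambda(epoch):
        return 1.0 - max(0, epoch - n_fixed) / float(n_decay)
    sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda=lr_lambda)
    return opt, sched
```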

3.5 Training details and optimization

In our experiments, we use a batch size of 1. At each iteration, we randomly sample a pair $(x, y)$, where $x \sim X$ and $y \sim Y$. At any iteration, we perform the following three steps (a code sketch follows the list):

  • First, we feed $x$ as the input and $y$ as the target, then forward and back-propagate the shared generator $G$ with the objective $\mathcal{L}_{XY}$;

  • Second, we feed $y$ as the input and $x$ as the target, then forward and back-propagate $G$ with the objective $\mathcal{L}_{YX}$;

  • Finally, we back-propagate the discriminators $D_Y$ and $D_X$ individually.
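The three steps can be sketched as follows in a simplified PyTorch-style update, reusing the loss helpers (and imports) sketched in Sec. 3.3.1; the image pool and the least-squares variant of the discriminator loss are omitted for brevity, so this is an illustrative sketch rather than the exact implementation.

```python
def train_step(G, D_X, D_Y, x, y, opt_G, opt_D_X, opt_D_Y, lam=10.0):
    # Step 1: feed x as input (y plays the role of the target domain);
    # update the shared generator G on the X -> Y objective (Eq. 6).
    opt_G.zero_grad()
    loss_xy = objective_XY(G, D_Y, x, lam)
    loss_xy.backward()
    opt_G.step()

    # Step 2: feed y as input (x plays the role of the target domain);
    # update the same generator G on the Y -> X objective (Eq. 8).
    opt_G.zero_grad()
    loss_yx = objective_YX(G, D_X, y, lam)
    loss_yx.backward()
    opt_G.step()

    # Step 3: back-propagate each discriminator individually on real vs. translated images.
    for D, opt_D, real, fake in [(D_Y, opt_D_Y, y, G(x).detach()),
                                 (D_X, opt_D_X, x, G(y).detach())]:
        opt_D.zero_grad()
        pred_real, pred_fake = D(real), D(fake)
        loss_D = 0.5 * (
            F.binary_cross_entropy_with_logits(pred_real, torch.ones_like(pred_real)) +
            F.binary_cross_entropy_with_logits(pred_fake, torch.zeros_like(pred_fake)))
        loss_D.backward()
        opt_D.step()
    return loss_xy.item(), loss_yx.item()
```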

4 Experiments

Figure 2: Visual comparison for horse ↔ zebra.

To evaluate the proposed method, we test it on an array of applications: cross-modal medical image synthesis, object transfiguration, and style transfer. We also compare against several unpaired image-to-image translation methods: CycleGAN [34], DiscoGAN [15], DistanceGAN [4], and UNIT [21]. We conduct a user study when ground-truth images are unavailable and perform quantitative evaluation when ground-truth images are available.

4.1 Datasets and results

Object transfiguration.

Direction Metric CycleGAN DistanceGAN One2one (ours)
horse2zebra Preference pct. 25% 0% 75%
zebra2horse Preference pct. 23% 0% 77%
Table 1: Results of user study on the horse to zebra dataset.

We test our method on the horse ↔ zebra task used in the CycleGAN paper [34], with 2401 training images (939 horses and 1177 zebras) and 260 test images (120 horses and 140 zebras). This task has no ground truth for the generated images, and hence no quantitative evaluation is feasible, so we provide qualitative results obtained in a user study. In the user study, we ask each user to rate his/her preferred image out of three randomly positioned images: one obtained from CycleGAN, one from DistanceGAN, and one from one2one CycleGAN. Figure 2 shows examples of input and synthesized images, and Table 1 summarizes the user study results.

Figure 3: Visual comparison for summer ↔ winter on Yosemite.

Figure 2 shows that one2one CycleGAN generates better quality images in an unsupervised fashion, especially in terms of the quality of zebra synthesis from the horse (refer to the first four rows). Our method generates more realistic and complete zebra content. From Table 1, it is clear that our one2one CycleGAN is the most favorable, with a 75% (77%) preference percentage for the horse2zebra (zebra2horse) mapping direction, while DistanceGAN is the least favorable.

Direction Metric CycleGAN One2one (ours)
summer2winter Preference pct. 34% 66%
winter2summer Preference pct. 41% 59%
Table 2: Results of user study on the summer to winter Yosemite dataset.

We also test our method on the apple ↔ orange task [34], with 2014 training images (995 apples and 1019 oranges) and 514 test images (248 apples and 266 oranges). This task has no ground truth for the generated images, and hence no quantitative evaluation is feasible. Figure 4 shows examples of input and synthesized images. CycleGAN produces failure cases in rows 1, 2, and 4, while our model generates plausible images.

Figure 4: Visual comparison for apple ↔ orange.
Figure 5: Qualitative comparison for T1 ↔ T2 on the BRATS dataset.

Cross-modal medical image synthesis. The models are trained on the BRATS dataset [27], which contains paired MRI data and therefore allows quantitative evaluation. It comprises multi-institutional, routine clinically-acquired pre-operative multimodal MRI scans of glioblastoma (GBM/HGG) and lower-grade glioma (LGG). There are 285 3D volumes for training and 66 3D volumes for testing. The T1 and T2 images are selected for our bidirectional image synthesis. All 3D volumes are preprocessed into one-channel images of size 256 x 256 x 1. We use the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) to evaluate the quality of the generated images.
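For reference, a minimal sketch of how PSNR and SSIM could be computed per slice with scikit-image is shown below; it assumes the ground-truth and synthesized slices are co-registered arrays in the same intensity range, and is not tied to the authors' exact evaluation script.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_slice(real, fake, data_range=None):
    # Compute PSNR (dB) and SSIM between a ground-truth slice and a synthesized slice.
    real = real.astype(np.float64)
    fake = fake.astype(np.float64)
    if data_range is None:
        data_range = real.max() - real.min()
    psnr = peak_signal_noise_ratio(real, fake, data_range=data_range)
    ssim = structural_similarity(real, fake, data_range=data_range)
    return psnr, ssim
```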

Direction Method PSNR (dB) SSIM
T1 → T2 CycleGAN 20.79 0.85
T1 → T2 One2one CycleGAN 22.03 0.86
T2 → T1 CycleGAN 17.47 0.81
T2 → T1 One2one CycleGAN 18.31 0.82
Table 3: Evaluation of cross-modal medical image synthesis on the BRATS dataset.
Figure 6: Visual comparison for photo ↔ label on the Cityscapes dataset.

As shown in Table 3, in the T1 → T2 synthesis direction, our one2one model outperforms the CycleGAN model on PSNR by 6.0%; the qualitative result is shown in columns 3 and 4 of Figure 5. In the T2 → T1 synthesis direction, our one2one model outperforms the CycleGAN model on PSNR by 5.0%; the qualitative result is shown in columns 7 and 8 of Figure 5.

Label → Photo / Photo → Label
Method Pixel Acc. Class Acc. Class IoU Pixel Acc. Class Acc. Class IoU
CycleGAN 52.7 15.2 11.0 57.2 21.0 15.7
DiscoGAN 45.0 11.1 7.0 45.2 10.9 6.3
DistanceGAN 48.5 10.9 7.3 20.5 8.2 3.4
UNIT 48.5 12.9 7.9 56.0 20.5 14.3
One2one CycleGAN (ours) 58.2 18.9 14.3 52.7 18.1 13.0
Table 4: Results of photo ↔ label translation on the Cityscapes dataset.

Semantic labeling. We also test our method on the labels ↔ photos task using the Cityscapes dataset [7] under the unpaired setting, as in the original CycleGAN paper. For quantitative evaluation, in line with previous work, for labels → photos we adopt the "FCN score" [12], which evaluates how interpretable the generated photos are according to a semantic segmentation algorithm. For photos → labels, we use the standard segmentation metrics, including per-pixel accuracy, per-class accuracy, and mean class Intersection-Over-Union (Class IoU). The quantitative result is shown in Table 4. Our model reaches the state of the art in the label → photo synthesis direction under this unpaired setting: the pixel accuracy outperforms the second-best result by 10.4%, the class accuracy by 24.3%, and the class IoU by 30.0%. In the photo → label direction, our model reaches comparable results.
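The segmentation metrics reported here (per-pixel accuracy, per-class accuracy, and mean class IoU) can be computed from a confusion matrix accumulated over the test set; the following is a minimal sketch under the assumption that predictions and ground truth are integer label maps with `num_classes` valid classes.

```python
import numpy as np

def confusion_hist(pred, gt, num_classes):
    # Accumulate a num_classes x num_classes histogram of (gt, pred) label pairs.
    mask = (gt >= 0) & (gt < num_classes)
    return np.bincount(num_classes * gt[mask].astype(int) + pred[mask].astype(int),
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)

def segmentation_scores(hist):
    # Per-pixel accuracy, per-class accuracy, and mean class IoU from the histogram.
    pixel_acc = np.diag(hist).sum() / hist.sum()
    class_acc = np.nanmean(np.diag(hist) / hist.sum(axis=1))
    iou = np.diag(hist) / (hist.sum(axis=1) + hist.sum(axis=0) - np.diag(hist))
    return pixel_acc, class_acc, np.nanmean(iou)
```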

The qualitative result is shown in Figure 6. Compared with CycleGAN, which achieves the second-best result in the label → photo direction, our model has clearly better visual results. In the photo → label direction, our model also achieves comparable or better results.

Style transfer. We also test our method on the summer ↔ winter style transfer task using the Yosemite dataset under the unpaired setting, as in the original CycleGAN paper. As shown in Figure 3, our method has better visual results in both directions of style transfer. We also conduct a similar user study by presenting users with the generated test-set images from our model and from CycleGAN. The results are in Table 2 and show that our model is preferred over CycleGAN.

5 Conclusions

We have presented an approach for enforcing the learning of a one-to-one mapping function for unpaired image-to-image translation. The idea is to take advantage of the representational redundancy in deep networks and realize self-inverse learning. The implementation is as simple as augmenting the training samples by switching inputs and outputs. However, this seemingly simple idea brings a genuinely big difference, which has been confirmed by our extensive experiments on multiple applications, including cross-modal medical image synthesis, object transfiguration, and style transfer. The proposed one-to-one CycleGAN consistently outperforms the baseline CycleGAN model and other state-of-the-art unsupervised approaches in terms of various qualitative and quantitative metrics. In the future, we plan to investigate the effect of applying self-inverse learning to natural language translation and to study the theoretical properties of the self-inverse network.

References

  • [1] A. Almahairi, S. Rajeswar, A. Sordoni, P. Bachman, and A. Courville (2018) Augmented cyclegan: learning many-to-many mappings from unpaired data. arXiv preprint arXiv:1802.10151. Cited by: §1, §2.
  • [2] L. Ardizzone, J. Kruse, S. Wirkert, D. Rahner, E. W. Pellegrini, R. S. Klessen, L. Maier-Hein, C. Rother, and U. Köthe (2018) Analyzing inverse problems with invertible neural networks. arXiv preprint arXiv:1808.04730. Cited by: §2.
  • [3] A. Bansal, S. Ma, D. Ramanan, and Y. Sheikh (2018) Recycle-gan: unsupervised video retargeting. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 119–135. Cited by: §2.
  • [4] S. Benaim and L. Wolf (2017) One-sided unsupervised domain mapping. In Advances in neural information processing systems, pp. 752–762. Cited by: §4.
  • [5] X. Chen, C. Xu, X. Yang, and D. Tao (2018) Attention-gan for object transfiguration in wild images. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 164–180. Cited by: §2.
  • [6] Y. Choi, M. Choi, M. Kim, J. Ha, S. Kim, and J. Choo (2018) Stargan: unified generative adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8789–8797. Cited by: §1, §2.
  • [7] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele (2016) The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3213–3223. Cited by: §4.1.
  • [8] T. Dekel, C. Gan, D. Krishnan, C. Liu, and W. T. Freeman (2017) Smart, sparse contours to represent and edit images. arXiv preprint arXiv:1712.08232. Cited by: §2.
  • [9] Z. Gan, L. Chen, W. Wang, Y. Pu, Y. Zhang, H. Liu, C. Li, and L. Carin (2017) Triangle generative adversarial networks. In Advances in Neural Information Processing Systems, pp. 5247–5256. Cited by: §1, §2.
  • [10] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in neural information processing systems, pp. 2672–2680. Cited by: §1.
  • [11] X. Huang, M. Liu, S. Belongie, and J. Kautz (2018) Multimodal unsupervised image-to-image translation. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 172–189. Cited by: §1, §2, §2.
  • [12] P. Isola, J. Zhu, T. Zhou, and A. A. Efros (2017) Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1125–1134. Cited by: Figure 1, §1, §1, §1, §2, §3.4, §4.1.
  • [13] J. Jacobsen, A. Smeulders, and E. Oyallon (2018) I-revnet: deep invertible networks. arXiv preprint arXiv:1802.07088. Cited by: §2.
  • [14] J. Johnson, A. Alahi, and L. Fei-Fei (2016) Perceptual losses for real-time style transfer and super-resolution. In European conference on computer vision, pp. 694–711. Cited by: §3.4.
  • [15] T. Kim, M. Cha, H. Kim, J. K. Lee, and J. Kim (2017) Learning to discover cross-domain relations with generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 1857–1865. Cited by: §4.
  • [16] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §3.4.
  • [17] D. P. Kingma and P. Dhariwal (2018) Glow: generative flow with invertible 1x1 convolutions. In Advances in Neural Information Processing Systems, pp. 10215–10224. Cited by: §2.
  • [18] H. Lee, H. Tseng, J. Huang, M. Singh, and M. Yang (2018) Diverse image-to-image translation via disentangled representations. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 35–51. Cited by: §1, §2, §2.
  • [19] H. Lee, H. Tseng, Q. Mao, J. Huang, Y. Lu, M. Singh, and M. Yang (2019) DRIT++: diverse image-to-image translation via disentangled representations. arXiv preprint arXiv:1905.01270. Cited by: §2.
  • [20] X. Liang, H. Zhang, L. Lin, and E. Xing (2018) Generative semantic manipulation with mask-contrasting gan. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 558–573. Cited by: §2.
  • [21] M. Liu, T. Breuel, and J. Kautz (2017) Unsupervised image-to-image translation networks. In Advances in Neural Information Processing Systems, pp. 700–708. Cited by: §2, §2, §4.
  • [22] G. Lu, Z. Zhou, Y. Song, K. Ren, and Y. Yu (2018) Guiding the one-to-one mapping in cyclegan via optimal transport. arXiv preprint arXiv:1811.06284. Cited by: §1, §2, §2.
  • [23] Y. Lu, S. Wu, Y. Tai, and C. Tang (2018) Image generation from sketch constraint using contextual gan. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 205–220. Cited by: §2.
  • [24] S. Ma, J. Fu, C. Wen Chen, and T. Mei (2018) DA-gan: instance-level image translation by deep attention generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5657–5666. Cited by: §2.
  • [25] Q. Mao, H. Lee, H. Tseng, S. Ma, and M. Yang (2019) Mode seeking generative adversarial networks for diverse image synthesis. arXiv preprint arXiv:1903.05628. Cited by: §2.
  • [26] Y. A. Mejjati, C. Richardt, J. Tompkin, D. Cosker, and K. I. Kim (2018) Unsupervised attention-guided image-to-image translation. In Advances in Neural Information Processing Systems, pp. 3693–3703. Cited by: §2.
  • [27] B. H. Menze, A. Jakab, S. Bauer, J. Kalpathy-Cramer, K. Farahani, J. Kirby, Y. Burren, N. Porz, J. Slotboom, R. Wiest, et al. (2015) The multimodal brain tumor image segmentation benchmark (brats). IEEE transactions on medical imaging 34 (10), pp. 1993–2024. Cited by: §4.1.
  • [28] A. Pumarola, A. Agudo, A. M. Martinez, A. Sanfeliu, and F. Moreno-Noguer (2018) Ganimation: anatomically-aware facial animation from a single image. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 818–833. Cited by: §1, §2.
  • [29] Z. Shen, M. Huang, J. Shi, X. Xue, and T. Huang (2019) Towards instance-level image-to-image translation. arXiv preprint arXiv:1905.01744. Cited by: §2.
  • [30] L. Song, Z. Lu, R. He, Z. Sun, and T. Tan (2018) Geometry guided adversarial facial expression synthesis. In 2018 ACM Multimedia Conference on Multimedia Conference, pp. 627–635. Cited by: §1, §2.
  • [31] T. Wang, M. Liu, J. Zhu, G. Liu, A. Tao, J. Kautz, and B. Catanzaro (2018) Video-to-video synthesis. In Advances in Neural Information Processing Systems (NeurIPS), Cited by: §2.
  • [32] T. Wang, M. Liu, J. Zhu, A. Tao, J. Kautz, and B. Catanzaro (2018) High-resolution image synthesis and semantic manipulation with conditional gans. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8798–8807. Cited by: §2.
  • [33] R. Zhang, T. Pfister, and J. Li (2019) Harmonic unpaired image-to-image translation. CoRR abs/1902.09727. External Links: Link, 1902.09727 Cited by: §1, §2.
  • [34] J. Zhu, T. Park, P. Isola, and A. A. Efros (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision, pp. 2223–2232. Cited by: §1, §1, §1, §2, §2, §3.3, §3.4, §4.1, §4.1, §4.
  • [35] J. Zhu, T. Park, P. Isola, and A. A. Efros (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In Computer Vision (ICCV), 2017 IEEE International Conference on. Cited by: Figure 1.
  • [36] J. Zhu, R. Zhang, D. Pathak, T. Darrell, A. A. Efros, O. Wang, and E. Shechtman (2017) Toward multimodal image-to-image translation. In Advances in Neural Information Processing Systems, pp. 465–476. Cited by: §1, §2, §2.