Cartoon-to-real: An Approach to Translate Cartoon to Realistic Images using GAN
We propose a method to translate cartoon images to real world images using a Generative Adversarial Network (GAN). Existing GAN-based image-to-image translation methods that are trained on paired datasets are impractical, as such data is difficult to accumulate. Therefore, in this paper we exploit the Cycle-Consistent Adversarial Networks (CycleGAN) method for image translation, which needs only an unpaired dataset. By applying CycleGAN we show that our model is able to generate meaningful real world images from cartoon images. In addition, we implement another state-of-the-art technique, Deep Analogy, to compare the performance of our approach.
1 Introduction
What if we could see real images of one of the most famous cartoon movies, Spirited Away (2001)? How would it feel to see a real-life version of the protagonist, Chihiro? Isn't this what most cartoon lovers have dreamt of while watching cartoon movies?
In this paper, we present a method, Cartoon-to-Real, to materialize the above desire by performing cartoon to real world image translation. Since it is extremely time consuming and tedious to create a sufficient paired dataset, we develop an unpaired one. We extracted cartoon images from different cartoon movies and real images from the Internet (i.e., Flickr), where the cartoon and the realistic image in a pair have no correlation with each other. Using our Cartoon-to-Real, we achieve significant results in translating cartoon images to realistic ones.
2 Background and Related Works
Recently, Generative Adversarial Networks (GANs) have achieved astounding results in image synthesis tasks such as text-to-image translation, image inpainting, and super-resolution. Moreover, GANs are widely used in image-to-image translation; for example, CycleGAN uses unpaired training data by training two sets of GANs to learn the mappings $R \rightarrow C$ and $C \rightarrow R$ respectively. Recently, CartoonGAN was proposed to translate real world images to cartoon images; it converges faster than CycleGAN and performs satisfactorily (see Figure 1).
3 Proposed Methodology
The main target of our Cartoon-to-Real is to perform the reverse of CartoonGAN (i.e., cartoon to real), and we exploit the CycleGAN technique for this purpose. The model contains two mapping functions $G: C \rightarrow R$ and $F: R \rightarrow C$, where $C$ denotes the cartoon domain and $R$ denotes the real domain. There are discriminators ($D_C$, $D_R$) and generators ($G$, $F$) for the translation process. While performing $G: C \rightarrow R$, $D_R$ tries to enforce the translation towards domain $R$, and vice versa for $F$ and $D_C$. For regularization, we implement two cycle consistency losses, following the authors of CycleGAN, who propose that the learned mapping functions should be cycle-consistent to avoid direct mapping to an arbitrary distribution in the target domain. The loss is written as
$$\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{c \sim p_{data}(c)}\big[\lVert F(G(c)) - c \rVert_1\big] + \mathbb{E}_{r \sim p_{data}(r)}\big[\lVert G(F(r)) - r \rVert_1\big]$$
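As a concrete illustration, the cycle consistency term can be sketched in PyTorch. The generator names G and F follow the text; the helper function itself is hypothetical and not the paper's released code:

```python
import torch
import torch.nn as nn

def cycle_consistency_loss(G, F, cartoon, real):
    """L1 cycle losses: F(G(c)) should reconstruct c, and G(F(r)) should reconstruct r."""
    l1 = nn.L1Loss()
    forward_cycle = l1(F(G(cartoon)), cartoon)   # C -> R -> C
    backward_cycle = l1(G(F(real)), real)        # R -> C -> R
    return forward_cycle + backward_cycle
```

If both generators were perfect inverses of each other, both terms would vanish; in practice this loss keeps the two mappings consistent with each other.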
Hence, the full objective is
$$\mathcal{L}(G, F, D_C, D_R) = \mathcal{L}_{GAN}(G, D_R, C, R) + \mathcal{L}_{GAN}(F, D_C, R, C) + \lambda\,\mathcal{L}_{cyc}(G, F)$$
where $\lambda$ is the weight, or relative importance, of the two objectives. Therefore, our aim can be described as
$$G^*, F^* = \arg\min_{G, F}\,\max_{D_C, D_R} \mathcal{L}(G, F, D_C, D_R)$$
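The generator side of this objective can be sketched as a single training criterion. This is a minimal illustration assuming the least-squares GAN loss that CycleGAN uses; the helper name and the weight LAMBDA = 10 are assumptions for illustration, not values stated in this paper:

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()  # least-squares GAN loss, as in CycleGAN (assumed)
l1 = nn.L1Loss()
LAMBDA = 10.0       # relative weight of the cycle term (assumed default)

def generator_objective(G, F, D_C, D_R, cartoon, real):
    """Generator-side objective: two adversarial terms plus the weighted cycle loss."""
    fake_real = G(cartoon)       # C -> R
    fake_cartoon = F(real)       # R -> C
    # Generators try to make the discriminators label their fakes as real (1).
    adv = mse(D_R(fake_real), torch.ones_like(D_R(fake_real))) \
        + mse(D_C(fake_cartoon), torch.ones_like(D_C(fake_cartoon)))
    # Cycle term: translating back should reconstruct the input.
    cyc = l1(F(fake_real), cartoon) + l1(G(fake_cartoon), real)
    return adv + LAMBDA * cyc
```

The discriminators are trained separately with the opposite adversarial targets, completing the min-max game described above.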
4 Experiment Results
We developed two unpaired datasets to train our network. For the cartoon domain, we collected almost K images scraped from various movies, e.g., Pokemon, My Neighbour Totoro, and Kiki's Delivery Service. We used the Flickr dataset for the real image domain. Images are resized to a fixed resolution. For the implementation we used PyTorch, and for hardware we used an Nvidia GTX GPU. We compared our outputs with those of another state-of-the-art work, Deep Analogy. Our results, along with the outputs from Deep Analogy, are presented in Figure 2. It exhibits that outputs with cycle consistency loss are more realistic than those produced by Deep Analogy.
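Because the two domains are unpaired, training data can simply be sampled from each side independently. A minimal PyTorch sketch of such a dataset follows; the class name and structure are hypothetical, not the paper's code:

```python
import random
from torch.utils.data import Dataset

class UnpairedDataset(Dataset):
    """Yields one cartoon item and one real item per index, with no fixed pairing."""
    def __init__(self, cartoon_items, real_items):
        self.cartoon_items = cartoon_items
        self.real_items = real_items

    def __len__(self):
        # Iterate over the larger side so every image is eventually seen.
        return max(len(self.cartoon_items), len(self.real_items))

    def __getitem__(self, idx):
        cartoon = self.cartoon_items[idx % len(self.cartoon_items)]
        # Sample the real image independently so no correlation between pairs is learned.
        real = random.choice(self.real_items)
        return cartoon, real
```

In a full pipeline each item would be a loaded and resized image tensor; here plain values stand in for images to keep the sketch self-contained.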
5 Conclusion
In this paper, we performed image translation from cartoons to real world images. We used the cycle consistency loss so that a generated image is not directly mapped into an arbitrary distribution of the target domain. Our research is still in progress; we observed that our results are not yet completely satisfactory, and our upcoming target is to minimize these limitations. In the future, we want to investigate preserving the content of the input image from the cartoon domain for better translation.
-  I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in neural information processing systems, 2014, pp. 2672–2680.
-  S. E. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee, “Generative adversarial text to image synthesis,” CoRR, vol. abs/1605.05396, 2016. [Online]. Available: http://arxiv.org/abs/1605.05396
-  R. A. Yeh, C. Chen, T. Lim, M. Hasegawa-Johnson, and M. N. Do, “Semantic image inpainting with perceptual and contextual losses,” CoRR, vol. abs/1607.07539, 2016. [Online]. Available: http://arxiv.org/abs/1607.07539
-  C. Ledig, L. Theis, F. Huszar, J. Caballero, A. P. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi, “Photo-realistic single image super-resolution using a generative adversarial network,” CoRR, vol. abs/1609.04802, 2016. [Online]. Available: http://arxiv.org/abs/1609.04802
-  J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” arXiv preprint, 2017.
-  Y. Chen, Y.-K. Lai, and Y.-J. Liu, “CartoonGAN: Generative adversarial networks for photo cartoonization,” 2018.
-  J. Liao, Y. Yao, L. Yuan, G. Hua, and S. B. Kang, “Visual attribute transfer through deep image analogy,” arXiv preprint arXiv:1705.01088, 2017.