Contained Neural Style Transfer for Decorated Logo Generation

Gantugs Atarsaikhan, Brian Kenji Iwana, Seiichi Uchida
Graduate School of Information Science and Electrical Engineering
Kyushu University, Fukuoka, Japan
Email: {gantugs.atarsaikhan, brian, uchida}

Making decorated logos requires image editing skills; without sufficient skills, it can be a time-consuming task. While there are many online web services for making new logos, they offer a limited number of designs, and duplicates can be made. We propose using neural style transfer with clip art and text for the creation of new and genuine logos. We introduce a new loss function based on the distance transform of the input image, which allows the preservation of the silhouettes of text and objects. The proposed method contains style transfer to only a designated area. We demonstrate the characteristics of the proposed method. Finally, we show the results of logo generation with various input images.

neural style transfer, logo generation, convolutional neural network

I Introduction

Designing logos with decorations is difficult. Without image editing skills, a lot of time and energy is wasted on making logos. There are online tools that can generate logos easily. These websites use a heuristically mutating choice selector, in which the user selects options to generate the logo. An example of this is Logomaster. However, the problem with these kinds of websites is that the number of designs is limited and there is a possibility of duplicates.

In recent years, style transfer using convolutional neural networks has been an active field, and a large number of works have addressed style transfer between two images to generate new images. Gatys et al. [1] introduced the neural style transfer algorithm. Neural style transfer uses a Convolutional Neural Network (CNN) [2] to generate images by synthesizing a content image and a style image. In neural style transfer, local features of the style image are transferred onto the structure of a content image. An example of neural style transfer is shown in Fig. 1. Johnson et al. [3] used a ConvDeconv-style network to transfer image styles in real time. Moreover, Isola et al. [4] achieved style transfer with Generative Adversarial Networks (GANs), successfully using them to map input images to newly generated images.

The purpose of this paper is to propose a novel method of generating decorated logos automatically. The term decorated logo refers to decorated text (a word-mark) or a symbol with decorated text. In the proposed method, any pattern image can be used as the style image. However, the content image is restricted to clip art or binary silhouette-like images, as shown in the examples of Fig. 2. In order to generate logos more clearly, we propose a new loss function in addition to the existing ones in neural style transfer.

The main contributions are summarized as follows.

  1. The introduction of a new loss function, which is designed to preserve the shape of the content as much as possible.

  2. A method to generate new and genuine logos easily.

The remainder of this paper is organized as follows. Section II describes related work on neural style transfer and logo generation. Section III describes the neural style transfer algorithm and its mechanism. Section IV explains the new loss function, which is incorporated into the neural style transfer algorithm. Section V shows experimental results of logo generation. Finally, we conclude our work and discuss future work in Section VI.

Fig. 1: An example of neural style transfer: (a) content image, (b) style image, (c) generated image. Styles such as textures and local details of the style image 1(b) have been transferred onto the main structure of the content image 1(a), resulting in the generated image 1(c).
Fig. 2: Examples of content images and the logos into which they are transformed: (a) text as a content image and its decorated logo, (b) simple clip art as a content image and its decorated logo.
Fig. 3: The process flow of neural style transfer.

II Related Work

One solution for logo generation is the use of genetic algorithms. The logo design website Mark Maker uses a user-selected word to generate a first generation of recommended logos. Then, depending on the selections of the user, the system produces the next generation of logos. This process repeats until the user finds their desired logo.

There have been a few attempts to generate fonts automatically. Tsuchiya et al. [5] used example fonts to determine predictive features. Work has also been done to generate fonts by interpolating between fonts [6, 7]. Lately, a method to generate fonts using neural style transfer has been proposed [8].

In the field of neural style transfer [1], many methods have been introduced for transferring styles of art to an image [9]. Li and Wand [10] used Markovian GANs for better texture synthesis. Chen and Schmidt [11] and Ulyanov et al. [12] increased the calculation speed of neural style transfer, and Gatys et al. [13] preserved features of the content image, such as its color. With these advances, neural style transfer has been applied to video [14] as well as sparse images [15]. The main advantage of neural style transfer is its simplicity: there is no need for handcrafted features or heuristic rules.

III Neural Style Transfer

The basic principle of neural style transfer [1] is to extract content representations and style representations of the input images and mix them into a new image using a pre-trained CNN. As the CNN, we used the Visual Geometry Group Network (VGGNet) [16]. The VGGNet was trained for image recognition on the ImageNet dataset. Because of its deep design, the VGGNet is suitable for extracting content and style representations of an input image.

III-A Content and Style Representations

Given an input image to the VGGNet, the filter responses of every layer are produced as feature maps. The feature maps on selected layers can be considered the content representation of the input image. A content representation on lower layers is more similar to the input image, whereas a content representation on higher layers retains the high-level structure of the input image but loses its detailed pixel information.

In order to obtain the style representation of an input image, a feature space designed to capture texture information is used. This feature space can be built on any layer of the CNN. It consists of feature correlations given by the Gram matrix in multiple layers. The Gram matrix is given as,

$$G^l_{ij} = \sum_k F^l_{ik} F^l_{jk}, \tag{1}$$

where $F^l_{ik}$ and $F^l_{jk}$ refer to the $k$-th elements of the vectorized feature maps $i$ and $j$ in layer $l$. The reason to use multiple layers is to obtain a consistent and multi-scale representation of the input image that captures only its texture information.
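As a minimal sketch of the Gram matrix computation, the NumPy snippet below assumes that the feature maps of one layer are available as an array of shape (N, H, W), that is, N filters with one H x W response each. The function name `gram_matrix` and this array layout are illustrative assumptions, not details taken from the original implementation.

```python
import numpy as np

def gram_matrix(feature_maps):
    """Return the N x N matrix of inner products between flattened feature maps.

    feature_maps: array of shape (N, H, W), one H x W response per filter
    (an assumed layout for this sketch).
    """
    n_filters = feature_maps.shape[0]
    # Flatten each H x W map to a row vector: F has shape (N, M), M = H * W.
    F = feature_maps.reshape(n_filters, -1)
    # G[i, j] = sum_k F[i, k] * F[j, k]
    return F @ F.T

# Toy example: two 2 x 2 feature maps.
maps = np.array([[[1.0, 0.0], [0.0, 1.0]],
                 [[2.0, 0.0], [0.0, 0.0]]])
G = gram_matrix(maps)
```

Because the spatial positions are summed out, the result no longer records where a feature occurred, only which features co-occur, which is why it serves as a texture descriptor.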

III-B Neural Style Transfer

Fig. 3 shows the process of transferring the style from a style image $I_{style}$ onto a content image $I_{content}$. First, the content image is input to the VGGNet, and its feature maps $P^l$ in the selected layer $l$ are stored as the content representation. Next, the style image passes through the network. The Gram matrices $A^l$ on every selected style layer are computed and stored as the style representation of the style image.

Then, the image to be generated $I_{generated}$, which is initialized as the content image, passes through the network. Using its feature maps, the content representation $F^l$ and the style representation $G^l$ of the generated image are computed on the same layers as the respective representations of the content and style images.

With the content and style representations, the loss functions used for generating images can be calculated. The content loss $L_{content}$ is calculated as the sum of squared differences between the content representation of the content image, $P^l$, and that of the generated image, $F^l$, in the selected layer $l$, as shown in Eq. (2).

$$L_{content} = \frac{1}{2}\sum_{i,j}\left(F^l_{ij} - P^l_{ij}\right)^2 \tag{2}$$

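The style loss $L_{style}$ is calculated as the sum of squared differences between the style representation of the style image, $A^l$, and that of the generated image, $G^l$, in every selected layer.

$$E_l = \frac{1}{4 N_l^2 M_l^2}\sum_{i,j}\left(G^l_{ij} - A^l_{ij}\right)^2 \tag{3}$$

$$L_{style} = \sum_l w_l E_l \tag{4}$$

In Eqs. (3) and (4), $w_l$ are the weighting factors for the contribution of layer $l$ to the style loss, $N_l$ is the number of filters in layer $l$, and $M_l$ is the dimension (height times width) of the feature maps in layer $l$.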

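Then, we combine $L_{content}$ and $L_{style}$ into the total loss $L_{total}$ as a linear combination. Given that $\alpha$ and $\beta$ are the weighting factors for the content and style representations, respectively, $L_{total}$ is defined as,

$$L_{total} = \alpha L_{content} + \beta L_{style}. \tag{5}$$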

Lastly, the generated image $I_{generated}$ is gradually optimized to minimize the total loss $L_{total}$, thus producing a new image that has the contents of the content image and the style of the style image.
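The loss terms described in this section can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: feature maps are assumed to be already extracted and flattened to shape (N, M), random arrays stand in for real VGGNet responses, the function names are hypothetical, and the normalization follows the formulation of Gatys et al. [1]. The default weights 0.001 and 1.0 are the content and style weights reported in Section V.

```python
import numpy as np

def gram(F):
    """Gram matrix of flattened feature maps F with shape (N, M)."""
    return F @ F.T

def content_loss(F_gen, F_content):
    """Half the sum of squared feature differences in one layer."""
    return 0.5 * np.sum((F_gen - F_content) ** 2)

def style_loss(gen_feats, style_feats, layer_weights):
    """Weighted sum over layers of normalized squared Gram differences."""
    total = 0.0
    for F_gen, F_style, w in zip(gen_feats, style_feats, layer_weights):
        n, m = F_gen.shape  # number of filters, size of each flattened map
        diff = gram(F_gen) - gram(F_style)
        total += w * np.sum(diff ** 2) / (4.0 * n**2 * m**2)
    return total

def total_loss(l_content, l_style, alpha=0.001, beta=1.0):
    """Linear combination of the content and style losses."""
    return alpha * l_content + beta * l_style

# Stand-in features (random, for illustration only).
rng = np.random.default_rng(0)
F_content = rng.standard_normal((3, 16))
F_style = rng.standard_normal((3, 16))
F_generated = F_content.copy()  # generated image starts as the content image

l_c = content_loss(F_generated, F_content)         # zero at initialization
l_s = style_loss([F_generated], [F_style], [1.0])  # positive: styles differ
loss = total_loss(l_c, l_s)
```

At initialization the content loss is zero, so the first optimization steps are driven entirely by the style term; the content term then resists changes that distort the structure.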

IV Distance Transform Loss

With neural style transfer, the style from the style image is applied to the entire content image. For the purpose of art, it is desirable to synthesize a whole image with aspects of the content image and the style image. However, logos often do not fit in basic primitive shapes. Therefore, when transferring style onto the silhouette image of a logo, the style is unnecessarily transferred to the background as well. As a result, there is too much noise outside of the desired contents.

However, we cannot simply cut the shape out of the generated image, because part of a shape generated by neural style transfer may be cut off, causing it to look unnatural. In order to produce a more natural-looking generated image, we propose a new loss function using the distance transform of the input image. The distance transform loss contains style transfer to the shape of the content image and its near vicinity.

IV-A Distance Transform

Fig. 4: Original binary image and its distance transform image (visualized as a heat map): (a) silhouette, (b) distance transform. The inner side of the silhouette shapes is completely white, while the outer side gradually becomes darker with increasing distance from the silhouette shape.
Fig. 5: Comparison of results with and without the distance transform loss: (a) content image, (b) style image, (c) normal neural style transfer, (d) with distance transform loss.

For each pixel in a binary image, the distance transform assigns a value equal to the distance to the nearest silhouette pixel. With the distance transform, the values of the silhouette pixels become zero, and the farther a pixel is located from the silhouette, the higher its value becomes, as shown in Fig. 4.

The distance transform image $D$ has the same dimensions as the original image, but its pixel values are the distance transform values. In this paper, we used the Euclidean distance metric to calculate the distance transform image $D$ because of its compatibility with different shapes.

Once the distance transform image $D$ is computed, it can be manipulated to emphasize a specific group of pixels. One way is to take the pixel-wise power of the distance transform image $D$ with a power of two or more. The values of pixels that are far from the silhouette shapes become large, whereas the values of nearby pixels do not change much. For every pixel $(i, j)$ of the distance transform image $D$, emphasis with power $s$ is

$$D^s_{ij} = \left(D_{ij}\right)^s, \tag{6}$$

where $s$ should be two or higher. In other words, we put weights on the pixels that are far from the silhouette shapes, which are more likely to be background pixels.
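The distance transform and its emphasis can be sketched as below. This is a brute-force NumPy version intended only for small images; the function names `distance_transform` and `emphasize` and the boolean-mask input convention are assumptions of this sketch, not details from the paper.

```python
import numpy as np

def distance_transform(silhouette):
    """Brute-force Euclidean distance transform.

    silhouette: 2-D boolean array, True on silhouette pixels.
    Returns an array of the same shape holding, for every pixel, the
    Euclidean distance to the nearest silhouette pixel (0 on the silhouette).
    """
    ys, xs = np.nonzero(silhouette)
    if ys.size == 0:
        raise ValueError("silhouette must contain at least one pixel")
    rows, cols = np.indices(silhouette.shape)
    # Squared distance from every pixel (H, W, 1) to every silhouette pixel (K,).
    d2 = (rows[..., None] - ys) ** 2 + (cols[..., None] - xs) ** 2
    return np.sqrt(d2.min(axis=-1))

def emphasize(D, s=2):
    """Pixel-wise power: far (background) pixels grow much faster than near ones."""
    return D ** s

# Toy example: a single silhouette pixel in the centre of a 5 x 5 image.
silhouette = np.zeros((5, 5), dtype=bool)
silhouette[2, 2] = True
D = distance_transform(silhouette)
D2 = emphasize(D, s=2)
```

The brute-force version needs memory proportional to the number of pixels times the number of silhouette pixels. For realistic image sizes, a library routine such as `scipy.ndimage.distance_transform_edt` applied to the inverted mask (`~silhouette`) should give the same distances far more efficiently.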

IV-B Distance Transform Loss

Given a content image $I_{content}$, the generated image $I_{generated}$, the distance transform image $D$ of the content image, and a natural number $s$, the distance transform loss $L_{distance}$ is

$$L_{distance} = \frac{1}{2}\sum_{i,j}\left(D^s_{ij}\, I_{content,ij} - D^s_{ij}\, I_{generated,ij}\right)^2. \tag{7}$$

In other words, first, the pixel-wise product of the content image and its emphasized distance transform image is calculated and stored. Then, the pixel-wise product of the generated image and the same distance transform image is calculated. Finally, the distance transform loss is calculated as the squared error between those two products.

Equation (7) ensures that the penalty on pixels around the silhouette shapes is always smaller than that on pixels farther away. While $L_{distance}$ preserves the shape of the silhouette and its surroundings, it also removes noise from outside of the silhouette shapes. This effect is emphasized by $s$: the larger $s$ is, the more noise is removed.
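The ordering of the penalties can be checked by expanding the squared error. As a sketch, let $D$ denote the distance transform image, $s$ the emphasis power, and $I_{content}$ and $I_{generated}$ the content and generated images, and assume the loss carries the conventional factor of $\frac{1}{2}$ (the symbols and the factor are notational assumptions of this note):

```latex
\frac{1}{2}\sum_{i,j}\left(D_{ij}^{s}\, I_{content,ij} - D_{ij}^{s}\, I_{generated,ij}\right)^2
= \frac{1}{2}\sum_{i,j} D_{ij}^{2s}\left(I_{content,ij} - I_{generated,ij}\right)^2
```

Each pixel's squared difference is therefore weighted by $D_{ij}^{2s}$. The weight is zero on the silhouette, where $D_{ij} = 0$, and grows with distance. For example, with $s = 2$ a pixel at distance 1 has weight 1, while a pixel at distance 3 has weight $3^4 = 81$.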

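We can incorporate the distance transform loss into neural style transfer by simply adding $L_{distance}$ to the total loss with a weighting factor $\gamma$:

$$L_{total} = \alpha L_{content} + \beta L_{style} + \gamma L_{distance}. \tag{8}$$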

As shown in Fig. 5, the distance transform loss contains the style transfer within the shape. It also does not interfere with neural style transfer inside the contained area.
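A minimal NumPy sketch of the distance transform loss and the extended total loss follows. It assumes single-channel images stored as 2-D arrays with the same shape as the distance transform image, and a factor of one half in front of the squared error; the function names and the toy values (including the weight `gamma=10.0`) are illustrative assumptions rather than settings from the paper.

```python
import numpy as np

def distance_transform_loss(content, generated, D, s=2):
    """Half the squared error between the two images, each weighted by D**s."""
    Ds = D ** s
    return 0.5 * np.sum((content * Ds - generated * Ds) ** 2)

def total_loss(l_content, l_style, l_distance, alpha, beta, gamma):
    """Content, style, and distance transform losses combined linearly."""
    return alpha * l_content + beta * l_style + gamma * l_distance

# Toy 1 x 3 image: the first pixel lies on the silhouette (distance 0).
D = np.array([[0.0, 1.0, 2.0]])
content = np.array([[1.0, 0.0, 0.0]])
generated = np.full((1, 3), 0.5)

l_d = distance_transform_loss(content, generated, D, s=2)
# Arbitrary stand-in values for the content and style losses.
l_total = total_loss(1.0, 2.0, l_d, alpha=0.001, beta=1.0, gamma=10.0)
```

In this toy case the silhouette pixel contributes nothing to the loss even though it differs from the content image, while the pixel at distance 2 dominates it. This mirrors the behaviour described above: the style is free to change pixels on the shape, and deviations far from it are penalized most.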

Fig. 6: Results for different ratios $\alpha/\beta$ of the weighting factors with the same $\gamma$: (a) content image, (b) style image. Regardless of the ratio, the distance transform loss contains the style transfer to the silhouette shapes.

V Experimental Results

The key contribution of this paper is to contain neural style transfer to a pre-segmented region using the distance transform loss. In this section, we examine some aspects of neural style transfer contained by the distance transform loss. Then, we show some results of logo generation using the proposed method.

Content representations were taken from layer "conv4_2" of the VGGNet. Style representations were taken from the layers "conv1_1", "conv2_1", "conv3_1", "conv4_1", and "conv5_1". The weighting factors for the content image and the style image, $\alpha$ and $\beta$, are constant at 0.001 and 1.0, respectively, except in Fig. 6.
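For reference, the settings stated above can be collected in a small configuration structure. The dictionary layout and key names below are hypothetical; only the layer names and the two weight values come from the text, and the values of the distance transform weight and the emphasis power are deliberately left out because they vary per experiment.

```python
# Hypothetical configuration layout; only the values are taken from the text.
config = {
    "content_layer": "conv4_2",
    "style_layers": ["conv1_1", "conv2_1", "conv3_1", "conv4_1", "conv5_1"],
    "alpha": 0.001,  # weighting factor for the content image
    "beta": 1.0,     # weighting factor for the style image
}

# Ratio of content weight to style weight used in all experiments except Fig. 6.
content_style_ratio = config["alpha"] / config["beta"]
```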

Fig. 7: Results for different values of the weighting factor $\gamma$: (a) content image, (b) style image. When a larger $\gamma$ is used, neural style transfer is more tightly contained to the silhouette shapes.
Fig. 8: Results for different emphasis powers $s$: (a) content image, (b) style image. When there is no emphasis of the distance transform image, as in 8(c), there is much noise around the original shape. As the emphasis power increases, the noise around the shape is gradually removed. (Content image is retrieved from

V-A The Influence to Neural Style Transfer

Fig. 6 shows how the distance transform loss influences neural style transfer. With different values of the weighting factors $\alpha$ and $\beta$, different style transfer results can be achieved. When the ratio $\alpha/\beta$ is large, the generated image is more similar to the content image, and when $\alpha/\beta$ is small, the generated image is more similar to the style image.

In Fig. 6, even when $\alpha/\beta$ changes drastically from emphasizing the content image to emphasizing the style image, there is no change in the shapes of the silhouettes. Only the insides of the shapes change according to the ratio. Thus, we can assume that although the distance transform loss contains neural style transfer to the silhouette shapes, its influence on the style transfer itself is small, because, other than for background pixels, the distance transform loss is too small to be noticed by the optimization algorithm.

V-B The Strength of Weighting Factor

The most important aspects of neural style transfer are the weighting factors. As noted before, different combinations of $\alpha$ and $\beta$ produce different results. In this experiment, we show the impact of $\gamma$ with the same $\alpha/\beta$.

Fig. 7 shows the results with different values of the weighting factor $\gamma$. Even with a small value, as in Fig. 7(c), noise is almost completely removed. When $\gamma$ becomes large enough, as in Fig. 7(d) and Fig. 7(e), style transfer is completely contained within the silhouette shapes.

V-C The Emphasis of Distance Transform Image

Fig. 8 shows results with different emphasis powers $s$. When there is no emphasis ($s = 1$), too much noise forms around the shape. As the emphasis power increases, more of the noise is removed. Therefore, we can assume that a larger emphasis power is preferable in order to contain style transfer to the silhouette shapes. This is because, by taking the pixel-wise power of a matrix, large values become much larger, and so do their errors. The optimizer then tries to remove pixels with large errors, such as those in the background, resulting in noise-free style transfer.
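The effect of the emphasis power on the optimizer can be illustrated with a toy experiment. The sketch below applies plain gradient descent to the distance transform loss alone, on four pixels at increasing distance from the silhouette. Plain gradient descent, the step size, and the pixel values are assumptions made for illustration; the gradient expression follows from differentiating half the squared error between the two distance-weighted images.

```python
import numpy as np

def distance_loss_grad(content, generated, D, s):
    """Gradient of 0.5 * sum((D**s * content - D**s * generated)**2)
    with respect to the generated image."""
    return (D ** (2 * s)) * (generated - content)

D = np.array([0.0, 1.0, 2.0, 3.0])  # on the silhouette ... far background
content = np.zeros(4)               # target values (clean background)
generated = np.ones(4)              # start with "noise" on every pixel

step = 0.01
for _ in range(20):
    generated = generated - step * distance_loss_grad(content, generated, D, s=2)

# Remaining deviation from the content image, per pixel.
residual = np.abs(generated - content)
```

After the same number of steps, the pixel on the silhouette is untouched, the pixel at distance 1 has barely moved, and the pixel at distance 3 has essentially converged to the content value. Noise far from the shape is thus removed first and most strongly, while pixels on and near the shape stay free to carry the style.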

Panels of the style detail comparison figure: (a) large details, (b) medium details, (c) small details.

V-D Logo Generation

The figure with panels (a) large details, (b) medium details, and (c) small details shows logos generated using dancing human silhouettes and three style images of different detail size and density. In neural style transfer, the size and density of the patterns in the style image have a big impact on the resulting image. When a style image with small details is used for synthesis, the resulting image has small details, whereas when a style image with big details is used, the resulting image has big details. In addition, the density of the style image is transferred to the content image as well.

For instance, in the same figure, the human shapes in each generated image consist of the blue squares of the style image. Nevertheless, the size of those squares differs according to the size of the details in the style image. When a style image with small squares is used, the generated image has small squares. On the other hand, when a style image with large squares is used, the dancing humans consist of large squares in the generated image.

Fig. 9 shows logo generation using a text image as the content image. Some logos combine both shapes and text. In Fig. 10, a combination of shapes and text is used as the content image for contained neural style transfer.

V-E Transferring Styles to Background

Fig. 11 shows logo generation by transferring style to the background of a content image. Looking closer at the generated image, the style of the style image also appears inside of the shapes. The reason is that the shapes in the content image are much thinner, so the distance transform values inside them are small. Thus, the errors for pixels inside of the shapes are small, and styles are transferred onto them. To keep the style from appearing inside of the shapes, a much larger emphasis power $s$ and weighting factor $\gamma$ are needed.

V-F Font Generation

Using font images as the content image, novel decorated fonts can be generated. Fig. 12 shows that stylized fonts have been successfully generated. Neural style transfer has been contained to only the characters due to the distance transform loss. Also, there is no noise around the characters.

For font generation, it is better to use a content image with thick characters, such as Fig. 12(a), because the style is transferred onto the characters in the content image, and the characters should be large and wide enough to allow style to be transferred onto them. With a slim font, the style has nowhere to transfer but the very small region of the slim characters.

Fig. 9: Logo generation using a text image as the content image. Panels: content image, style image, generated image.
Fig. 10: Logo generation using shape and text combinations as the content image. Panels: content image, style image, generated image.

VI Conclusion

In this paper, we introduced the distance transform loss function for containing neural style transfer. While cropping the generated image could be used, it does not produce natural results, because the continuity of a generated shape could be cut. With the distance transform loss, neural style transfer occurs inside of a contained region, which is determined dynamically.

The proposed method is suitable for use with pre-segmented content images such as silhouette images or clip art. We showed that the effect of the distance transform loss is only to contain style transfer to a region; it does not interfere with neural style transfer inside of that contained region.

We also observed how the emphasis power $s$ and the weighting factor $\gamma$ work. These two parameters should be large, but if they are too large, the result can become the same as image cropping.

We generated three different types of logos using three different kinds of content images: text images containing only text, shapes such as silhouettes or clip art, and images that contain both shapes and text. In addition, we demonstrated logo generation by transferring style to the background of a content image. In this case, the parameters should be tuned differently to achieve successful results. The biggest advantage of logos generated by contained neural style transfer is that they are genuine and novel, without any fear of duplicates.

Finally, we provided examples of font generation with contained neural style transfer. Completely novel and clean fonts were generated by our method. In future research, we plan to build a complete system for generating logos.

Fig. 11: Logo generation by transferring styles to the background: (a) content image, (b) style image, (c) generated image.
Fig. 12: Font generation using contained neural style transfer: (a) content image, (b) bells style, (c) generated image, (d) flowers style, (e) generated image.


This research was partially supported by MEXT-Japan (Grant No.J17H06100).


  • [1] L. A. Gatys, A. S. Ecker, and M. Bethge, “Image style transfer using convolutional neural networks,” in Proc. IEEE Conf. Comput. Vision and Pattern Recognition, 2016, pp. 2414–2423.
  • [2] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
  • [3] J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in European Conf. Comput. Vision.   Springer, 2016, pp. 694–711.
  • [4] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” arXiv preprint arXiv:1611.07004, 2016.
  • [5] T. Tsuchiya, T. Miyazaki, Y. Sugaya, and S. Omachi, “Automatic generation of kanji fonts from sample designs,” in Tohoku-Section Joint Conv. Inst. Elect. and Inform. Engineers, 2014.
  • [6] N. D. Campbell and J. Kautz, “Learning a manifold of fonts,” ACM Trans. Graphics, vol. 33, no. 4, p. 91, 2014.
  • [7] S. Uchida, Y. Egashira, and K. Sato, “Exploring the world of fonts for discovering the most standard fonts and the missing fonts,” Int. Conf. Document Anal. and Recognition, 2015.
  • [8] G. Atarsaikhan, B. K. Iwana, A. Narusawa, K. Yanai, and S. Uchida, “Neural font style transfer,” in Int. Conf. Document Anal. and Recognition, 2017, pp. 51–56.
  • [9] Y. Jing, Y. Yang, Z. Feng, J. Ye, and M. Song, “Neural style transfer: A review,” arXiv preprint arXiv:1705.04058, 2017.
  • [10] C. Li and M. Wand, “Precomputed real-time texture synthesis with markovian generative adversarial networks,” in European Conf. Comput. Vision.   Springer, 2016, pp. 702–716.
  • [11] T. Q. Chen and M. Schmidt, “Fast patch-based style transfer of arbitrary style,” arXiv preprint arXiv:1612.04337, 2016.
  • [12] D. Ulyanov, V. Lebedev, A. Vedaldi, and V. Lempitsky, “Texture networks: Feed-forward synthesis of textures and stylized images,” in Proc. Int. Conf. Mach. Learning, 2016.
  • [13] L. A. Gatys, M. Bethge, A. Hertzmann, and E. Shechtman, “Preserving color in neural artistic style transfer,” arXiv preprint arXiv:1606.05897, 2016.
  • [14] A. G. Anderson, C. P. Berg, D. P. Mossing, and B. A. Olshausen, “Deepmovie: Using optical flow and deep neural networks to stylize movies,” arXiv preprint arXiv:1605.08153, 2016.
  • [15] A. J. Champandard, “Semantic style transfer and turning two-bit doodles into fine artworks,” arXiv preprint arXiv:1603.01768, 2016.
  • [16] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.