Awesome Typography: Statistics-Based Text Effects Transfer

Shuai Yang, Jiaying Liu, Zhouhui Lian and Zongming Guo
Institute of Computer Science and Technology, Peking University, Beijing, China
Abstract

In this work, we explore the problem of generating fantastic special effects for typography. It is quite challenging because of the great diversity of text effects and characters that must be modeled. To address this issue, our key idea is to exploit the high regularity of the spatial distribution of text effects to guide the synthesis process. Specifically, we characterize stylized patches by their normalized positions and by the optimal scales for depicting their style elements. Our method first estimates these two features and derives their correlation statistically. They are then converted into soft constraints for texture transfer to accomplish adaptive multi-scale texture synthesis and to make the style element distribution uniform. This allows our algorithm to produce artistic typography that matches both the local texture patterns and the global spatial distribution of the example. Experimental results demonstrate the superiority of our method over conventional style transfer methods on various text effects. In addition, we validate the effectiveness of our algorithm with extensive artistic typography library generation.

1 Introduction

Typography is the art of designing special text effects that turn a character into an original and unique artwork. These text styles range from basic effects such as shadows, outlines and colors to sophisticated effects such as burning flames, flowing smoke and multicolored neon, as shown in Fig. 1. Text decorated with well-designed special effects becomes much more attractive and better conveys the thoughts and emotions of the designer. The beauty and elegance of text effects are widely appreciated, making them common in publishing and advertising. However, creating vivid text effects requires a series of subtle steps by an experienced designer using editing software: determining color styles, warping textures to match text shapes, adjusting transparency for visual plausibility, and so on. These advanced editing skills are far beyond the abilities of most casual users. This practical requirement motivates our work: we investigate an approach that automatically transfers various fantastic text effects designed by artists onto raw plain texts, as shown in Fig. 1.

Figure 1: Overview: Our method takes as input the source text image S, its stylized counterpart S' and the target text image T, then automatically generates the target stylized image T' with the same special effects as in S'.

Text effects transfer is a brand new sub-topic of style transfer. Style transfer is related to both color transfer and texture transfer. Color transfer matches the global [28] or local [32] color distributions of the target and source images. Texture transfer relies on texture synthesis technologies, where the texture generation is constrained by guidance images. Texture synthesis itself can be divided into two categories: non-parametric methods [8, 7, 16, 36] and parametric methods [15, 10, 18, 12]. The former generates new textures by resampling pixels or patches from the original texture, while the latter models textures using statistical measurements and produces a new texture that shares the same parametric statistics with the original one.

From a technical perspective, it is quite challenging and impractical to directly exploit traditional style transfer methods to generate new text effects. The challenges lie in three aspects: (i) The extreme diversity of text effects and character shapes: the style diversity makes the transfer task difficult to model uniformly, and the algorithm must also be robust to the tremendous variety of characters. (ii) The complicated composition of style elements: a text effects image often contains multiple intertwined style elements (we call them text sub-effects) that have very different textures and structures (see the denim fabric example in Fig. 1) and need specialized treatment. (iii) The simplicity of guidance images: the raw plain text used as guidance gives few hints on how to place the different sub-effects. Textures in the white text and black background regions may not be stationary, which breaks the assumption of the traditional non-parametric texture-by-numbers method [13] that textures are stationary within each region of the guidance map. Meanwhile, the plain text image provides little semantic information, so the recent successful parametric deep-based style transfer methods [11, 18] lose their advantage of representing high-level semantic information. For these reasons, conventional style transfer methods for general styles perform poorly on text effects.

In this paper, we propose a novel text effects transfer algorithm to address these challenges. The key idea is to analyze and model the distance-based essential characteristics of high-quality text effects and to leverage them to guide the synthesis process. These characteristics are summarized into a general prior based on an analysis of dozens of well-designed text effects. This prior guides our style transfer process to synthesize different sub-effects adaptively and to reproduce their spatial distribution. All measurements are carefully designed to be robust to the character shape. In addition, we further consider a psycho-visual factor to enhance image naturalness. In summary, our contributions are threefold:

  • We raise a brand new topic of text effects transfer that turns plain texts into fantastic artworks, which enjoys wide application scenarios such as picture creation on social networks and commercial graphic design.

  • We investigate and analyze well-designed typography and summarize the key distance-based characteristics of high-quality text effects. We model these characteristics mathematically to form a general prior that significantly improves the style transfer process for text.

  • We propose the first method to generate compelling text effects, which share both the local texture patterns and the global spatial distribution of the source example, while preserving image naturalness.

2 Related Work

Color Transfer. Pioneering color transfer methods [28, 26] transfer color between images by matching their global color distributions. Subsequently, local color transfer is achieved based on segmentation [32, 33] or user interaction [35], and is further improved using fine-grained patch [31] or pixel [30, 25] correspondences. Recently, color transfer [37] and colorization [17, 38] using deep neural networks have drawn increasing attention.

Non-Parametric Texture Synthesis and Transfer. Efros and Leung [8] proposed a pioneering pixel-by-pixel synthesis approach based on sampling similar patches. Subsequent works improve its quality and speed by synthesizing patches rather than pixels. To handle the overlapped regions of neighboring patches seamlessly, Liang et al. [20] proposed to blend patches, and Efros and Freeman [7] used dynamic programming to find an optimal separatrix in the overlapped regions, which is further improved via graph cut [16]. Unlike previous methods that synthesize textures in a local manner, recent techniques synthesize globally using objective functions. A coherence-based function [36] is proposed to synthesize textures in an iterative coarse-to-fine fashion; this method performs patch matching and voting operations alternately and achieves good local structures. It is then extended to non-stationary textures through geometric and photometric patch transformations [2, 6].

Texture transfer, also known as Image Analogies [13], generates textures while preserving the structure of the target image. Structures are usually preserved by reducing the differences between the source and target guidance maps [13, 24]. In [22], texture boundaries are synthesized first to constrain the structure. Frigo et al. [9] proposed an adaptive patch partition to precisely capture source textures and preserve target structures, followed by a Markov Random Field (MRF) function for global texture synthesis.

Figure 2: Statistics of the text effects images. (a)(c) The flame and denim fabric text effects. (b)(d) Their pattern distributions: textures with similar distances to the text skeleton (in white) tend to have similar patterns. (e) Partition modes: pixels are divided into classes using different partition modes. (f) Pixels in RGB space and (g) the pixel distance distribution: high correlation between pixel colors and distances, i.e., pixels with different distances are distinguished from each other in RGB space. (h)(i) Response curves for the distance and random partition modes: high correlation between patch scales and distances, i.e., patches with similar distances respond uniformly to changes of their size.

Parametric Texture Synthesis and Transfer. The idea of modelling textures using statistical measurements has led to the development of textons and their variants [15, 27]. Nowadays, deep-based texture synthesis [10] has become popular due to the great descriptive ability of deep neural networks. Gatys et al. proposed to use the Gram matrix of Convolutional Neural Network (CNN) features to represent textures [10] and adapted it to style transfer by incorporating content similarities [11]. This work presented a remarkable generic painting transfer technique and attracted many follow-ups on loss function improvement [21, 29] and algorithm acceleration [14, 34]. Recently, methods that replace the Gram matrix with an MRF regularizer have been proposed for photographic synthesis [18] and semantic texture transfer [4]. Meanwhile, Generative Adversarial Networks (GANs) [12] provide another route to texture generation, using discriminator and generator networks that iteratively improve the model by playing a minimax game. Their extension, conditional GANs [23], fulfils the challenging task of generating images from abstract semantic labels. Li and Wand [19] further showed that their Markovian GAN has certain advantages over the Gram-matrix-based methods [11, 34] in coherent texture preservation.

3 Proposed Method

In this section, we first formulate our text effects transfer problem. A visual analysis is then presented of our observation that patch patterns (i.e., color and scale) are highly correlated with their spatial distributions in text effects images (Sec. 3.1). Based on this observation, we extract text effects statistics from the source images (Sec. 3.2) and employ them to adapt the texture synthesis algorithm for high-quality text effects transfer (Sec. 3.3).

3.1 Problem Formulation and Analysis

Text effects transfer takes as input a set of three images, the source raw text image S, the source stylized image S' and the target raw text image T, and automatically produces the target stylized image T' with the text effects of S', such that S : S' :: T : T'.

It is a quite challenging task to transfer arbitrary text effects automatically, due to the variety of text effects, the complex composition of sub-effects and the simplicity of the guidance maps. To address this problem, we investigate well-designed text effects from two aspects: (i) how to determine the essential characteristics of text effects and (ii) how to characterize them mathematically. We start from a basic observation on text effects: patch patterns are highly dominated by their locations. We choose to represent the pattern of a patch by two factors, the pixel color and the optimal patch scale. As shown intuitively in Figs. 2(a)-(d), patches at similar locations (marked with the same color) tend to have similar patterns.

To quantitatively evaluate the locations of patches, we divide a text effects image into a number of classes, namely partitions. The possible partition modes are extremely diverse, so it is impractical to compare all of them. In this work, we compare five typical partition modes: (i) random: all pixels are randomly divided into equal-sized partitions; (ii) grid: all partitions are evenly distributed according to their horizontal and vertical coordinates on the image; (iii) angle: all partitions are evenly distributed according to their angular coordinate, where the center of the polar coordinate system is at the geometric center of the image; (iv) ring: all partitions are evenly distributed according to their radial coordinate, with the same polar center; and (v) distance: all partitions are evenly distributed according to their geometric distance (the distance calculation is given in Sec. 3.2.2) to the skeleton of the text in the image. Fig. 2(e) illustrates the grid, angle, ring and distance partition modes, where each partition is tinted with a different color. A sketch of the distance mode is given below.
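As an illustration of the distance partition mode (the distance calculation itself is detailed in Sec. 3.2.2), the following minimal sketch bins every pixel by its distance to the text skeleton. It assumes a binary text mask and relies on SciPy/scikit-image, which are our choices rather than anything prescribed by the paper; the number of bins is likewise illustrative.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt
from skimage.morphology import skeletonize

def distance_partition(text_mask, num_bins=16):
    """Label every pixel by the distance bin it falls into.

    text_mask : 2-D boolean array, True inside the glyph (white text region).
    Returns an integer label map with values in [0, num_bins).
    """
    # kernel path of the glyph (Sec. 3.2.2 extracts it with morphology operations)
    skeleton = skeletonize(text_mask)
    # distance of every pixel to its nearest skeleton pixel
    dist = distance_transform_edt(~skeleton)
    # evenly spaced bins over the observed distance range ("distance" partition mode)
    edges = np.linspace(0.0, dist.max() + 1e-6, num_bins + 1)
    return np.digitize(dist, edges[1:-1])
```

The other four modes can be produced analogously by binning pixel coordinates (grid), polar angles (angle), polar radii (ring), or random permutations (random).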

Then, for each partition mode, we investigate the relationship between the partitions and the distributions of the corresponding patterns. For the factor of color, we represent the reliability of a partition mode by its color classification accuracy:

r_color = 1 - e,    (1)

where e is the training error (empirical risk) obtained by training an SVM [5] to classify pixel colors into the partitions. We tested 30 text effects images created by designers to obtain their reliability on color classification. The average reliabilities are shown in Table 1, where only the relative values are instructive for our design. From this table, distance is demonstrated to be the most reliable factor for depicting pixel colors, with an average value of 0.147. In Fig. 2(g), pixels of the flame image are tinted according to their distance partition (see the top left image of Fig. 2(e)) in RGB space. We note that points with the same class color appear in the same neighborhood, which intuitively shows that color and distance are highly correlated in text effects.
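A minimal sketch of the color reliability measure of Eq. (1) is given below. It assumes scikit-learn's SVC (which wraps LIBSVM [5]) and subsamples pixels for tractability; the RBF kernel and the sample size are our assumptions, not choices stated in the paper.

```python
import numpy as np
from sklearn.svm import SVC

def color_reliability(image_rgb, partition_labels, max_samples=5000, seed=0):
    """Eq. (1): reliability = training accuracy (1 - empirical risk) of an SVM
    that predicts a pixel's partition label from its RGB color."""
    X = image_rgb.reshape(-1, 3).astype(float)
    y = partition_labels.reshape(-1)
    rng = np.random.default_rng(seed)
    idx = rng.choice(y.size, size=min(max_samples, y.size), replace=False)
    svm = SVC(kernel='rbf').fit(X[idx], y[idx])
    return svm.score(X[idx], y[idx])   # training accuracy = 1 - training error
```

Comparing this value across the five partition modes of the same image reproduces one row of the "color" block in Table 2.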

The distance also proves important in characterizing the scale of patterns. First, for each patch size, we calculate the average patch difference between all patches in a partition and their best matches in the same image, which forms a response curve over scale. Then, for a given partition mode, we obtain one response curve per partition, showing the impact of scale. Two examples of response curves for the denim fabric image are shown in Figs. 2(h) and (i), where each point shows the mean and standard deviation of the patch differences at the same partition and scale. To compare the reliability of the partition modes, two terms are used: (i) the inter-curve standard deviation σ_inter, the average over scales of the standard deviation of the per-partition average responses; and (ii) the intra-curve standard deviation σ_intra, the average of the point-wise standard deviations over all scales and partitions. A higher σ_inter implies that sub-effects are easier to distinguish by their locations, while a lower σ_intra implies that patches in the same partition react uniformly to scale changes and likely share a common optimal scale. Therefore, we evaluate the reliability by

r_scale = σ_inter / σ_intra.    (2)

The reliabilities of all five partition modes are given in Table 1, where the distance factor achieves the highest value (0.950 on average) in characterizing patch scales.
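A sketch of the scale reliability of Eq. (2) is shown below, assuming the response curves have already been collected into per-(partition, scale) means and standard deviations of the patch differences; the array layout is our convention.

```python
import numpy as np

def scale_reliability(mean_resp, std_resp):
    """Eq. (2): sigma_inter / sigma_intra for one partition mode.

    mean_resp : K x L array, average patch difference of partition k at scale l
    std_resp  : K x L array, standard deviation of those patch differences
    """
    sigma_inter = mean_resp.std(axis=0).mean()  # spread between curves, averaged over scales
    sigma_intra = std_resp.mean()               # average point-wise spread within a curve
    return sigma_inter / sigma_intra
```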

In conclusion, there are high correlations between patch patterns (i.e., color and scale) and their distances to the text skeleton. These correlations are reasonable essential characteristics of high-quality text effects.

          rand    grid    angle   ring    dist
  color   0.063   0.106   0.119   0.105   0.147
  scale   0.153   0.793   0.486   0.590   0.950
Table 1: Reliability between patch patterns and the different partition modes.

3.2 Text Effects Statistics Estimation

We now convert the aforementioned analysis into patch statistics that can be directly used as transfer guidance. For our patch-based algorithm, in the following we use p and q to denote pixels in the source images (S, S') and the target images (T, T'), respectively, and use P(p, S) and P(p, S') to represent the patches centered at p in S and S', respectively. The same goes for the patches P(q, T) and P(q, T') in T and T'.

(a) Optimal scale map
(b) Visualized patch scale
Figure 3: Detected optimal patch scales for the flame image.

3.2.1 Optimal Patch Scale Detection

Inspired by [9], we propose a simple yet effective approach to detect the optimal patch scale for depicting the texture pattern around each pixel p. Given a predefined downsample factor ρ, we start from the maximum (coarsest) scale L to filter source patches, and let the patches that pass the filter proceed to a finer scale.

We use a fixed patch size and resize the image to obtain multiple scales. Let S'_l be the source stylized image S' downsampled at a scale rate of ρ^l, and let P_l(p, S') be the patch centered at p in S'_l; S_l and P_l(p, S) are similarly defined. If q_l is the correspondence of p at scale l such that

q_l = argmin_q || P_l(p, S') - P_l(q, S') ||^2,    (3)

then our filter criterion at scale l is

(4)

which compares the matching cost of Eq. (3) against a threshold. Patches that satisfy the filter criterion pass through to the finer scale l-1, while the filter residues take l as their optimal scale l*(p). The optimal patch scale detection is summarized in Algorithm 1. An example of the optimal scales for the flame image is shown in Fig. 3(a). We find that the textured region near the character requires finer patch scales than the outer flat region. For better visualization, Fig. 3(b) shows the optimal scale of each patch by resizing it at its corresponding scale rate.

Input: Image S', parameters ρ, L and the filter threshold
Output: Optimal scale l*(p) for each pixel p

1: Initialize l*(p) ← 0 for every p and mark all pixels as active
2: for l = L, L-1, ..., 1 do
3:     for all active pixels p do
4:         Compute the filter criterion of Eq. (4) at scale l
5:         if the criterion is false then
6:             l*(p) ← l
7:             mark p as inactive
8:         end if
9:     end for
10: end for
Algorithm 1 Optimal Patch Scale Detection
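A simplified sketch of Algorithm 1 follows. The paper's filter criterion (Eq. (4)) is replaced here by a plain threshold on the best-match error, and the best match is searched over a random subset of candidate patches instead of the full image; rho, tau, the patch size and the candidate count are illustrative values, and scikit-image is assumed only for resizing.

```python
import numpy as np
from skimage.transform import rescale

def optimal_patch_scales(image, num_scales=4, rho=0.8, patch=5, tau=0.02,
                         n_cand=400, seed=0):
    """Assign each pixel the scale at which it is filtered out (0 = finest)."""
    h, w = image.shape[:2]
    r = patch // 2
    scale_map = np.zeros((h, w), dtype=int)
    alive = np.ones((h, w), dtype=bool)            # pixels still passing the filter
    rng = np.random.default_rng(seed)

    for l in range(num_scales, 0, -1):             # coarsest -> finer
        img_l = rescale(image, rho ** l, channel_axis=-1, anti_aliasing=True)
        hl, wl = img_l.shape[:2]
        # random candidate patches from the same downsampled image, standing in
        # for the exhaustive best-match search behind Eq. (3)
        cy = rng.integers(r, hl - r, n_cand)
        cx = rng.integers(r, wl - r, n_cand)
        cands = np.stack([img_l[y - r:y + r + 1, x - r:x + r + 1].ravel()
                          for y, x in zip(cy, cx)])
        for oy, ox in zip(*np.nonzero(alive)):
            y, x = int(oy * rho ** l), int(ox * rho ** l)   # location in the downsampled image
            if not (r <= y < hl - r and r <= x < wl - r):
                continue
            p = img_l[y - r:y + r + 1, x - r:x + r + 1].ravel()
            best = ((cands - p) ** 2).mean(axis=1).min()
            if best > tau:                          # fails the criterion: keep scale l
                scale_map[oy, ox] = l
                alive[oy, ox] = False
    return scale_map
```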

3.2.2 Robust Normalized Distance Estimation

Here we first define some concepts. In the text image, the text region is denoted by Ω, and the skeleton Skel is a kernel path within Ω. We use dist(x, A) to denote the distance between a pixel x and its nearest pixel in a set A; we aim to calculate dist(x, Skel). For a pixel x on the text contour Cont, this distance is also known as the text width or radius r(x). Fig. 4(b) gives the visual interpretation.

We extract Skel from Ω using morphological operations. To make the distance invariant to the text width, we aim to normalize it so that the normalized text width equals 1. Simply dividing the distance by the measured text width is unreliable, because inaccuracies in the extracted skeleton introduce errors into both the numerator and the denominator. To address this issue, we estimate a corrected text width based on statistics and use it to derive the normalized distance.

Specifically, we sort the radii r(x) of the contour pixels and obtain their rankings. We observe that the relation between a radius and its ranking can be well modelled by linear regression, as shown in Fig. 4(d). From Figs. 4(b)(d), we find that outliers concentrate at small values. We empirically treat the leftmost points as outliers and eliminate them by

(5)

where a and b are the linear regression coefficients and N is the number of pixels on the contour Cont. Finally, the normalized distance is obtained as

(6)

where x* is the nearest pixel to x along the text contour and R is the mean text width.

For simplicity, we drop the distinction in notation and simply write dist(x) for the normalized distance in the following.
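A sketch of the robust normalized distance under simplifying assumptions: the skeleton comes from scikit-image, the outlier removal just drops a fixed fraction of the smallest radii before the linear fit, and the normalization divides by a single corrected mean width rather than applying the per-pixel correction of Eq. (6). The fraction and the library calls are our assumptions.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt, binary_erosion
from skimage.morphology import skeletonize

def normalized_skeleton_distance(text_mask, outlier_frac=0.05):
    """Distance of every pixel to the text skeleton, normalized so that the
    robustly estimated text width maps to roughly 1."""
    skel = skeletonize(text_mask)
    dist = distance_transform_edt(~skel)           # distance to the nearest skeleton pixel

    # text contour = glyph pixels that touch the background
    contour = text_mask & ~binary_erosion(text_mask)
    radii = np.sort(dist[contour])                 # sorted text radii (cf. Fig. 4(d))

    # the radius-vs-rank relation is close to linear; drop small outliers, then refit
    keep = radii[int(outlier_frac * radii.size):]
    ranks = np.arange(keep.size)
    slope, intercept = np.polyfit(ranks, keep, 1)  # linear regression on rank vs. radius
    mean_width = slope * (keep.size - 1) / 2.0 + intercept

    return dist / mean_width
```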

Figure 4: Robust normalized distance estimation. (a) The text image. (b) The detected text skeleton and the notation definition. (c) The estimated normalized distance: the distances of the pixels on the text boundary to the text skeleton are normalized to 1 (colored magenta). (d) The statistics of the text width: the near-linear relation between the sorted radii and their rankings.

3.2.3 Optimal Scale Posterior Probability Estimation

In this section, we derive the posterior probability of the optimal patch scale to model the aforementioned high correlation between patch patterns and their spatial distributions.

We uniformly quantize all distances into bins and denote by bin(x) the bin that dist(x) falls into. Then, a 2-D histogram over distance bins and optimal scales is computed:

H(i, l) = Σ_x 1[ bin(x) = i and l*(x) = l ],    (7)

where 1[·] equals 1 when its argument is true and 0 otherwise. The joint probability of the distance and the optimal scale can then be estimated as

P(i, l) = H(i, l) / Σ_{i', l'} H(i', l').    (8)

Finally, the posterior probability of l being the appropriate scale for depicting patches whose distances fall into bin i can be deduced:

Post(l | i) = P(i, l) / Σ_{l'} P(i, l').    (9)

We assume that the target images share the same posterior probability as the source image, and we use this probability to select patch scales statistically during texture synthesis, so as to adapt to widely varying text effects.
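A sketch of Eqs. (7)-(9), assuming the normalized distance map and the optimal scale map from the two previous subsections; the number of distance bins is an illustrative choice.

```python
import numpy as np

def scale_posterior(norm_dist, scale_map, num_dist_bins=16, num_scales=5):
    """Posterior probability of each optimal scale given the quantized distance."""
    edges = np.linspace(0.0, norm_dist.max() + 1e-6, num_dist_bins + 1)
    dist_bin = np.digitize(norm_dist, edges[1:-1])           # quantized distance, Eq. (7)

    hist = np.zeros((num_dist_bins, num_scales))
    np.add.at(hist, (dist_bin.ravel(), scale_map.ravel()), 1.0)

    joint = hist / hist.sum()                                 # joint probability, Eq. (8)
    marginal = joint.sum(axis=1, keepdims=True)
    posterior = np.divide(joint, marginal,                    # P(scale | distance bin), Eq. (9)
                          out=np.zeros_like(joint), where=marginal > 0)
    return posterior, edges
```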

3.3 Text Effects Transfer

In this section, we describe how we adapt a conventional texture synthesis method to deal with the challenging text effects. We build on the texture synthesis method of Wexler et al. [36] and its variants [6], using random search and propagation as in PatchMatch [1, 2]. We refer to these papers for details of the base algorithm.

We apply character shape constraints to the patch appearance measurement to build our baseline, and further incorporate the estimated text effects statistics to accomplish adaptive multi-scale style transfer (Sec. 3.3.2). Then a distribution term is introduced to adjust the spatial distribution of the text sub-effects (Sec. 3.3.3). Finally, we propose a psycho-visual term that prevents texture over-repetitiveness for naturalness (Sec. 3.3.4).

3.3.1 Objective Function

We augment the texture synthesis objective function in [36] with a distribution term and a psycho-visual term. Our objective function takes the following form:

E = Σ_{q in T'} min_{p in S'} ( E_app(p, q) + λ_dist E_dist(p, q) + λ_psy E_psy(p, q) ),    (10)

where q is the center position of a target patch in T and T', and p is the center position of the corresponding source patch in S and S'. The three terms E_app, E_dist and E_psy are the appearance, distribution and psycho-visual terms, respectively, which are weighted by λ_dist and λ_psy to together make up the patch distance.
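To make the structure of Eq. (10) concrete, the sketch below sums, over all target patch centers, the best weighted combination of the three terms over the source patch centers. The term functions and weight values are placeholders standing in for Eqs. (12)-(14).

```python
def total_energy(target_centers, source_centers, e_app, e_dist, e_psy,
                 lam_dist=1.0, lam_psy=1.0):
    """Eq. (10): sum over target patches of the minimal weighted patch distance.

    e_app, e_dist, e_psy : callables (p, q) -> float for the three terms.
    """
    energy = 0.0
    for q in target_centers:
        energy += min(e_app(p, q) + lam_dist * e_dist(p, q) + lam_psy * e_psy(p, q)
                      for p in source_centers)
    return energy
```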

3.3.2 Appearance Term: Texture Style Transfer

The original texture synthesis algorithm of Wexler et al. [36] minimizes the Sum of Squared Differences (SSD) between two patches sampled from the stylized image pair (S', T'). We adapt it to the texture transfer task by adding the SSD of the corresponding patches sampled from the text image pair (S, T):

E_app(p, q) = || P(q, T') - P(p, S') ||^2 + λ_T || P(q, T) - P(p, S) ||^2,    (11)

where λ_T is a weight that balances the color difference and the character shape difference. We take the objective function that minimizes only the appearance term of Eq. (11) as our baseline.

Stylized texts often contain multiple sub-effects with different optimal representation scales. Thus, in addition to the baseline, we propose an adaptive scale-aware patch distance that incorporates the estimated posterior probability,

E'_app(p, q) = Σ_l Post(l | bin(q)) E_app^l(p, q),    (12)

where bin(q) is the distance bin of the target pixel q and E_app^l is the appearance term of Eq. (11) computed on patches at scale l. The posterior probability helps to explore patches across multiple appropriate scales for better texture synthesis.
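A sketch of the scale-aware appearance cost of Eqs. (11)-(12) for a single source/target patch pair follows. It assumes the per-scale patches have already been extracted and flattened, and lam_text stands in for the text-map weight; both are our conventions.

```python
import numpy as np

def appearance_term(src_style, tgt_style, src_text, tgt_text,
                    posterior_row, lam_text=1.0):
    """src_style, tgt_style, src_text, tgt_text : lists of flattened patches, one per scale.
    posterior_row : posterior probability of each scale for the target patch's distance bin.
    """
    cost = 0.0
    for l, w in enumerate(posterior_row):
        ssd_style = np.mean((src_style[l] - tgt_style[l]) ** 2)   # stylized-image SSD
        ssd_text = np.mean((src_text[l] - tgt_text[l]) ** 2)      # text-map SSD
        cost += w * (ssd_style + lam_text * ssd_text)             # Eq. (11) weighted per Eq. (9)
    return cost
```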

3.3.3 Distribution Term: Spatial Style Transfer

The distribution of the sub-effects highly correlates with their distances to the text skeleton. Based on this prior, we introduce a distribution term,

(13)

which encourages the text effects of the target to share a similar spatial distribution with the source example, thereby realizing a spatial style transfer. To ensure that the cost is invariant to the image scale, we add a normalizing denominator.
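A sketch of the distribution term follows. Penalizing the squared difference of the two normalized skeleton distances is the core idea; the denominator used here for scale invariance is our assumption, since the exact normalization of Eq. (13) is not reproduced above.

```python
def distribution_term(dist_src_p, dist_tgt_q, eps=1e-6):
    """Penalize source/target patches whose normalized skeleton distances disagree."""
    return (dist_src_p - dist_tgt_q) ** 2 / (max(dist_src_p, dist_tgt_q) + eps)
```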

3.3.4 Psycho-Visual Term: Naturalness Preservation

Texture over-repetitiveness can seriously degrade the subjective aesthetic quality. Therefore, we penalize source patches that are selected repetitiously.

Let Ω(p) be the set of target pixels that currently take p as their correspondence, and let n(p) = |Ω(p)| be its size. We define the psycho-visual term as

E_psy(p, q) = n(p).    (14)

Summing the penalty over all target pixels gives a clearer picture of its effect:

Σ_{q in T'} E_psy(p, q) = Σ_{p in S'} n(p)^2.    (15)

Since Σ_p n(p) is constant, Eq. (15) reaches its minimum when all n(p) are equal. This means our psycho-visual term encourages source patches to be used evenly.
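A sketch of the psycho-visual penalty: the cost of choosing source center p equals the number of target patches already mapped to it, so heavily reused patches become progressively more expensive. The dictionary layout of the correspondences is our convention.

```python
from collections import Counter

def psycho_visual_term(p, correspondences):
    """correspondences : dict mapping each target patch center q to its current source center."""
    usage = Counter(correspondences.values())   # n(p) for every source center
    return usage[p]
```

Recomputing the counter after every search-and-propagation pass mirrors the update described in Sec. 3.3.5.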

3.3.5 Function Optimization

We follow the iterative coarse-to-fine matching and voting steps of [36]. In the matching step, the PatchMatch algorithm [1, 2] is adopted. For the psycho-visual term, we update n(p) after each iteration of search and propagation. Meanwhile, the initialization of T' plays an important role in the final results, since our guidance map provides very few constraints on the textures. We vote the source patches found by minimizing only Eq. (13) to form our initial guess of T'. This simple strategy improves the final results significantly, as shown in Fig. 6.
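The paper initializes T' by voting source patches matched only on the distribution term; below is a simplified per-pixel variant that copies, for every target pixel, the color of a source pixel with the closest normalized skeleton distance. The searchsorted lookup is our simplification of the patch voting step.

```python
import numpy as np

def distribution_init(dist_tgt, dist_src_flat, src_colors_flat):
    """dist_tgt        : H x W normalized distances of the target text
    dist_src_flat      : (N,) normalized distances of the source pixels
    src_colors_flat    : (N, 3) colors of the corresponding source stylized pixels
    """
    order = np.argsort(dist_src_flat)
    sorted_dist = dist_src_flat[order]
    idx = np.searchsorted(sorted_dist, dist_tgt.ravel())
    idx = np.clip(idx, 0, order.size - 1)
    init = src_colors_flat[order[idx]]          # copy the color of a distance-matched pixel
    return init.reshape(dist_tgt.shape + (3,))
```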

4 Analysis

Figure 5: Effects of the multi-scale strategy. (a) Results using small single-scale patches. (b) Results using large single-scale patches. (c) Results using joint patches over multiple scales.
Figure 6: Effects of the distribution term. (a) Results without the distribution term. (b) Results obtained by random initialization and optimization with the distribution term. (c) Results obtained by both initialization and optimization with the distribution term.
Figure 7: Effects of the psycho-visual term, which penalizes texture over-repetitiveness and encourages new texture creation. (a) Result without the psycho-visual penalty. (b)(c) Results with increasingly strong penalties.
Figure 8: Comparison with state-of-the-art methods on various text effects. From top to bottom: neon, smoke, denim fabric. (a) Input source text effects with their raw text counterparts in the lower-left corner. (b) Target text. (c) Results of Image Analogies [13]. (d) Results of Split and Match [9]. (e) Results of Neural Doodles [4]. (f) Results of our baseline method. (g) Results of the proposed method.

Appearance Term. The advantages of the proposed appearance term lie in two aspects: (i) preserving coarse-grained texture structures and (ii) preserving texture details. We show in Figs. 5(a) and (b) the denim fabric style generated using small and large single-scale patches, respectively. Small patches capture very limited contextual information and thus cannot guarantee structural continuity; as can be seen in Fig. 5(a), the sewing threads look cracked and do not follow uniform directions. Choosing large patches, however, smooths out tiny thread residues, as in Fig. 5(b). These problems are well solved by jointly using patches over multiple scales, as in Fig. 5(c), where the overall shape is well preserved and details such as the sewing threads look more vivid.

Distribution Term. The distribution term ensures that the sub-effects in the target image and the source example are similarly distributed, which is the basis of our assumption in Sec. 3.2.3 that the posterior probabilities of the source and target are the same. Fig. 6 shows the effects of the distribution term on the flame style. Without the distribution constraint, flames appear randomly in the black background. The distribution term adjusts the flames to better match the spatial distribution of the source example.

Psycho-Visual Term. The effects of our psycho-visual term are shown in Fig. 7. The lava textures synthesized without the psycho-visual penalty (Fig. 7(a)) densely repeat the red cracks in several regions, which causes obvious unnaturalness. By increasing the penalty, the reuse of the same source textures is greatly restrained (Fig. 7(b)) and our method tends to flexibly combine different source patches to create brand-new textures (Fig. 7(c)). Thus, the psycho-visual term effectively penalizes texture over-repetitiveness and encourages new texture creation.

Combination of the Three Terms. It is worth noting that the proposed three terms are complementary. First, the appearance and distribution terms emphasize local texture patterns and the global distribution of text sub-effects, respectively; the former depicts low-level color features while the latter exploits complementary mid-level position features. Second, the appearance and distribution terms jointly evaluate objective patch similarities, while the psycho-visual term complements them by incorporating subjective aesthetic considerations.

Figure 9: Applying different text effects (flame, lava, rust, drop, pop, blink) to the target texts of representative characters (Chinese, alphabetic, handwriting).
Figure 10: An overview of our flame typography library. The bigger image at the top left corner serves as the example to generate the other characters. The whole library as well as the other stylized libraries can be found in the supplementary material.

5 Experimental Results

In the experiments, we use a fixed patch size and maximum scale as in Sec. 3.2.1, and build an image pyramid with a fixed coarsest size; at each pyramid level, joint patches over multiple scales are used. The weights λ_T, λ_dist and λ_psy that balance the different terms, as well as the threshold of the filter criterion, are fixed across all experiments. In addition to the examples in this paper, all of our results and comparisons are included in the supplementary material.

In Fig. 8, we present a comparison of our algorithm with three state-of-the-art style transfer techniques as well as our baseline. The first method is the pioneering Image Analogies [13]; the textures in its results repeat locally and look disordered globally, with evident patch boundaries. The second method is our implementation of Split and Match [9], which synthesizes textures using adaptive patch sizes. The original method directly transfers the style onto the target without the help of the raw text images; to make a fair comparison, we incorporate this guidance into its split stage. This method fails to generate textures in the background and produces overly plain stylized results. The third method, Neural Doodle [4], is based on the combination of MRF and CNN [18] and incorporates semantic maps for analogy guidance. While the color palette of the example text effects is transferred, fine textures are poorly synthesized and the text shape is lost. Our baseline transfers fine textures but fails to keep the overall sub-effects distribution and generates artifacts in the background. By comparison, the proposed method outperforms the state-of-the-art methods, preserving both the local textures and the global sub-effects distribution.

In Fig. 9, we present an illustration of style transfer from six very different text effects to three representative characters (Chinese, alphabetic, handwriting). This experiment covers challenging transformations across styles, languages and fonts. Thanks to the distance normalization and the multi-scale strategy, our algorithm successfully transfers the text effects regardless of character shapes and texture scales, providing a solid tool for artistic typography.

Finally, we show our flame typography library covering frequently used Chinese characters. Due to space limitations, only the first portion of them is presented in Fig. 10. The whole library, as well as the other typography libraries, is included in the supplementary material. These extensive synthesis results demonstrate the robustness of our method to varied character shapes.

6 Conclusion

In this paper, we raise the text effects transfer problem and propose a novel statistics-based method to solve it. We convert the high correlation between sub-effect patterns and their spatial distribution relative to the text skeleton into soft constraints for text effects generation. An objective function with three complementary terms is proposed to jointly consider local multi-scale texture, the global sub-effects distribution and visual naturalness. We validate the effectiveness and robustness of our method through comparisons with state-of-the-art style transfer algorithms and extensive artistic typography generation. Future work will concentrate on the composition of stylized texts with background photos.

References

  • [1] C. Barnes, E. Shechtman, A. Finkelstein, and D. B. Goldman. Patchmatch: a randomized correspondence algorithm for structural image editing. ACM Transactions on Graphics, 28(3):341–352, August 2009.
  • [2] C. Barnes, E. Shechtman, D. B. Goldman, and A. Finkelstein. The generalized patchmatch correspondence algorithm. In Proc. European Conf. Computer Vision, pages 29–43, 2010.
  • [3] C. Barnes, F.-L. Zhang, L. Lou, X. Wu, and S.-M. Hu. Patchtable: Efficient patch queries for large datasets and applications. In ACM Transactions on Graphics, 2015.
  • [4] A. J. Champandard. Semantic style transfer and turning two-bit doodles into fine artworks. arXiv preprint, 2016. https://arxiv.org/abs/1603.01768.
  • [5] C. C. Chang and C. J. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
  • [6] S. Darabi, E. Shechtman, C. Barnes, D. B. Goldman, and P. Sen. Image melding: combining inconsistent images using patch-based synthesis. ACM Transactions on Graphics, 31(4):82:1–82:10, July 2012.
  • [7] A. A. Efros and W. T. Freeman. Image quilting for texture synthesis and transfer. In Proc. ACM Conf. Computer Graphics and Interactive Techniques, pages 341–346, 2001.
  • [8] A. A. Efros and T. K. Leung. Texture synthesis by non-parametric sampling. In Proc. IEEE Int’l Conf. Computer Vision, 1999.
  • [9] O. Frigo, N. Sabater, J. Delon, and P. Hellier. Split and match: example-based adaptive patch sampling for unsupervised style transfer. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, 2016.
  • [10] L. A. Gatys, A. S. Ecker, and M. Bethge. Texture synthesis using convolutional neural networks. In Advances in Neural Information Processing Systems, 2015.
  • [11] L. A. Gatys, A. S. Ecker, and M. Bethge. Image style transfer using convolutional neural networks. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, 2016.
  • [12] I. Goodfellow, J. Pougetabadie, M. Mirza, B. Xu, D. Wardefarley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
  • [13] A. Hertzmann, C. E. Jacobs, N. Oliver, B. Curless, and D. H. Salesin. Image analogies. In Proc. Conf. Computer Graphics and Interactive Techniques, pages 327–340, 2001.
  • [14] J. Johnson, A. Alahi, and F. F. Li. Perceptual losses for real-time style transfer and super-resolution. In Proc. European Conf. Computer Vision, 2016.
  • [15] B. Julesz and J. R. Bergen. Textons, the fundamental elements in preattentive vision and perception of textures. Bell Labs Technical Journal, 62(6):243–256, 1983.
  • [16] V. Kwatra, A. Schödl, I. Essa, G. Turk, and A. Bobick. Graphcut textures: image and video synthesis using graph cuts. ACM Transactions on Graphics, 22(3):277–286, 2003.
  • [17] G. Larsson, M. Maire, and G. Shakhnarovich. Learning representations for automatic colorization. In Proc. European Conf. Computer Vision, 2016.
  • [18] C. Li and M. Wand. Combining markov random fields and convolutional neural networks for image synthesis. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, 2016.
  • [19] C. Li and M. Wand. Precomputed real-time texture synthesis with markovian generative adversarial networks. In Proc. European Conf. Computer Vision, 2016.
  • [20] L. Liang, C. Liu, Y. Xu, B. Guo, and H. Shum. Real-time texture synthesis by patch-based sampling. ACM Transactions on Graphics, 20(3):127–150, 2001.
  • [21] T. Lin and S. Maji. Visualizing and understanding deep texture representations. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, 2016.
  • [22] M. Lukáč, J. Fišer, J. C. Bazin, O. Jamriška, A. Sorkine-Hornung, and D. Sýkora. Painting by feature: texture boundaries for example-based image creation. ACM Transactions on Graphics, 32(4):96–96, 2013.
  • [23] M. Mirza and S. Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.
  • [24] F. Okura, K. Vanhoey, A. Bousseau, A. A. Efros, and G. Drettakis. Unifying color and texture transfer for predictive appearance manipulation. Computer Graphics Forum, 34(4):53–63, 2015.
  • [25] J. Park, Y. Tai, S. N. Sinha, and I. S. Kweon. Efficient and robust color consistency for community photo collections. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, 2016.
  • [26] F. Pitié, A. C. Kokaram, and R. Dahyot. Automated colour grading using colour distribution transfer. Computer Vision and Image Understanding, 107(1):123–137, 2007.
  • [27] J. Portilla and E. P. Simoncelli. A parametric texture model based on joint statistics of complex wavelet coefficients. Int’l Journal of Computer Vision, 40(1):49–70, 2000.
  • [28] E. Reinhard, M. Ashikhmin, B. Gooch, and P. Shirley. Color transfer between images. IEEE Computer Graphics and Applications, 21(5):34–41, 2001.
  • [29] A. Selim, M. Elgharib, and L. Doyle. Painting style transfer for head portraits using convolutional neural networks. ACM Transactions on Graphics, 35(4):1–18, 2016.
  • [30] Y. Shih, S. Paris, C. Barnes, W. T. Freeman, and F. Durand. Style transfer for headshot portraits. ACM Transactions on Graphics, 33(4):1–14, 2014.
  • [31] Y. Shih, S. Paris, F. Durand, and W. T. Freeman. Data-driven hallucination of different times of day from a single outdoor photo. ACM Transactions on Graphics, 32(6):2504–2507, 2013.
  • [32] Y. W. Tai, J. Jia, and C. K. Tang. Local color transfer via probabilistic segmentation by expectation-maximization. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, pages 747–754, 2005.
  • [33] Y. W. Tai, J. Jia, and C. K. Tang. Soft color segmentation and its applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(9):1520–1537, 2007.
  • [34] D. Ulyanov, V. Lebedev, A. Vedaldi, and V. Lempitsky. Texture networks: feed-forward synthesis of textures and stylized images. In Proc. Int’l Conf. Machine Learning, 2016.
  • [35] T. Welsh, M. Ashikhmin, and K. Mueller. Transferring color to greyscale images. ACM Transactions on Graphics, 21(3):277–280, 2002.
  • [36] Y. Wexler, E. Shechtman, and M. Irani. Space-time completion of video. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(3):463–476, March 2007.
  • [37] Z. Yan, H. Zhang, B. Wang, S. Paris, and Y. Yu. Automatic photo adjustment using deep neural networks. ACM Transactions on Graphics, 35(1), 2016.
  • [38] R. Zhang, P. Isola, and A. A. Efros. Colorful image colorization. In Proc. European Conf. Computer Vision, 2016.

As supplementary material of our paper, we present the following contents:

  • Correlations between patch patterns and different modes in text effects images. (Table 2)

  • Comparison of our text effects transfer approach with state-of-the-art methods. (Figs. 11–14)

  • Illustration of style transfer from various text effects to different characters. (Figs. 15–20)

  • An overview of our four artistic typography libraries. (Fig. 21)

Appendix A Correlations Between Patch Patterns and Different Modes

Image credits for the 30 text effects images:
01: http://www.zcool.com.cn/work/ZMTg1MTgzMDA=.html   02: http://www.zcool.com.cn/work/ZMTc4MTM5MDQ=.html
03: http://www.zcool.com.cn/work/ZMTc1MzI4MzI=.html   04: http://www.zcool.com.cn/work/ZMTc0NDIxMDg=.html
05: http://www.zcool.com.cn/work/ZMTcwNjEwMTI=.html   06: http://www.zcool.com.cn/work/ZNTE3MTAxMg==.html
07: http://www.zcool.com.cn/work/ZMTU0Mzk3ODg=.html   08: http://www.zcool.com.cn/work/ZMTUxMDc0MDQ=.html
09: http://www.zcool.com.cn/work/ZMTQ0MjY4OTI=.html   10: http://www.zcool.com.cn/work/ZMTQ0MjU3NDA=.html
11: http://www.zcool.com.cn/work/ZNjExMzA0NA==.html   12: http://www.zcool.com.cn/work/ZNjg1MTg1Ng==.html
13: http://www.zcool.com.cn/work/ZMTg5NjM3MzI=/1.html
14: http://www.zcool.com.cn/work/ZMTg5NTg2MDQ=/2.html
15: http://www.zcool.com.cn/work/ZMTg5NDk0NDg=.html   16: http://www.zcool.com.cn/work/ZMTMwMzM5NzI=.html
17: http://www.68ps.com/jc/big_ps_wz.asp?id=3773          18: http://www.zcool.com.cn/work/ZMTI5MDczMjQ=.html
19: http://www.zcool.com.cn/work/ZNzUzNTkwMA==/2.html
20: http://www.zcool.com.cn/work/ZNzUzNTkwMA==/3.html
21: http://www.zcool.com.cn/work/ZNTE3MTAxMg==.html   22: http://www.zcool.com.cn/work/ZNTEwNDA5Mg==.html
23: http://www.zcool.com.cn/work/ZNDEyMzc0NA==.html   24: http://www.zcool.com.cn/work/ZNzM3NDA5Ng==.html
25: http://www.zcool.com.cn/work/ZNDIzMDMzMg==.html   26: http://www.zcool.com.cn/work/ZMzM3NjQ4NA==.html
27: http://www.zcool.com.cn/work/ZMzM3NjQ4NA==.html
28: http://www.chinaz.com/design/2015/0604/412039.shtml
29: http://www.68ps.com/jc/big_ps_wz.asp?id=3916          30: http://www.68ps.com/jc/big_ps_wz.asp?id=3854
              color                                   scale
      rand    grid    angle   ring    dist     rand    grid    angle   ring    dist
01 0.061 0.115 0.120 0.105 0.167 0.134 0.833 0.462 0.571 0.394
02 0.061 0.088 0.100 0.088 0.139 0.178 1.219 0.241 1.488 1.474
03 0.064 0.104 0.139 0.090 0.125 0.131 0.872 0.496 0.650 1.208
04 0.067 0.140 0.109 0.133 0.197 0.175 0.418 0.327 0.567 0.606
05 0.059 0.123 0.115 0.125 0.152 0.158 0.471 0.232 0.593 0.671
06 0.059 0.095 0.101 0.088 0.133 0.193 1.053 0.721 0.406 1.021
07 0.063 0.126 0.115 0.130 0.147 0.148 0.434 0.222 0.463 1.088
08 0.068 0.116 0.125 0.111 0.134 0.144 1.109 0.492 1.014 1.113
09 0.062 0.106 0.100 0.111 0.144 0.154 0.916 0.426 0.804 1.304
10 0.063 0.114 0.109 0.109 0.137 0.160 0.536 0.291 0.671 1.027
11 0.064 0.116 0.115 0.121 0.136 0.163 0.385 0.267 0.539 0.817
12 0.064 0.123 0.111 0.115 0.168 0.171 0.584 0.332 0.725 0.867
13 0.063 0.096 0.105 0.122 0.131 0.113 1.099 0.304 0.972 0.766
14 0.062 0.097 0.112 0.107 0.147 0.164 0.765 0.518 0.479 0.672
15 0.061 0.099 0.118 0.102 0.144 0.162 0.851 0.403 0.797 0.547
16 0.065 0.104 0.129 0.104 0.193 0.110 1.143 0.630 0.551 1.568
17 0.063 0.086 0.125 0.095 0.137 0.175 1.078 0.868 0.488 1.303
18 0.065 0.123 0.127 0.117 0.132 0.145 0.546 0.282 0.514 0.505
19 0.061 0.095 0.126 0.091 0.142 0.165 0.644 0.772 0.255 0.900
20 0.067 0.112 0.143 0.120 0.142 0.156 0.944 0.362 0.572 0.904
21 0.063 0.093 0.111 0.087 0.129 0.163 0.978 0.745 0.654 0.427
22 0.062 0.094 0.128 0.100 0.141 0.110 1.245 0.885 0.466 1.658
23 0.065 0.120 0.138 0.109 0.145 0.169 0.377 0.377 0.135 0.211
24 0.064 0.097 0.115 0.095 0.216 0.151 0.655 0.480 0.458 0.783
25 0.064 0.105 0.140 0.109 0.147 0.130 0.717 0.671 0.455 0.661
26 0.066 0.107 0.116 0.112 0.141 0.128 1.076 0.678 0.769 1.465
27 0.060 0.118 0.143 0.110 0.185 0.132 1.144 0.752 0.475 1.235
28 0.058 0.111 0.128 0.096 0.128 0.175 0.934 0.683 0.491 1.369
29 0.061 0.087 0.108 0.082 0.136 0.124 0.290 0.306 0.330 1.165
30 0.063 0.077 0.111 0.077 0.107 0.205 0.464 0.353 0.354 0.783
Average 0.063 0.106 0.119 0.105 0.147 0.153 0.793 0.486 0.590 0.950
Table 2: Correlations between patch patterns and different modes.

Appendix B Comparisons with State-of-the-Art Methods

(a) Source raw text
(b) Source text effects
(c) Target raw text
(d) Image Analogies [13]
(e) Split and Match [9]
(f) Neural Doodles [4]
(g) PatchTable[3]
(h) Baseline
(i) Proposed method
Figure 11: Comparison with state-of-the-art methods on the text effects of smoke (image credits: created by us following the design tutorial at http://photo.renren.com/photo/249458089/photo-3396512368/v7). (a) Input source raw text. (b) Input source text effects. (c) Target text. (d) Results of Image Analogies [13]. (e) Results of Split and Match [9]. (f) Results of Neural Doodles [4]. (g) Results of PatchTable [3] (note that PatchTable does not transfer colors, so it produces black-and-white results). (h) Results of our baseline method. (i) Results of the proposed method.
(a) Source raw text
(b) Source text effects
(c) Target raw text
(d) Image Analogies [13]
(e) Split and Match [9]
(f) Neural Doodles [4]
(g) PatchTable [3]
(h) Baseline
(i) Proposed method
Figure 12: Comparison with state-of-the-art methods on the text effects of flame (image credits: http://www.phombo.com/wallpapers/fire-letters-wallpapers-hd-3000x3000-a-z0-9/page-1/). (a) Input source raw text. (b) Input source text effects. (c) Target text. (d) Results of Image Analogies [13]. (e) Results of Split and Match [9]. (f) Results of Neural Doodles [4]. (g) Results of PatchTable [3]. (h) Results of our baseline method. (i) Results of the proposed method.
(a) Source raw text
(b) Source text effects
(c) Target raw text
(d) Image Analogies [13]
(e) Split and Match [9]
(f) Neural Doodles [4]
(g) PatchTable [3]
(h) Baseline
(i) Proposed method
Figure 13: Comparison with state-of-the-art methods on the text effects of denim fabric (image credits: created by us following the design tutorial at http://www.zcool.com.cn/article/ZMTQ1NDA0.html). (a) Input source raw text. (b) Input source text effects. (c) Target text. (d) Results of Image Analogies [13]. (e) Results of Split and Match [9]. (f) Results of Neural Doodles [4]. (g) Results of PatchTable [3]. (h) Results of our baseline method. (i) Results of the proposed method.
(a) Source raw text
(b) Source text effects
(c) Target raw text
(d) Image Analogies [13]
(e) Split and Match [9]
(f) Neural Doodles [4]
(g) PatchTable [3]
(h) Baseline
(i) Proposed method
Figure 14: Comparison with state-of-the-art methods on the text effects of neon (image credits: created by us following the design tutorial at http://www.zcool.com.cn/u/1001696). (a) Input source raw text. (b) Input source text effects. (c) Target text. (d) Results of Image Analogies [13]. (e) Results of Split and Match [9]. (f) Results of Neural Doodles [4]. (g) Results of PatchTable [3]. (h) Results of our baseline method. (i) Results of the proposed method.

Appendix C Text Effects Transfer between Different Styles, Languages and Fonts

(a) water
(b) flame
(c) rust
(d) blink
(e) drop
(f) silver
Figure 15: Six source text effects images used in the experiments: water, flame, rust, blink, drop, silver. Image credits:
water: http://www.zcool.com.cn/work/ZNTQxOTkzMg==.html
flame: http://www.phombo.com/wallpapers/fire-letters-wallpapers-hd-3000x3000-a-z0-9/page-1/
rust: Created by us under the design tutorial of http://photo.renren.com/photo/249458089/photo-3396512370/v7
blink: Created by us under the design tutorial of http://www.zcool.com.cn/article/ZNTYxNTY=.html
drop: Created by us under the design tutorial of http://www.zcool.com.cn/work/ZNDk1MzQxMg==.html
silver: Created by us under the design tutorial of http://www.zcool.com.cn/work/ZMTc2NzU0OA==.html
(a) Chinese character Chen
(b) Chinese character He
(c) Alphabetic character Q
(d) Chinese character Yuan
(e) Handwriting character Zi
(f) Alphabetic character R
(g) Chinese character Ai
(h) Handwriting character Qi
Figure 16: Eight target representative characters (Chinese, alphabetic, handwriting) for experiments. Target texts: Chen, He, Q, Yuan, Zi, R, Ai, Qi.
Figure 17: Results of our method with flame, rust and drop as examples. Top row: three source text effects images. Left column: four target characters.
Figure 18: Results of our method with flame, rust and drop as examples. Top row: three source text effects images. Left column: four target characters.
Figure 19: Results of our method with water, blink and silver as examples. Top row: three source text effects images. Left column: four target characters.
Figure 20: Results of our method with water, blink and silver as examples. Top row: three source text effects images. Left column: four target characters.

Appendix D Artistic Typography Library

(a) Overview of our flame typography library with alphabetic characters and Arabic numerals
(b) Overview of our smoke typography library with alphabetic characters and Arabic numerals
(c) Overview of our flame typography library with frequently used Chinese characters
(d) Overview of our neon typography library with frequently used Chinese characters
Figure 21: An overview of our flame, smoke and neon typography libraries. The bigger image at the top left corner serves as the example used to generate the other characters. Due to space limitations, only the first part of each library is presented.