Effective Clipart Image Vectorization Through Direct Optimization of Bezigons



Bezigons, i.e., closed paths composed of Bézier curves, have been widely employed to describe shapes in image vectorization results. However, most existing vectorization techniques infer the bezigons by simply approximating an intermediate vector representation (such as polygons). Consequently, the resultant bezigons are sometimes imperfect due to accumulated errors, fitting ambiguities, and a lack of curve priors, especially for low-resolution images. In this paper, we describe a novel method for vectorizing clipart images. In contrast to previous methods, we directly optimize the bezigons rather than using intermediate representations; therefore, the resultant bezigons are not only of higher fidelity to the original raster image but also more reasonable, as if they had been traced by a proficient expert. To enable such optimization, we have overcome several challenges and have devised a differentiable data energy as well as several curve-based prior terms. To improve the efficiency of the optimization, we also take advantage of the local control property of bezigons and adopt an overlapped piecewise optimization strategy. The experimental results show that our method outperforms both the current state-of-the-art method and commonly used commercial software in terms of bezigon quality.


clipart vectorization, clipart tracing, bezigon optimization


1 Introduction


Image vectorization, also known as image tracing, is the process of converting a bitmap image into a vector image. There are various types of vectorization. In the present work, we focus on clipart image vectorization, in which the input raster is a clipart image, i.e., a digital illustration such as a cartoon, logo, or symbol. Notably, such images do not include photographs or scans of real hand-made drawings.

There is a huge demand for such a conversion technique. According to a survey from [1], more than 7 million man hours are spent on vectorizing images in the United States every year, and approximately 60% of the more than 10 million images to be vectorized are clipart images such as logos and other rasterized vector art. As further evidence of the large demand for clipart image vectorization, there is also a large market for online services that specialize in tracing clipart images. The conversion can be manually performed, but this may require a substantial amount of time and effort, particularly for those users who are not proficient in tracing images. This situation provides strong motivation for the development of an automated algorithm for precise vectorization.

Notably, most modern methods that are appropriate for vectorizing clipart images [2, 3, 1, 4] use bezigons to represent the resultant vector contours; bezigons have become the standard representation because of their compactness and editability.

However, almost no existing methods are specialized for directly obtaining bezigons. Existing methods typically direct most of their effort toward generating intermediate polygons (Figure 1b) and subsequently estimate bezigons (Figure 1c) that reproduce these polygons rather than the original image [2, 1, 4]. Among these methods, [1] (also known as Vector Magic [5]) generally produces the most accurate bezigon boundaries. The key to Vector Magic's success is that an effective and differentiable polygon-based rasterization function was found, allowing polygon parameters to be precisely optimized based on this function and polygon-specific priors. Nevertheless, even this state-of-the-art method may still produce low-quality bezigons (Figure 1c), and other existing methods are much more susceptible to such problems. There are three reasons for this issue. First, errors introduced in the polygon estimation stage cannot be effectively corrected in the curve-fitting stage, which no longer observes the raster input. Second, even if the estimated polygons are perfect, ambiguities still exist in the curve-fitting stage because of the nature of data approximation. Third, bezigon-based priors have not yet been fully developed. In short, generating bezigons in such an indirect manner can substantially reduce the accuracy of the bezigon boundaries. This poses a serious problem for clipart vectorization because even a slightly improper or irrational boundary can be perceived as a significant artifact in a clipart image.

Figure 1: Traditional pipeline of clipart vectorization. The dotted curves represent ground-truth outlines. (a) Raster input. (b) Intermediate representation (green polygons). (c) Final vector result (green bezigons).

To solve the problems summarized above while retaining the advantages of the state-of-the-art method [1], an intuitive approach is to devise an effective optimization mechanism that is specific to bezigons.

Figure 2: Continuity of various candidate data energy functions. (a) Variation of a bezigon with the y coordinate of one of its control points. (b) Variation of the data energy with that y coordinate under various rasterization functions.

However, establishing such a framework is non-trivial. In general, direct optimization of bezigons requires an appropriate rasterization function specialized for bezigons, because such a function defines the bezigons' fidelity to the raster image and serves as the basis of the entire bezigon-specific optimization mechanism. However, most available rasterization functions are unsuited to this purpose: commonly used bezigon rasterization methods are typically based on sampling sub-pixel locations of the pixel grid, so the corresponding functions are non-differentiable, contain many discontinuities, and are piecewise flat (have zero gradient with respect to the bezigon parameters) almost everywhere (as shown in Figure 2). These properties seriously limit the effectiveness and efficiency of the optimization procedure. Consequently, finding a suitable bezigon-specific rasterization function is the first and foremost challenge that must be overcome.

Even if this first challenge is overcome, the solution space may remain large and contain many unreasonable bezigons that give rise to nearly the same raster image (Figure 3 illustrates examples of such illegal cases). We observe that reasonable bezigons, when serving as vector primitives, occupy only a small fraction of the parameter space of general bezigons. Specific prior knowledge about the bezigons in typical vector images should therefore be available, and it is essential to incorporate such prior knowledge to resolve the ambiguities and further constrain the solution space. Unfortunately, little academic attention has been directly focused on such prior knowledge; the curve priors suggested in the literature either cannot be directly applied to bezigons [1] or are not specialized for vectorization [6]. Therefore, studying the characteristics of both reasonable and unreasonable bezigons for image vectorization, and incorporating closely related prior knowledge into our bezigon optimization, is another challenge to be addressed.

Figure 3: Four types of failure cases that occur when only data energy is considered. (a) Self-intersection. (b) False corners with small angle variations. (c) Short handle. (d) Twisted section.

In this paper, we present solutions to the above challenges and propose a bezigon-specific optimization framework for more precise clipart vectorization. Our main contributions are as follows:

  • By analyzing several rasterization approaches, we identify an appropriate rasterization function, theoretically prove certain analytic properties thereof that facilitate effective optimization for our purposes (using the theory of generalized functions [7]), and experimentally validate its compatibility and robustness for vectorizing various types of clipart images. Thus, we establish a framework for clipart vectorization via the direct optimization of bezigons. Meanwhile, we provide some approximate criteria for determining whether a rasterization function is suitable for optimization-based image vectorization.

  • Based on an intensive study of reasonable bezigons in typical vector images as well as unreasonable bezigons arising from experiments, we classify the common illegal cases of bezigon primitives into four categories: self-intersections, false corners with small angle variations, short handles, and twisted sections (Figure 3). To address these illegal cases, we suggest a self-intersection prior term, an angle-variation prior term, a Bézier-handle prior term, and a curve-length prior term. All these terms are incorporated into our framework to further constrain the solution space and to provide broadly reasonable guidance for bezigon optimization. Moreover, errors in the curve boundaries, if any remain, become visually insignificant because the resultant bezigons are more reasonable and aesthetically pleasing in general.

  • By taking full advantage of the local control property of Bézier curves, we propose a piecewise optimization strategy to effectively solve the problem of bezigon optimization. This strategy considerably reduces the computational cost and makes our vectorization method more practical.

  • Based on the above techniques, we suggest a new bezigon optimization framework. In such a framework, we can effectively vectorize a clipart image or refine vector results obtained using other approaches. Notably, such a framework is generally capable of incorporating any bezigon rasterization model and additional prior knowledge for the purpose of image vectorization or other applications, such as curve stylization.

The experimental results show that our method outperforms both the current state-of-the-art method and commonly used commercial software in terms of bezigon quality, especially in tough vectorization cases such as smooth boundaries with high curvatures, obtuse corners, and slightly bent edges.

The remainder of this paper is organized as follows: Section 2 briefly reviews existing clipart vectorization approaches. Section 3 formulates the problem of clipart vectorization in terms of bezigon optimization. An overview of the proposed vectorization framework and our points of focus is also provided in this section. Section 4 fully explains our approach to the direct optimization of bezigons for image vectorization. The experimental results and comparisons are presented in Section 5, and the paper concludes with a discussion of further perspectives on this work in Section 6.

2 Related Work

Various other types of image vectorization methods exist that are specific to line drawings [8, 9, 10, 11, 12, 13, 14], natural images [15, 16, 17, 18, 19, 20, 21, 22, 23], and pixel art [24]. However, these methods rarely capture the intrinsic nature of clipart images and are likely to fail to generate precise curve boundaries; thus, they are not well suited for the task considered here.

In the last decade, several methods [25, 26, 1, 4, 2] have been proposed for clipart image vectorization. These methods typically involve segmenting the input image into a set of regions and inferring the color and the boundary location for each region.

To overcome the poor quality of the segmentation that results from general image vectorization, [25] exploited a visual feature of certain types of cartoons, i.e., shapes that are typically bounded by bold dark contours, and succeeded in producing a more precise segmentation technique for clipart images. However, this approach could only address regions enclosed by such strokes, which is not always the case in modern clipart images.

To further improve segmentation and infer region colors more semantically, [4] proposed a trapped-ball segmentation method that can segment a clipart image meaningfully even when some regions are non-uniformly colored. Moreover, this approach considers temporal coherence and is capable of vectorizing cartoon animations. Such progress is impressive, but segmentation, color estimation, and animation vectorization are not the focus of this paper.

Perhaps the most difficult aspect of image vectorization still lies in inferring boundary locations. As previously mentioned, [1] is the state-of-the-art vectorization algorithm with respect to boundary-location precision, especially for uniformly colored shapes. However, the contour optimization of this method, which plays the most important role in the algorithm, is specialized for polygons rather than bezigons and hence occasionally results in inaccurate bezigons. Extending this approach to curve fitting in a way that fully uses the information provided by the raster input might seem to solve the problem. However, this is a non-trivial task for the reasons given in Section 1, and it would in any case lead to a bezigon optimization problem similar to ours.

In addition to the academic literature, there are also a number of related commercial tools, such as Adobe Illustrator [27], Corel CorelDRAW [28] and Vector Magic [5] (a product based on the technology of [1]), as well as open-source projects such as Potrace [2] and AutoTrace [3]. Of these tools, Adobe Illustrator is the most representative, and Vector Magic achieves the best results in terms of bezigon boundary precision. In this paper, we compare our algorithm with these two software packages. Although the technical details of most commercial tools are unavailable, the experimental results indicate that these tools exhibit a problem similar to (or even worse than) that of [1, 5].

In summary, insufficient precision in identifying bezigon boundaries is the most common shortcoming of existing vectorization methods. Therefore, improving the precision of bezigon boundaries, which is important for vectorizing clipart images, is the primary goal of this paper.

3 Problem Formulation and Overview of Our Framework

To facilitate a better understanding of this paper, in this section, we formulate the related problem along with the relevant notation and then provide an overview of the proposed vectorization framework and our topics of interest.

For the sake of simplicity, we consider only a single bezigon. Our work can easily be extended to situations that involve two or more bezigons because each bezigon can be independently vectorized.

3.1 Problem Formulation

Given a raster image, the primary task of clipart image vectorization is to infer a bezigon from the raster input. In a typical vector image, a bezigon can be completely determined by its geometric parameters and its color parameters.

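Geometric parameters. As previously mentioned, a bezigon is simply a series of Bézier curves joined end to end, i.e.,

$$B = \{C_1, C_2, \ldots, C_N\}.$$

Here, $N$ denotes the number of curves in the bezigon, and $C_i$ represents the $i$-th curve, which is assumed without loss of generality to be a 2D cubic Bézier curve with the following parametric form:

$$C_i(t) = (1-t)^3 P_{i,0} + 3(1-t)^2 t\, P_{i,1} + 3(1-t) t^2 P_{i,2} + t^3 P_{i,3}, \quad t \in [0, 1],$$

where $P_{i,0}, P_{i,1}, P_{i,2}, P_{i,3}$ constitute the four control points of the $i$-th Bézier curve. The last control point of one curve coincides with the starting point of the next curve, i.e., $P_{i,3} = P_{i+1,0}$, with indices taken modulo $N$ so that $P_{N,3} = P_{1,0}$ closes the path. Thus, all geometric parameters of a bezigon can be represented as

$$W_g = \{P_{i,0}, P_{i,1}, P_{i,2}\}_{i=1}^{N}.$$

Color parameters. Without loss of generality, we consider that the color of the bezigon at pixel $(x, y)$ is represented by the function $c(x, y; W_c)$. If the region color is assumed to be uniform, then $c(x, y; W_c) = a_0$, and the color parameter is $W_c = \{a_0\}$. If a quadratic color model is assumed, then $c(x, y; W_c) = a_0 + a_1 x + a_2 y + a_3 x^2 + a_4 xy + a_5 y^2$; thus, the color parameters are $W_c = \{a_0, a_1, \ldots, a_5\}$.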
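As a concrete illustration of this parameterization, the following NumPy sketch evaluates a cubic Bézier curve in Bernstein form, expands a set of free control points into a closed bezigon, and evaluates a quadratic color model. It is a minimal sketch under our own assumptions: the free control points are stored as an array of shape (N, 3, 2) holding the first three control points of each curve (the fourth is shared with the next curve, wrapping around to close the path), colors are scalar (grayscale), and all function names are hypothetical.

```python
import numpy as np

def cubic_bezier(P, t):
    """Evaluate a cubic Bezier curve with control points P (shape (4, 2)) at parameters t."""
    P = np.asarray(P, dtype=float)
    t = np.asarray(t, dtype=float)[:, None]          # column vector for broadcasting
    return ((1 - t) ** 3 * P[0] + 3 * (1 - t) ** 2 * t * P[1]
            + 3 * (1 - t) * t ** 2 * P[2] + t ** 3 * P[3])

def bezigon_curves(free_points):
    """Expand free control points (shape (N, 3, 2)) into N closed cubic segments.

    The last control point of curve i is the first control point of curve i+1
    (indices wrap around), so consecutive curves join and the path is closed.
    """
    F = np.asarray(free_points, dtype=float)
    n = len(F)
    return [np.vstack([F[i], F[(i + 1) % n, 0]]) for i in range(n)]

def quadratic_color(x, y, a):
    """Hypothetical scalar quadratic color model; a uniform color is a = (a0, 0, 0, 0, 0, 0)."""
    a0, a1, a2, a3, a4, a5 = a
    return a0 + a1 * x + a2 * y + a3 * x ** 2 + a4 * x * y + a5 * y ** 2

# A 3x3 square whose four sides are straight cubic segments.
square = [[[0, 0], [1, 0], [2, 0]],
          [[3, 0], [3, 1], [3, 2]],
          [[3, 3], [2, 3], [1, 3]],
          [[0, 3], [0, 2], [0, 1]]]
curves = bezigon_curves(square)
print(cubic_bezier(curves[0], [0.0, 0.5, 1.0]))   # points along the bottom edge
```

Storing only the free points makes the joint constraint between consecutive curves hold by construction, so an optimizer never has to enforce it separately.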

Now, for a given raster input image $I$, our objective can be considered to be the inference of the parameters

$$W = \{W_g, W_c\}$$

from $I$ such that the bezigon defined by $W$ can explain the input image $I$. In other words, the raster image of the bezigon should be similar to the input image. This is a typical non-linear, ill-posed problem: there may be many possible solutions because of uncertainties in the imaging process and ambiguities of visual interpretation. To resolve the intrinsic ill-posedness of the problem, we must further constrain the solution space by introducing additional prior knowledge regarding bezigons in vector images.

Based on the above discussion, we will adopt an energy minimization approach that is widely used in many computer vision algorithms [29].

We first define our energy function as

$$E(W) = E_{\mathrm{data}}(W) + E_{\mathrm{prior}}(W),$$

where $E_{\mathrm{data}}$ is the so-called data energy, which measures the fidelity of a vector solution to the observed raster image, and $E_{\mathrm{prior}}$ is the so-called prior energy, which formulates our constraints or prior knowledge regarding reasonable bezigons in the above-mentioned vector images.

Consequently, the problem addressed in this paper can be formulated as identifying the optimal bezigon parameters $W^*$ such that

$$W^* = \arg\min_{W} E(W).$$

3.2 Overview of Our Framework for Optimization

Once our energy function is fully specified, the entire energy minimization framework can be divided into two phases: a bezigon initialization phase and a bezigon-specific optimization phase (Figure 4).

Figure 4: Overview of our framework. (a) Raster input. (b) Segmentation result. (c) Initial bezigons. (d) Optimized bezigons.

Bezigon initialization phase. The initialization phase takes a raster image (Figure 4a) as input and outputs a set of initial bezigons (Figure 4c). These bezigons can be either obtained using other existing vectorization methods or extracted from the input image. A simple, fully automated method of accomplishing this extraction consists of two steps: a segmentation step that segments the input image into a set of regions [30] (Figure 4b) and a boundary-fitting step that fits a piecewise cubic Bézier curve to the boundary of each region [31] (Figure 4c). As another option, the initial bezigons can be manually drawn or interactively refined by the user. Regardless of which method is used, the obtained bezigons serve only as initial parameters for the next phase; hence, they need not be highly accurate. The technical details of this phase are outside the scope of this paper.

Bezigon optimization phase. The optimization phase, which performs direct bezigon optimization, is the primary concern of this paper. This process takes the initial bezigons from the first phase as input and outputs the optimal bezigons as the final vector result. In contrast to other existing vectorization approaches, this phase neither simply applies a curve-fitting algorithm (e.g., [31]) nor indirectly optimizes bezigons via an intermediate representation (e.g., polygons in [1]). Instead, we optimize the bezigon parameters by directly observing the raster input and incorporating both the image-tracing experience of experts and prior knowledge from existing hand-drawn vector images. Our data energy and prior energy bridge the bezigon optimization and these sources of information. In this way, the accumulated errors introduced by intermediate processes can be avoided, and hence, the quality of the resultant bezigons can be improved. However, as stated previously, this optimization approach raises several as-yet-unresolved challenges. Our paper therefore emphasizes these issues; Section 4 presents our solutions and explains our main contributions.

There are three major advantages to our framework. First, the error arising from the entire vectorization pipeline can be minimized. Second, any bezigon-based priors can be conveniently incorporated to generate even more reasonable results, once we have a better understanding of bezigons in typical vector images, or to cause the resultant bezigons to satisfy certain constraints of other specific applications. Third, our vectorization approach behaves similarly to bezigon evolution, which is particularly well suited to the vectorization of clipart animations, and facilitates the further refinement of inaccurate bezigons resulting from other vectorization approaches.

4 Approach for Directly Optimizing Bezigons

In this section, we address several key issues in bezigon optimization: specifying the data energy with a proper rasterization model (Section 4.1) and several bezigon-specific prior terms (Section 4.2). To solve Equation 6 more efficiently, we also exploit the nature of the bezigon parameters and propose a piecewise optimization strategy (Section 4.3). In the following, we use the same notation as in Section 3.

4.1 Data Energy

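To fully utilize the information provided by the input image, we define the data energy as the distance between the input image $I$ and the image generated by rasterizing a vector solution $W$:

$$E_{\mathrm{data}}(W) = \frac{1}{L_0} \sum_{(x, y) \in \Omega} \big\| \mathcal{R}(W; x, y) - I(x, y) \big\|^2.$$

Here, the function $\mathcal{R}$ models a specific bezigon rasterization process. The function takes the parameters $W$ of a bezigon as input and produces a raster image of the same size as the input image $I$. $\mathcal{R}(W; x, y)$ and $I(x, y)$ denote the values at pixel $(x, y)$ in the rasterized image given by $\mathcal{R}$ and in the input image, respectively. $\Omega$ is the lattice of the input image $I$. The denominator $L_0$ represents the arc length of the initial bezigon. This denominator is fixed during the optimization and can be easily estimated from the geometric parameters of the initial bezigon, i.e.,
$$L_0 = \sum_{i=1}^{N} \int_0^1 \big\| \dot{C}^{0}_i(t) \big\| \, dt,$$
where $C^{0}_i$ denotes the $i$-th curve of the initial bezigon and $\dot{C}^{0}_i$ its derivative with respect to $t$.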
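Both the normalizing arc length and the energy itself are straightforward to compute once a rendered image is available. The sketch below is our own illustration, not the paper's implementation: it integrates the speed of each cubic segment with Gauss-Legendre quadrature to obtain the initial arc length, and it assumes that the "distance" between images is the sum of squared per-pixel differences. The function names are hypothetical, and the rasterizer that produces `rendered` is left out.

```python
import numpy as np

def bezier_derivative(P, t):
    """Derivative dC/dt of a cubic Bezier curve with control points P (shape (4, 2))."""
    P = np.asarray(P, dtype=float)
    t = np.asarray(t, dtype=float)[:, None]
    return (3 * (1 - t) ** 2 * (P[1] - P[0])
            + 6 * (1 - t) * t * (P[2] - P[1])
            + 3 * t ** 2 * (P[3] - P[2]))

def bezigon_arc_length(curves, n_nodes=16):
    """Total length: Gauss-Legendre quadrature of |C_i'(t)| over t in [0, 1], summed over curves."""
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    t = 0.5 * (nodes + 1.0)          # map nodes from [-1, 1] to [0, 1]
    w = 0.5 * weights                # rescale weights accordingly
    total = 0.0
    for P in curves:
        speed = np.linalg.norm(bezier_derivative(P, t), axis=1)
        total += float(np.dot(w, speed))
    return total

def data_energy(rendered, target, initial_length):
    """Assumed form: sum of squared per-pixel differences divided by the initial arc length."""
    diff = np.asarray(rendered, dtype=float) - np.asarray(target, dtype=float)
    return float(np.sum(diff ** 2)) / initial_length

line = [[0, 0], [1, 0], [2, 0], [3, 0]]           # straight segment of length 3
L0 = bezigon_arc_length([line])
print(L0, data_energy(np.ones((2, 2)), np.zeros((2, 2)), L0))
```

Because the denominator is computed once from the initial bezigon and then held fixed, it only rescales the data term (relative to the prior terms); it does not by itself reward the optimizer for lengthening or shortening the curve.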

Two issues now arise. First, a concrete bezigon rasterization function must be specified for $\mathcal{R}$, because this choice determines whether Equation 8 is suitable for optimization. This is also one of the most challenging aspects of direct bezigon optimization. As previously mentioned, the most important contribution of the current state-of-the-art approach [1] also lies in finding an appropriate rasterization function, but one that is specific to polygon optimization. For bezigon optimization, however, research on suitable rasterization functions is still lacking in the existing literature. Second, we must address the case in which the input image was not generated by the rasterization function used in the data energy, because the rasterization method that produced a given input image is generally unknown and most likely differs from ours.

Regarding the first issue, several methods exist for directly or indirectly rasterizing bezigons [32, 33, 34, 35, 36, 37, 38, 39], each of which corresponds to a candidate rasterization function. However, we find that nearly all such functions yield poor results when a typical solver for nonlinear optimization (such as conjugate gradient, L-BFGS, or NEWUOA) is applied. This is because most available rasterization functions are piecewise flat almost everywhere and contain many discontinuities (as shown in Figure 2b). Although such discontinuities pose no problem for ordinary rasterization tasks, they can strongly degrade the effectiveness or efficiency of optimization. Specialized optimization approaches for discontinuous functions (such as [40]) can be applied; however, our experimental results indicate that such approaches often fail to produce satisfactory bezigons. Moreover, these solvers are relatively slow, which limits their use in image vectorization. Based on these experiments and this analysis, we recognize that an appropriate rasterization function should exhibit certain properties, such as continuity with respect to the bezigon parameters. Moreover, if the rasterization function is also differentiable with respect to those parameters, more efficient and effective solvers can be adopted to optimize our energy function and obtain better results.
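To make the "piecewise flat" problem concrete, the following toy sketch (our own 1D illustration, not one of the rasterizers cited above) compares two ways of rendering a single row of unit pixels that is covered to the left of an edge at position `a`: sampling each pixel at its center, and computing the exact covered fraction of each pixel. The observed row is produced by the exact-coverage renderer with the edge at x = 3.3.

```python
import numpy as np

def point_sampled_row(a, n=8):
    """Pixel i covers [i, i+1); it is 'on' iff its center lies left of the edge at x = a."""
    centers = np.arange(n) + 0.5
    return (centers < a).astype(float)

def area_coverage_row(a, n=8):
    """Exact fraction of pixel [i, i+1) that lies left of the edge at x = a."""
    left = np.arange(n, dtype=float)
    return np.clip(a - left, 0.0, 1.0)

def energy(render, a, target):
    """Sum of squared differences between the rendered row and the observed row."""
    return float(np.sum((render(a) - target) ** 2))

target = area_coverage_row(3.3)   # observed row, edge at x = 3.3
for a in (3.6, 3.7, 3.8):
    print(a, energy(point_sampled_row, a, target), energy(area_coverage_row, a, target))
```

With point sampling, the energy is identical for all three edge positions, so its derivative there is zero and a gradient-based solver receives no signal about which way to move the edge. With area coverage, every shift of the edge changes the energy.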

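In the search for proper rasterization approaches, the approach presented in [35] came to our attention. This approach uses a hierarchical Haar wavelet representation to analytically calculate an anti-aliased raster image of bezigons. According to [35], for a bezigon $B$ with parameters $W = \{W_g, W_c\}$, the pixel color value at $(x, y)$ in the resultant raster image can be calculated as follows:

$$R_{\mathrm{H}}(W; x, y) = c(x, y; W_c) \left( \alpha_{\phi}(W_g) + \sum_{j=0}^{J-1} \sum_{k \in K_j} \sum_{e \in \{01, 10, 11\}} \alpha^{e}_{j,k}(W_g)\, \psi^{e}_{j,k}(x, y) \right).$$

Here, $j$ represents a specific scaling level from the original resolution ($j = 0$) to the pixel resolution ($j = J - 1$), $k = (k_x, k_y)$ represents a specific translation in the finite set $K_j$ of all possible translations at the current scaling level, and $e$ indexes the three types of two-dimensional Haar wavelets. $\psi^{e}_{j,k}$ and $\alpha^{e}_{j,k}$ are a two-dimensional Haar wavelet basis function and its coefficient, respectively, and $\alpha_{\phi}$ is the coefficient of the Haar scaling function. The definitions of these functions can be found in Appendix 7.1.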

Although [35] provides a closed-form solution for rasterizing bezigons, the continuity and differentiability of this rasterization function are not obvious because the Haar wavelet basis functions are discontinuous. One of the most important tasks of this section is to prove that the function is continuous and differentiable. The differentiability proof is not straightforward; it relies on several properties and operations from the theory of generalized functions [7].

Note that for any given pixel coordinate $(x, y)$, the wavelet rasterization function $R_{\mathrm{H}}(W; x, y)$ is a function of the bezigon parameters $W$. To establish its continuity and differentiability, we present the following theorems.

Theorem 1 (continuity)

The wavelet rasterization function $R_{\mathrm{H}}(W; x, y)$ is a continuous function with respect to all bezigon parameters $W$.


As previously stated, the bezigon parameters $W$ consist of the color parameters $W_c$ and the geometric parameters $W_g$. According to Equation 9, the wavelet rasterization function is continuous with respect to the color parameters $W_c$ as long as the assumed color model is continuous with respect to them, which is often the case. It is also continuous with respect to the geometric parameters $W_g$; a detailed analysis can be found in Appendix 7.2.

Consequently, if the wavelet rasterization function serves as our rasterization function $\mathcal{R}$, then the resultant data energy is also a continuous function. The smooth curve that corresponds to our data energy in Figure 2b reflects this property as well. The continuity of the data energy not only enables us to apply a common solver for nonlinear optimization but also helps resolve ambiguities that arise from the observation of the input data.
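The continuity result has a simple geometric intuition: the wavelet rasterizer in effect returns the fraction of each pixel covered by the shape (a box-filtered coverage), and such an area varies continuously as the boundary moves. The sketch below illustrates this with a different, simpler technique: exact polygon-pixel coverage via Sutherland-Hodgman clipping followed by the shoelace formula. It is not the Haar-wavelet algorithm of [35], it handles only polygons (a bezigon would first have to be flattened into one), and the function names are hypothetical.

```python
def clip_axis(poly, axis, value, keep_greater):
    """One Sutherland-Hodgman pass against the line coordinate[axis] == value."""
    def inside(p):
        return p[axis] >= value if keep_greater else p[axis] <= value

    def crossing(p, q):
        s = (value - p[axis]) / (q[axis] - p[axis])   # p and q lie on opposite sides
        return (p[0] + s * (q[0] - p[0]), p[1] + s * (q[1] - p[1]))

    out = []
    for i in range(len(poly)):
        cur, prev = poly[i], poly[i - 1]              # poly[-1] closes the loop
        if inside(cur):
            if not inside(prev):
                out.append(crossing(prev, cur))
            out.append(cur)
        elif inside(prev):
            out.append(crossing(prev, cur))
    return out

def shoelace_area(poly):
    """Unsigned area of a simple polygon given as a list of (x, y) vertices."""
    twice_area = 0.0
    for i in range(len(poly)):
        x0, y0 = poly[i - 1]
        x1, y1 = poly[i]
        twice_area += x0 * y1 - x1 * y0
    return 0.5 * abs(twice_area)

def pixel_coverage(poly, px, py):
    """Exact area of polygon `poly` inside the unit pixel [px, px+1] x [py, py+1]."""
    clipped = [(float(x), float(y)) for x, y in poly]
    for axis, value, keep_greater in ((0, px, True), (0, px + 1, False),
                                      (1, py, True), (1, py + 1, False)):
        clipped = clip_axis(clipped, axis, value, keep_greater)
        if not clipped:
            return 0.0
    return shoelace_area(clipped)

# Moving one vertex across a pixel boundary changes the coverage continuously.
for h in (0.999, 1.0, 1.001):
    print(h, pixel_coverage([(0, 0), (1, 0), (0, h)], 0, 0))
```

For the triangle in the usage example, the coverage is h/2 while the apex lies inside the pixel and 1 - 1/(2h) once it leaves; both expressions equal 0.5 at h = 1, so there is no jump when the vertex crosses the pixel edge.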

Theorem 2 (differentiability)

The wavelet rasterization function $R_{\mathrm{H}}(W; x, y)$ is differentiable with respect to the bezigon parameters $W$.


Most color models are differentiable with respect to the color parameters $W_c$. In such cases, the wavelet rasterization function is clearly differentiable with respect to the color parameters, according to Equation 9. However, its differentiability with respect to the geometric parameters $W_g$ is not obvious. We use the theory of generalized functions to analyze this matter. Because of space limitations and the complexity of the discussion, the proof and the derivatives are presented in Appendix 7.3.
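For the color parameters, the derivative is easy to write down explicitly. The sketch below rests on our own simplifying assumptions: a single bezigon with a uniform scalar color c over a zero background, so that each rendered pixel equals its coverage times c, and a squared-difference data energy normalized by the initial arc length. It returns the energy together with its analytic derivative with respect to c, checks that derivative against a central finite difference, and also gives the closed-form least-squares color for fixed coverage. The geometric derivatives, which are the hard part treated in Appendix 7.3, are not covered; all names are hypothetical.

```python
import numpy as np

def energy_and_color_grad(coverage, target, color, initial_length=1.0):
    """E(c) = sum((alpha * c - I)^2) / L0 and its derivative dE/dc for a uniform scalar color c."""
    alpha = np.asarray(coverage, dtype=float)
    residual = alpha * color - np.asarray(target, dtype=float)
    energy = float(np.sum(residual ** 2)) / initial_length
    grad = float(np.sum(2.0 * residual * alpha)) / initial_length
    return energy, grad

def best_uniform_color(coverage, target):
    """Closed-form least-squares minimizer of sum((alpha * c - I)^2) over c."""
    alpha = np.asarray(coverage, dtype=float)
    return float(np.sum(alpha * np.asarray(target, dtype=float)) / np.sum(alpha ** 2))

alpha = np.array([1.0, 0.5, 0.0, 0.25])
image = 0.8 * alpha                     # synthetic observation with true color 0.8
c, h = 0.3, 1e-5
e_plus, _ = energy_and_color_grad(alpha, image, c + h)
e_minus, _ = energy_and_color_grad(alpha, image, c - h)
_, g = energy_and_color_grad(alpha, image, c)
print(g, (e_plus - e_minus) / (2 * h), best_uniform_color(alpha, image))
```

A finite-difference check of this kind is a cheap way to validate any hand-derived gradient, including geometric ones, before handing the energy to a solver such as L-BFGS.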

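Based on the above analysis and theorems, we conclude that the wavelet rasterization function of [35], denoted $R_{\mathrm{H}}$, may be a suitable candidate for the rasterization function $\mathcal{R}$ in Equation 8. We therefore adopt it in the proposed framework, and our final data energy can be rewritten as

$$E_{\mathrm{data}}(W) = \frac{1}{L_0} \sum_{(x, y) \in \Omega} \big\| R_{\mathrm{H}}(W; x, y) - I(x, y) \big\|^2,$$
where $L_0$ is the arc length of the initial bezigon and $\Omega$ is the lattice of the input image $I$.
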
Now, we consider the second issue. Because there are many commonly used rasterization methods, it is often the case that the input raster image is not generated by the rasterization method used in our data energy term. This could be an issue if there are significant differences in the rasterization results between our chosen method and the method used to generate the input image. Therefore, to ensure the practical utility of the proposed vectorization method, we must investigate whether the selected rasterization function can closely approximate the rendering results of other commonly used rasterization approaches.

Fortunately, our selected rasterizer remains a suitable choice in this context. To support this claim, we perform the following experiment. We collect a set of real-world vector images and rasterize each of them with several commonly used anti-aliased rasterizers, including recently proposed methods, as well as with the wavelet rasterizer. Note that images produced by different rasterizers can differ only in pixels that intersect the bezigon boundary. To make the comparison clearer, we consider only the differences at such pixels in the resultant images. Histograms of these differences are presented in Figure 5. A large proportion of the “boundary” pixels rendered by any other rasterizer are identical to those produced by the wavelet rasterizer. Moreover, all distributions have means of zero and small variances. Therefore, the pixel values generated by our rasterization function can be safely assumed to be a good approximation to those generated by other commonly used rasterization methods, and hence, our rasterization function can accurately model the original rasterization process of most clipart images.
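The comparison protocol can be sketched as follows. This is our reconstruction under stated assumptions, not the authors' evaluation code: "boundary" pixels are identified as those whose coverage in a reference rendering lies strictly between 0 and 1 (which misses boundary pixels whose coverage happens to be exactly 0 or 1), images are single-channel arrays with values in [0, 1], and the 20-bin histogram over [-1, 1] is an arbitrary choice. No rasterizer is included; the inputs are assumed to be already rendered, and all names are hypothetical.

```python
import numpy as np

def boundary_mask(reference_coverage):
    """Assumed proxy for 'pixels that intersect the boundary': partial coverage in (0, 1)."""
    cov = np.asarray(reference_coverage, dtype=float)
    return (cov > 0.0) & (cov < 1.0)

def boundary_difference_stats(img_a, img_b, mask, tol=1e-6):
    """Histogram and summary statistics of img_a - img_b over the masked pixels only."""
    diffs = (np.asarray(img_a, dtype=float) - np.asarray(img_b, dtype=float))[mask]
    counts, edges = np.histogram(diffs, bins=20, range=(-1.0, 1.0))
    return {
        "mean": float(np.mean(diffs)),
        "variance": float(np.var(diffs)),
        "fraction_identical": float(np.mean(np.abs(diffs) <= tol)),
        "counts": counts,
        "edges": edges,
    }

ref = np.array([[0.0, 0.5, 1.0], [0.2, 1.0, 0.0]])
other = np.array([[0.0, 0.5, 1.0], [0.3, 1.0, 0.0]])
stats = boundary_difference_stats(ref, other, boundary_mask(ref))
print(stats["mean"], stats["variance"], stats["fraction_identical"])
```

Restricting the statistics to boundary pixels matters: interior and background pixels agree trivially across rasterizers and would otherwise swamp the histogram with zeros.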

Figure 5: Histograms of differences between pixel values produced by the wavelet rasterizer [35] and those produced by the other rasterizers listed in the figure.

In summary, we have shown that our bezigon rasterization function is suitable for optimization and compatible with various clipart raster inputs, and we have defined our data energy. Notably, for any other rasterization function that is a candidate for vectorizing a certain type of image, a similar procedure should be followed to evaluate the suitability and compatibility of that function.

4.2 Prior Energy

After the data energy has been carefully selected, various simple cases (e.g., the vectorization of a simple bezigon in a high-resolution raster image) can already be effectively addressed when there is adequate information implicit in the observed raster data. However, it is more often the case that the bezigons are relatively complex and that the information available in the raster input is inadequate. In such a case, profound uncertainty regarding the correct solution may remain if the data energy alone is considered. Therefore, the optimization may result in unreasonable bezigons that can be easily identified by the human eye.

Indeed, our intensive experiments provide evidence of such issues. More specifically, the failure cases of direct bezigon optimization using only data energy generally fall into four categories: (a) self-intersections, (b) false corners with small angle variations, (c) short handles, and (d) twisted sections (Figure 3).

All these bezigons are considered to be unreasonable because they are aesthetically unpleasing and, according to expert opinion, are unlikely to be drawn or traced by a professional illustrator. These types of bezigons are also rare in typical vector images. (Taking self-intersection as an example, we find that very few bezigons in vector images from the Open Clipart library [41] intersect with themselves. Most bezigons that exhibit self-intersection are believed to have been created by an amateur or automatically traced from a raster image.) The reason for the occurrence of such illegal bezigons is that their corresponding raster images are quite similar to the input images (compare Figures 6f, 8f, 9f and 10f with 6h, 8h, 9h and 10h, respectively), although their vector forms are significantly different from the ground-truth images (compare Figures 6b, 8b, 9b and 10b with 6d, 8d, 9d and 10d, respectively). This situation results in low data energy, especially when the resolution of the input image is relatively low.

Our prior energy is designed precisely to solve the above-mentioned problems and to ensure that the resultant bezigons are reasonable and aesthetically pleasing. More specifically, we construct one prior term to reduce the likelihood of each type of failure case. Thus, our prior energy has the following form:

$$E_{\mathrm{prior}}(W) = \lambda_S E_S(W) + \lambda_A E_A(W) + \lambda_H E_H(W) + \lambda_L E_L(W),$$

where $E_S$, $E_A$, $E_H$ and $E_L$ represent the self-intersection prior term (SPT), the angle-variation prior term (APT), the Bézier-handle prior term (HPT) and the curve-length prior term (LPT), respectively, and $\lambda_S$, $\lambda_A$, $\lambda_H$ and $\lambda_L$ are their respective weights. Each prior term is defined and explained in the following subsections.

Elimination of Self-intersection

Figure 6: An example of eliminating self-intersection. (a) The entire ground-truth vector image and the local patch to be processed. (b) Result of optimization without the SPT. (c) Result of optimization with the SPT. (d) Ground truth. (e-h) are the rasterization results corresponding to (a-d), respectively.

Certain approaches are seemingly capable of avoiding self-intersection but are not feasible in practice. One intuitive method is to enforce a set of highly coupled nonlinear inequality constraints and use a primal-dual interior point method [42] for optimization. However, this approach is not suitable in our case because of its computational complexity. As another naïve method, we could assign a large constant energy to a bezigon that is detected as exhibiting self-intersection. However, this provides almost no guidance for a bezigon that has already manifested self-intersection during optimization.

Figure 7: Measuring the extent of self-intersection. (a) and (b) address the first and second intersection points, respectively. The shorter divided portion in each phase is indicated by the red curve.

Instead, we analytically measure the extent of self-intersection and use this measure as a regularizer that automatically steers the optimization away from self-intersecting bezigons. The primary advantage of our method is that it is not only capable of preventing self-intersection but also provides effective guidance for eliminating self-intersection that has already occurred. Moreover, it does not require expensive computation.

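The procedure is illustrated in Figure 7. We first estimate all intersection points (indicated by red dots), if any. Each such point divides the bezigon outline into two parts. We consider the shorter of these parts (shown as red curves) and measure the extent of self-intersection by summing over their lengths. More formally, the measurement can be written as

$$E_S = \sum_{(t_a,\, t_b) \in \mathcal{P}} \ell(t_a, t_b).$$

Here, $\mathcal{P}$ is the set of partitions corresponding to all intersection points (red dots in Figure 7), where each pair $(t_a, t_b)$ delimits the shorter of the two parts, and $\ell(t_a, t_b)$ represents the arc length along the curve from $t_a$ to $t_b$, i.e.,

$$\ell(t_a, t_b) = \int_{t_a}^{t_b} \sqrt{x'(t)^2 + y'(t)^2}\, \mathrm{d}t,$$

where $x'(t) = \mathrm{d}x(t)/\mathrm{d}t$ and $y'(t) = \mathrm{d}y(t)/\mathrm{d}t$.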

The SPT penalizes significant self-intersection. The more severe an intersection is, the more closely the length of the shorter part approaches that of the longer part, and hence, the larger the SPT becomes. When there is no self-intersection, the SPT is equal to zero. Our experiments demonstrate that optimization using the SPT results in bezigons that contain very little self-intersection and are likely to be close to the ground-truth image in terms of topology (see Figure 6c).
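The measure above can be illustrated with a small sketch. For simplicity, it assumes that the bezigon outline has already been sampled into a closed polyline (the function names are hypothetical, and the paper works with the Bézier curves themselves rather than a polyline). For every crossing between two non-adjacent polyline edges, it computes the arc length between the two crossing positions and adds the shorter of the two parts into which that crossing splits the outline.

```python
import math

def _seg_intersection(p, q, r, s):
    """Return (u, v) with p + u*(q - p) == r + v*(s - r) if the open segments cross, else None."""
    dx1, dy1 = q[0] - p[0], q[1] - p[1]
    dx2, dy2 = s[0] - r[0], s[1] - r[1]
    denom = dx1 * dy2 - dy1 * dx2
    if abs(denom) < 1e-12:              # parallel or degenerate edges are ignored
        return None
    ex, ey = r[0] - p[0], r[1] - p[1]
    u = (ex * dy2 - ey * dx2) / denom
    v = (ex * dy1 - ey * dx1) / denom
    if 0.0 < u < 1.0 and 0.0 < v < 1.0:
        return u, v
    return None

def self_intersection_measure(points):
    """Sum, over all crossings, of the length of the shorter part of a closed polyline."""
    n = len(points)
    seg_len = [math.dist(points[i], points[(i + 1) % n]) for i in range(n)]
    start = [0.0]                        # arc length at the start of each edge
    for length in seg_len[:-1]:
        start.append(start[-1] + length)
    total = sum(seg_len)
    energy = 0.0
    for a in range(n):
        for b in range(a + 2, n):        # skip an edge and its direct successor
            if a == 0 and b == n - 1:    # these two edges are adjacent via the closing vertex
                continue
            hit = _seg_intersection(points[a], points[(a + 1) % n],
                                    points[b], points[(b + 1) % n])
            if hit is None:
                continue
            u, v = hit
            s_a = start[a] + u * seg_len[a]
            s_b = start[b] + v * seg_len[b]
            inner = s_b - s_a            # length of one part; the other is total - inner
            energy += min(inner, total - inner)
    return energy
```

For a bowtie-shaped outline with a single crossing, both loops have the same length. This is the most severe case described above: the shorter part is exactly as long as the longer one.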

Regularization for Angle Variations

Figure 8: An example of regularization for angle variations. (a) The entire ground-truth vector image and the local patch to be processed. (b) Result of optimization without the APT. (c) Result of optimization with the APT. (d) Ground truth. (e-h) are the rasterization results of (a-d), respectively.

Although a simple curve-smoothing algorithm may remove small angle variations, such a method will most likely fail to preserve other visually significant corners. Moreover, it may not always be possible to identify the saliency of the corners using a fixed threshold for angle variations. Consequently, we must develop a more sophisticated method of smoothing out insignificant corners while preserving the small number of significant corners.

For this purpose, we penalize the sum of all angle variations. As a result, the optimized bezigon will consist predominantly of zero angle variations, with only a small number of non-zero angle variations. This is important because it effectively incorporates corner detection into the bezigon optimization.

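More formally, we denote the incoming and outgoing tangent vectors at the $i$-th endpoint by $\mathbf{u}_i$ and $\mathbf{v}_i$, respectively. Then, the APT can be written as follows:

$$E_A = \sum_{i=1}^{n} \arccos \frac{\mathbf{u}_i \cdot \mathbf{v}_i}{\|\mathbf{u}_i\|\,\|\mathbf{v}_i\|}.$$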
The experimental results demonstrate that optimization with the APT can retain the smoothness of the bezigon while preserving visually significant corners (Figure 8c).
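To make the angle-variation measure concrete, the following sketch sums the unsigned turning angle at every endpoint of a closed bezigon made of cubic sections. The data layout, a list of `(P0, C1, C2, P3)` tuples whose consecutive sections share endpoints, is a hypothetical choice for illustration. It relies on the fact that the derivative of a cubic Bézier curve is proportional to `P3 - C2` at t = 1 and to `C1 - P0` at t = 0, and it computes the angle with `atan2`, which is better conditioned than `arccos` near zero.

```python
import math

def angle_variation_prior(segments):
    """Sum of unsigned angles between incoming and outgoing tangents at all endpoints."""
    total = 0.0
    for i in range(len(segments)):
        prev = segments[i - 1]           # index -1 wraps around: the bezigon is closed
        cur = segments[i]
        # incoming tangent at the shared endpoint: B'_prev(1) is proportional to P3 - C2
        ux, uy = prev[3][0] - prev[2][0], prev[3][1] - prev[2][1]
        # outgoing tangent: B'_cur(0) is proportional to C1 - P0
        vx, vy = cur[1][0] - cur[0][0], cur[1][1] - cur[0][1]
        # angle between u and v in [0, pi]
        total += math.atan2(abs(ux * vy - uy * vx), ux * vx + uy * vy)
    return total
```

Each term behaves like the absolute value of a turning angle, so, in the same spirit as an L1 penalty, the sum tends to be minimized by making most joints exactly smooth while leaving a few genuine corners.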

Avoidance of Short Bézier Handles

Figure 9: An example of the avoidance of short Bézier handles. (a) The entire ground-truth vector image and the local patch to be processed. (b) Result of optimization without the HPT. (c) Result of optimization with the HPT. (d) Ground truth. (e-h) are the rasterization results of (a-d), respectively.

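To guard against the possibility of short handles, we penalize any short handle using an inverse barrier function

$$E_H = \sum_{j} \frac{1}{\|\mathbf{h}_j\|},$$

where $\mathbf{h}_j$ denotes the $j$-th Bézier handle, i.e., the vector from an endpoint to its adjacent intermediate control point.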
This term imposes a large penalty on short handles because it tends toward infinity as any handle length tends toward zero. When every handle is sufficiently long, this energy is very small and hence does not cause serious side effects.
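One simple inverse barrier, assumed here for illustration, sums the reciprocal of every handle length, where a handle is the segment from an endpoint to its adjacent intermediate control point. The sketch uses the same hypothetical `(P0, C1, C2, P3)` section layout; the small clamp only guards against division by zero for a fully collapsed handle.

```python
import math

def handle_prior(segments, eps=1e-12):
    """Inverse-barrier penalty: sum of 1 / |handle| over both handles of every section."""
    energy = 0.0
    for p0, c1, c2, p3 in segments:
        for endpoint, control in ((p0, c1), (p3, c2)):
            length = math.dist(endpoint, control)
            energy += 1.0 / max(length, eps)   # grows without bound as the handle shrinks
    return energy
```

Halving every handle length doubles this energy, so the optimizer is pushed away from degenerate handles while long handles contribute almost nothing.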

Our experiments show that optimization using the HPT considerably reduces the occurrence of unnatural bezigons, as illustrated in Figure 9c (compare with Figure 9b). Although the resultant handle may sometimes be slightly longer than it should be (compare the locations of the control points in Figure 9c with those in Figure 9d), such a result generally has little effect on the quality or editability of the vector results.

Curve-Length Prior

Figure 10: An example of the application of the curve-length prior. (a) The entire ground-truth vector image and the local patch to be processed. (b) Result of optimization without the LPT. (c) Result of optimization with the LPT. (d) Ground truth. (e-h) are the rasterization results of (a-d), respectively.

Based on the experience of experts who are proficient in image tracing, a traced curve tends to be stretched as much as possible unless there is strong evidence that the curve should shrink or twist. This prior knowledge can be used to eliminate invalid bezigons of this type because the occurrence of a highly twisted curve without strong evidence in support of such twisting from the raster input can be easily identified.

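Accordingly, we penalize the curve length to avoid invalid bezigons of this type. Thus, the LPT can be defined as follows:

$$E_L = \sum_{i=1}^{n} \int_0^1 \sqrt{x_i'(t)^2 + y_i'(t)^2}\, \mathrm{d}t,$$

where $x_i(t)$ and $y_i(t)$ are the coordinate functions of the $i$-th of the $n$ Bézier curve sections.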
Figure 10c demonstrates that the unexpectedly twisted curve of Figure 10b, for which there is insufficient evidence in the observed data, can be avoided through the adoption of the LPT.
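The arc length of a cubic section has no closed form in general, so a sketch of the LPT can integrate the speed |B'(t)| numerically. The version below uses Gauss-Legendre quadrature from NumPy (`numpy.polynomial.legendre.leggauss`) on the same hypothetical `(P0, C1, C2, P3)` layout; the quadrature order is an arbitrary choice, not a value taken from the paper.

```python
import numpy as np

def cubic_length(p0, c1, c2, p3, order=16):
    """Arc length of one cubic Bezier section via Gauss-Legendre quadrature of |B'(t)|."""
    p0, c1, c2, p3 = (np.asarray(p, dtype=float) for p in (p0, c1, c2, p3))
    x, w = np.polynomial.legendre.leggauss(order)
    t = 0.5 * (x + 1.0)                 # map nodes from [-1, 1] to [0, 1]
    w = 0.5 * w                         # rescale the weights accordingly
    s = 1.0 - t
    d = (3.0 * (s ** 2)[:, None] * (c1 - p0)
         + 6.0 * (s * t)[:, None] * (c2 - c1)
         + 3.0 * (t ** 2)[:, None] * (p3 - c2))   # B'(t) at every node, shape (order, 2)
    return float(np.sum(w * np.linalg.norm(d, axis=1)))

def curve_length_prior(segments, order=16):
    """Total length of a bezigon: the sum of the lengths of its cubic sections."""
    return sum(cubic_length(*seg, order=order) for seg in segments)
```

For a straight section whose control points advance monotonically, |B'(t)| is a quadratic polynomial in t, so the quadrature returns the chord length up to rounding.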

4.3 Piecewise Bezigon Optimization

Once the energy function developed in the previous sections has been obtained, it can in many cases be minimized using a general nonlinear optimization method. However, because a typical bezigon has a large number of parameters to be optimized and the valid range of each parameter is large, the efficiency and even the convergence of the optimization process can become an issue. Fortunately, bezigon parameters possess a strong local control property, which we exploit to reduce the number of redundant calculations.

In this section, we explore the nature of bezigon parameters and propose a piecewise optimization strategy that allows our high-dimensional problem to be decomposed into several subcomponents that may be individually solved.

The fundamental concept of piecewise optimization is to optimize only a subset of the geometric parameters of each bezigon at any given time. This task is feasible because the effect of varying any given control point is limited to a local region of the bezigon.

More specifically, we regard two consecutive Bézier curve sections as one curve piece. Therefore, a bezigon with $n$ curve sections also consists of $n$ overlapping curve pieces, all of which are optimized successively. When optimizing a curve piece, we fix its first and last endpoints and determine the optimal solution for the four intermediate control points and the middle endpoint. We first optimize the five active control points (the red points in Figure 11a) of one curve piece and subsequently optimize the corresponding points (the red points in Figure 11b) of the next curve piece. Note that two consecutive pieces overlap and share two intermediate control points (e.g., those shown in red in both Figure 11a and Figure 11b). Therefore, every intermediate control point is optimized twice in each iteration. The process iteratively progresses from the first curve piece to the last.

Figure 11: Overlapped piecewise optimization. The curve sections being optimized are shown in red. (a) Optimizing the $i$-th curve piece. (b) Optimizing the $(i+1)$-th curve piece.
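The overlap pattern can be checked with a few lines of index bookkeeping. Assume, hypothetically, that the control points of a closed bezigon with n sections are stored in one flat list of length 3n, with endpoint P_k at index 3k and its two following handles at 3k + 1 and 3k + 2. The active points of piece i are then the five indices strictly between its fixed endpoints at 3i and 3(i + 2):

```python
def piece_active_indices(i, n):
    """Flat indices of the five control points optimized for curve piece i (0-based).

    Hypothetical layout: endpoint P_k at 3k, its following handles at 3k + 1 and 3k + 2,
    for k = 0..n-1 on a closed bezigon; piece i spans sections i and i + 1.
    """
    m = 3 * n
    base = 3 * i                      # fixed first endpoint P_i; P_{i+2} at base + 6 is also fixed
    return [(base + offset) % m for offset in (1, 2, 3, 4, 5)]
```

Counting how often each index appears over all n pieces confirms the statement above: every middle endpoint is active in exactly one piece, and every intermediate control point in exactly two.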

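Formally, we represent the geometric parameters of the $i$-th piece to be optimized as follows:

$$\mathbf{B}_i = \left(\mathbf{c}_{i,1},\ \mathbf{c}_{i,2},\ \mathbf{p}_{i+1},\ \mathbf{c}_{i+1,1},\ \mathbf{c}_{i+1,2}\right),$$

where $\mathbf{p}_{i+1}$ is the middle endpoint of the piece and $\mathbf{c}_{k,1}$ and $\mathbf{c}_{k,2}$ are the two intermediate control points of the $k$-th curve section. All remaining geometric parameters of the bezigon are held fixed during the present optimization. Therefore, optimizing this curve piece amounts to identifying the optimal configuration that minimizes a function composed of the local energies, i.e.,

$$\mathbf{B}_i^{*} = \operatorname*{arg\,min}_{\mathbf{B}_i \in \Omega_i} E_i(\mathbf{B}_i),$$

where $E_i$ is the objective of Equation 6 evaluated with all parameters outside $\mathbf{B}_i$ held fixed. Here, $\Omega_i$ is the space of all possible geometric parameters for the $i$-th piece.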

Such a strategy substantially increases the efficiency of the entire optimization. Although the objective function in Equation 18 is quite similar to that in Equation 6, the solution space of each piece is much smaller than that of the entire bezigon. Therefore, the original high-dimensional problem can be decomposed into a set of lower-dimensional problems, which greatly improves the efficiency of the overall optimization process. Moreover, all prior energy terms except the SPT are simply sums of per-section energies. Therefore, only the two related sections need to be considered when calculating these terms, which eliminates a large number of redundant computations.

Piecewise optimization is not only fast but also provides satisfactory bezigons with almost no decrease in accuracy. The experimental results indicate that after all curve pieces are traversed two or three times, the resultant bezigon is nearly perfect in most cases.

As an optional step, we can jointly optimize all geometric parameters once more to further improve the result. Because our piecewise procedure can provide substantially more accurate input for further optimization, this subsequent global optimization can be significantly more efficient than it would be without piecewise optimization. The entire process of bezigon optimization is summarized in Algorithm 1.

1:  repeat
2:     for $i = 1$ to $n$ do
3:        Optimize $\mathbf{B}_i$ according to Equation 18
4:        Update $\mathbf{B}$ according to $\mathbf{B}_i^{*}$
5:     end for
6:  until converged
7:  Optimize $\mathbf{B}$ according to Equation 6 (optional)
Algorithm 1 Bezigon Optimization
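Algorithm 1 is a form of overlapped block-coordinate descent. The sketch below shows its control flow on a flat parameter vector, assuming a generic smooth energy and a list of overlapping index blocks (for instance, the control-point indices of each curve piece, expanded to x and y coordinates). The inner solver is a crude fixed-step gradient descent with central finite differences, chosen only to keep the example dependency-free; the paper mentions solvers such as NEWUOA or L-BFGS instead, and a fixed number of sweeps stands in for the convergence test.

```python
import numpy as np

def piecewise_optimize(energy, x0, blocks, sweeps=3, local_steps=50, lr=0.1, h=1e-6):
    """Overlapped block-coordinate descent mirroring the loop of Algorithm 1 (sketch).

    energy: callable mapping a flat parameter vector to a float (assumed smooth).
    blocks: list of index lists; consecutive blocks may overlap.
    """
    x = np.array(x0, dtype=float)
    for _ in range(sweeps):                      # stands in for "repeat ... until converged"
        for block in blocks:                     # one curve piece at a time
            idx = np.asarray(block, dtype=int)
            for _ in range(local_steps):         # crude local solver on the active block only
                grad = np.zeros(len(idx))
                for j, k in enumerate(idx):      # central finite differences
                    step = np.zeros_like(x)
                    step[k] = h
                    grad[j] = (energy(x + step) - energy(x - step)) / (2.0 * h)
                x[idx] -= lr * grad              # parameters outside the block stay fixed
    return x
```

Swapping the inner loop for a quasi-Newton solve restricted to `x[idx]` would be closer to the setup described in Section 5.1; the outer structure (sweep over overlapping pieces, update the shared vector in place) stays the same.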

The overlapped piecewise optimization strategy provides a fast yet accurate method of bezigon optimization, which is an important prerequisite for the practical application of our vectorization approach.

5 Experiments

To demonstrate the effectiveness of the approach developed in this paper, in this section, we quantitatively and qualitatively compare our method with other vectorization methods. As stated in Section 2, many vectorization algorithms and software packages exist. However, most academic work on vectorization is not suitable for comparison because it primarily focuses on photographs or other types of vectorization. We therefore restrict our comparisons to two approaches that are specialized for the vectorization of clipart images. The first is Vector Magic [5], which was developed on the basis of the state-of-the-art method proposed in [1]³. The second is Adobe Illustrator [27], a representative example of widely used commercial vectorization software. The experimental results demonstrate that our approach is superior to these methods in terms of bezigon quality.

5.1 Implementation

To evaluate the effectiveness of the proposed bezigon optimization method, we have implemented a prototype image vectorization system.

The core of our system is bezigon optimization. Because our energy function is continuous and differentiable, either a curve piece or a global bezigon can be effectively optimized using many available optimization algorithms (such as NEWUOA [43], L-BFGS [44], and the conjugate gradient method [45]). Note that there are four tuning parameters in our objective function, namely, the weights of the four prior terms. We set these weights empirically, with a large weight on the SPT to strongly penalize self-intersection. Although other weight settings might yield better performance, we did not perform a thorough search for the optimal weights. According to our experimental results, the proposed method is generally insensitive to these parameters, and our preset weights should yield satisfactory results in most cases. However, as the quality of the raster input decreases (e.g., for low-resolution input), fine tuning may become necessary to generate perfect bezigons. Although our framework does not intrinsically rely on any assumption regarding the color model, for simplicity of implementation and for fair comparison with the most commonly used clipart image vectorization methods, our prototype system currently assumes that the color within each bezigon is uniform⁴, i.e., the color function is constant over the bezigon and the color parameter is an arbitrary vector in the RGB color space.

The initial bezigons can be either extracted from the input image or obtained using other vectorization methods. They are not required to be highly accurate. Most initial bezigons in our experiments are far from perfect. Of course, if the initial pose drifts too far from the optimal pose (e.g., approaches random bezigons), our bezigon optimization may become trapped in a local optimum and output imperfect results. However, in practice, this rarely occurs because it is not very difficult to estimate a bezigon that is sufficient to serve as an initial solution. The real difficulties lie in the subsequent optimization, i.e., achieving bezigons of even higher precision, which is the key issue addressed in this paper.

Note that because this prototype system was primarily developed as a proof of concept, speed is not a priority at the moment. Our implementation is currently written in Python, a dynamically typed, interpreted language, and is run on a laptop with an Intel Core i5-2410M @ 2.53 GHz processor and 4 GB of memory. The total execution time varies from 10 s to 10 min, depending on the complexity of the shapes to be vectorized. It is expected to be much faster when implemented in a statically typed, compiled language. Moreover, our method can be highly parallelized owing to the nature of wavelet rasterization, which may also considerably improve efficiency.

5.2 Quantitative Comparisons

We use a fidelity metric to quantitatively compare our results with those of other methods. The fidelity metric generally provides a good indication of the characteristics that define a good vectorization algorithm. To further evaluate the proposed method based on human aesthetic judgment, we also present a user study.

For both comparisons, we collected a set of clipart images available in both raster and vector formats. All the raster images served as inputs to our algorithm and to the other vectorization methods. Some methods considered in the comparison require parameter tuning. To perform a fair comparison, the dominant parameters of these methods were adjusted until the number of output bezigons was approximately equal to the number of bezigons in the ground-truth vector image. Then, we compared the vector images produced by the different methods with respect to fidelity and user satisfaction. The details of both comparisons are presented below.

Comparison via peak signal-to-noise ratio measurement. The quality of a vectorization is often evaluated in terms of the PSNR (peak signal-to-noise ratio) or RMSE (root-mean-square error) [1, 4]. Before evaluation, both the resultant vector image and the ground-truth image were rasterized at a specific resolution.

Figure 12 presents histograms of the increases in PSNR achieved by our method with respect to Vector Magic (Figure 12a) and Adobe Illustrator (Figure 12b). Our method consistently yields a higher PSNR than the competing methods. More specifically, our results show a PSNR increase of 0-5 dB with respect to Vector Magic and of 10-20 dB with respect to Adobe Illustrator.
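For reference, PSNR is derived from the mean squared error between two images rasterized at the same resolution. A minimal sketch follows; the 8-bit peak value of 255 is an assumption, since the exact rasterization settings are not stated here.

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized raster images."""
    reference = np.asarray(reference, dtype=float)
    test = np.asarray(test, dtype=float)
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")             # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))
```

Because PSNR = 10 log10(peak² / MSE), an increase of 10 dB corresponds to a tenfold reduction of the mean squared error, and an increase of 20 dB to a hundredfold reduction.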

Figure 12: Histograms of the increases in the PSNR achieved by our method compared with (a) Vector Magic and (b) Adobe Illustrator.

Comparison via a user study. We also performed a user study to obtain a further evaluation based on human aesthetic judgment. For this purpose, we created a pairwise comparison test with 120 pairs of vector results. Each pair consisted of a vector image generated by our method and a vector image generated by another method (either Vector Magic or Adobe Illustrator). We constructed a web interface that showed each pair of vector images, including their control points, without the name of the vectorizer that produced them. Several participants with graphic design backgrounds were then asked to judge, with reference to the ground truth, whether one image was much better than, better than, almost the same as, a little worse than, or much worse than the other. The statistical results are presented in Figure 13. Our results were considered superior to those of the current state-of-the-art method (Vector Magic) in nearly 80% of the pairwise comparisons, and approximately one quarter of them were deemed much better (Figure 13a). Compared with the representative commercial software (Adobe Illustrator), almost all of our results were considered better, and half of them were judged to be much better (Figure 13b).

Figure 13: Results from a user study that compared our results with those of (a) Vector Magic and (b) Adobe Illustrator.

To summarize the quantitative comparisons, our approach is found to be superior to both the state-of-the-art algorithm and the representative commercial tool in terms of both fidelity and user satisfaction.

5.3 Qualitative Comparisons

For qualitative comparison of our method with the other methods, we provide a few results (Figures 14-16) obtained using our approach and the competing methods. Because of space limitations, we highlight only one local patch for each image (shown in the even rows of Figures 14-16). From the comparison, we observe that our results, in general, are more faithful to the raster input and that the shapes of the resultant bezigons are more reasonable and visually pleasing. More specifically, our strengths lie in the following cases.

Case 1: smooth boundary with high curvature. In a typical clipart image, smooth boundaries with high curvature are often found in the round corners of a shape (Figures 14j and 14t). Traditional methods typically use a chain of densely sampled points to represent such structures and subsequently fit Bézier curves to the chain. The problem with this approach is that reconstructing curves from such an intermediate representation can be highly ambiguous in these regions. Therefore, the resultant bezigons exhibit false corners (see the redundant corners in Figures 14g, 14h, and 14r). Instead, our method directly infers the bezigons from the raster input to reduce such ambiguities as much as possible. Consequently, our bezigons contain fewer false sharp corners (Figures 14i and 14s).

Figure 14: Comparisons in cases of smooth boundaries with high curvature. From left to right: raster input, Adobe’s result, Vector Magic’s result, our result, ground truth.

Case 2: obtuse corners. Many shapes to be vectorized contain obtuse corners (Figures 15j and 15t). Preserving such corners is important even when a smoothed result looks visually satisfactory, because a shape whose corners have been smoothed out is difficult to edit afterward. However, for the same reason discussed in Case 1, traditional methods tend to smooth out such corners or to place a curve endpoint at an incorrect location (e.g., the overly smoothed boundaries in Figures 15h, 15q and 15r). Our method benefits from the direct optimization of the bezigons and avoids the errors introduced by fitting sampled points that cannot accurately indicate the correct location of an obtuse corner. Therefore, our results typically preserve more obtuse corners (Figures 15i and 15s).

Figure 15: Comparisons in cases of obtuse corners. From left to right: raster input, Adobe’s result, Vector Magic’s result, our result, ground truth.

Case 3: slightly bent edges. Vectorizing various detailed structures, such as slightly bent edges (Figures 16j and 16t), is also difficult for traditional methods. Because the error associated with the generation of an intermediate representation is unavoidable, small perturbations of the point chain are often considered to be noise rather than signal. Therefore, certain slightly bent edges in the resultant bezigons are straightened (Figures 16g, 16h and 16r). In our framework, we trust only the original raster input and any prior knowledge regarding the curves. Although not every type of structure can be preserved (e.g., large perturbations or zigzag-like structures may be suppressed based on a priori knowledge of typical vector images), small shapes and slightly bent edges are more likely to be preserved in our results (Figures 16i and 16s).

Figure 16: Comparisons in cases of slightly bent edges. From left to right: raster input, Adobe’s result, Vector Magic’s result, our result, ground truth.

From the analysis presented above, we can conclude that our direct bezigon optimization for image vectorization produces more convincing vector results in most cases.

6 Conclusion and Future Work

We have presented a novel framework for clipart image vectorization. In contrast to other methods, the proposed approach optimizes bezigons by directly observing the raster input and incorporating bezigon-based priors, thereby minimizing the errors introduced by intermediate procedures. Both quantitative and qualitative comparisons demonstrate that the bezigons generated by our approach are typically of higher quality than those generated by the current state-of-the-art method and by commonly used commercial software.

Of course, certain types of clipart images (e.g., noisy images or low-resolution images that contain complex structures) are too ambiguous to be precisely vectorized by any automated approach, including ours (Figure 17 shows such cases). Perhaps the best way to handle these images is to incorporate a small amount of user intervention. For this purpose, our system provides a user-friendly graphical interface for refinement during bezigon optimization. The evolving bezigons are displayed in the interface, and the user can move any control point by dragging it with the mouse. Our system then takes the modified bezigon as a new initial bezigon and performs the subsequent optimization.

Figure 17: Cases producing poor results. (a) A noisy input image. (b) Our vectorized output for (a). (c) A low-resolution input image with complex content. (d) Our vectorized output for (c).

In future research, we will resolve additional ambiguities by incorporating more prior knowledge about vector images into bezigon optimization. Because we directly optimize the bezigons, such prior information can be incorporated into our framework in a straightforward manner.

We also plan to develop a commercial software package based on the proposed method. To make the software as efficient as possible, we will optimize the code of the current implementation and parallelize the proposed approach. Notably, many components of our framework, ranging from the wavelet rasterization to the optimization of mutually independent local structures, can be highly parallelized.


This research project has been underway for nearly three years. We would like to thank all the participants, especially two professional graphic designers, Beck Yang and Fanco Ke, for their helpful suggestions and valuable comments.



In this appendix, we give the complete definition of the rasterization function of a bezigon in terms of its parameter set, and we prove that this function is continuous and differentiable with respect to the geometric parameters.

7.1 Basic Definitions

Before describing the rasterization function, we introduce some basic definitions that will be needed throughout this section.

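Following [35], the rasterization function uses a hierarchical Haar wavelet representation to analytically calculate an anti-aliased raster image of a bezigon. As is well known, Haar wavelets are generated by the mother wavelet function

$$\psi(x) = \begin{cases} 1, & 0 \le x < \tfrac{1}{2}, \\ -1, & \tfrac{1}{2} \le x < 1, \\ 0, & \text{otherwise}, \end{cases}$$

and the scaling function

$$\phi(x) = \begin{cases} 1, & 0 \le x < 1, \\ 0, & \text{otherwise}. \end{cases}$$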
With these two functions, the 1D Haar basis with a scaling parameter and a translation parameter can be formally defined as


Now let , the 2D Haar basis defined as follows will be used later:


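A direct transcription of the Haar scaling and mother wavelet functions is given below. The dilated and translated 1D basis uses the standard L2-normalized convention 2^(j/2) ψ(2^j x − k), which is an assumption here: the exact normalization used in this appendix is not recoverable from the text. The check computes inner products with a midpoint rule, which is exact up to rounding for these piecewise-constant functions on a dyadic grid.

```python
def haar_scaling(x):
    """Scaling function phi: 1 on [0, 1), 0 elsewhere."""
    return 1.0 if 0.0 <= x < 1.0 else 0.0

def haar_mother(x):
    """Mother wavelet psi: 1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1.0:
        return -1.0
    return 0.0

def haar_basis(j, k, x):
    """psi_{j,k}(x) = 2^(j/2) * psi(2^j x - k); the L2 normalization is assumed."""
    return 2.0 ** (j / 2.0) * haar_mother(2.0 ** j * x - k)

def inner_product(f, g, n=4096):
    """Midpoint-rule inner product of f and g on [0, 1]."""
    return sum(f((i + 0.5) / n) * g((i + 0.5) / n) for i in range(n)) / n
```

With this normalization, for example, `inner_product` of ψ_{1,0} with itself is 1, while ψ_{1,0} and ψ_{1,1} have disjoint supports and are orthogonal.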
7.2 Rasterization Function And Its Continuity

According to [35], the value of a pixel in the raster image of a given 2D bezigon, indicated by the parameters , takes the form


Here, correspond to the wavelet coefficients contributed by the -th Bézier curve segment:


The notations and are the same as in Equations 2 and 3 in Section 3. Note that, given the bezigon parameters , both and are functions of the single variable , while both and are first-order derivatives with respect to . For all and ,


It is obvious that both and are continuous with respect to the variable . Also, if is a continuous function of any parameters of , so are both and . From Equation 2, it is easy to see that both and are continuous with respect to any parameters of and . Therefore, are also continuous with respect to . Thus, the continuity of with respect to the geometric parameters follows directly from the above discussion and Equation 27. This property is also reflected in Figure 2, where the data energy function using is continuous with respect to an arbitrary geometric parameter.

7.3 Derivatives of with respect to geometrical parameters

We will show that is differentiable with respect to the geometric parameters B, which verifies Theorem 2 in Section 4. Because the Haar functions are discontinuous, the conclusion of Theorem 2 is not obvious. To achieve this goal, we use the theory of generalized functions and generalized derivatives [7]. All of the following derivations are in the sense of generalized functions and generalized derivatives.

We first express formally such derivatives as




for all , , and .

The remaining problem is then to establish the differentiability of the Haar basis coefficients with respect to the geometric parameters, i.e., the existence of and for all , , , and .

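Generalized Derivatives of Haar Basis Functions. It is well known that the generalized derivative of $\phi$ is

$$\phi'(x) = \delta(x) - \delta(x - 1).$$

Here, $\delta$ is the impulse function satisfying

$$\int_{-\infty}^{+\infty} \delta(x)\, f(x)\, \mathrm{d}x = f(0),$$

where $f$ is an arbitrary continuous function. Note that when composed with a continuously differentiable function $g$ whose real roots are all simple, $\delta$ has the following property [7]:

$$\delta\big(g(x)\big) = \sum_{x_0 \in X} \frac{\delta(x - x_0)}{|g'(x_0)|}.$$

Here, $X$ is the set of the real roots of $g$. Similarly,

$$\psi'(x) = \delta(x) - 2\,\delta\!\left(x - \tfrac{1}{2}\right) + \delta(x - 1).$$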
Therefore, for all , ,


Similarly, for all , ,


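The generalized derivatives of the Haar functions and the composition rule for the impulse function can be sanity-checked numerically. The sketch below uses the defining identity ⟨h', f⟩ = −⟨h, f'⟩ for a smooth test function f, so pairing φ' with f should give f(0) − f(1) and pairing ψ' with f should give f(0) − 2f(1/2) + f(1). For the composition rule, it replaces δ by a narrow Gaussian (an approximation introduced here, not part of the derivation) and uses g(x) = x² − 1/4, whose simple roots ±1/2 both have |g'| = 1.

```python
import math

def integrate(fn, a, b, n=200_000):
    """Composite midpoint rule for the integral of fn over [a, b]."""
    h = (b - a) / n
    return h * sum(fn(a + (i + 0.5) * h) for i in range(n))

def delta_eps(x, eps=1e-3):
    """Narrow Gaussian used as a smooth stand-in for the impulse function."""
    return math.exp(-0.5 * (x / eps) ** 2) / (eps * math.sqrt(2.0 * math.pi))

def pair_phi_prime(df):
    """<phi', f> evaluated as -<phi, f'> = -(integral of f' over [0, 1])."""
    return -integrate(df, 0.0, 1.0)

def pair_psi_prime(df):
    """<psi', f> evaluated as -<psi, f'> (psi is +1 on [0, 1/2) and -1 on [1/2, 1))."""
    return -(integrate(df, 0.0, 0.5) - integrate(df, 0.5, 1.0))

def pair_delta_of_g(f, g, a, b, eps=1e-3):
    """Approximate integral of delta(g(x)) * f(x) over [a, b]."""
    return integrate(lambda x: delta_eps(g(x), eps) * f(x), a, b)
```

With f(x) = cos x + x², the three pairings match f(0) − f(1), f(0) − 2f(1/2) + f(1), and f(1/2) + f(−1/2), respectively, up to quadrature and smoothing error.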
Derivatives of Haar Basis Coefficients with Respect to Geometrical Parameters. We first calculate . According to the theory of generalized functions [7], for all , , , and ,


Since does not depend on the parameter according to Equation 2, we have


Thus, the derivative of with respect to any value of exists. Moreover, it can be calculated analytically by substituting Equation 2 and Equation 22 into Equation 40.

Now we turn to . Similarly to Equations 39 and 40, we have