Effective Clipart Image Vectorization Through Direct Optimization of Bezigons
Abstract
Bezigons, i.e., closed paths composed of Bézier curves, have been widely employed to describe shapes in image vectorization results. However, most existing vectorization techniques infer the bezigons by simply approximating an intermediate vector representation (such as polygons). Consequently, the resultant bezigons are sometimes imperfect due to accumulated errors, fitting ambiguities, and a lack of curve priors, especially for low-resolution images. In this paper, we describe a novel method for vectorizing clipart images. In contrast to previous methods, we directly optimize the bezigons rather than using other intermediate representations; therefore, the resultant bezigons are not only of higher fidelity to the original raster image but also more reasonable, as if they were traced by a proficient expert. To enable such optimization, we have overcome several challenges and have devised a differentiable data energy as well as several curve-based prior terms. To improve the efficiency of the optimization, we also take advantage of the local control property of bezigons and adopt an overlapped piecewise optimization strategy. The experimental results show that our method outperforms both the current state-of-the-art method and commonly used commercial software in terms of bezigon quality.
Clipart vectorization, clipart tracing, bezigon optimization
1 Introduction
Image vectorization, also known as image tracing, is the process of converting a bitmap image into a vector image. There are various types of vectorization. In the present work, we focus on clipart image vectorization. In such a case, the input raster is a clipart image, which is generally composed exclusively of digital illustrations like cartoons, logos, and symbols. Notably, images of this kind do not include photographs or scans of real handmade drawings.
There is a huge demand for such a conversion technique. According to a survey from [1], more than 7 million man hours are spent on vectorizing images in the United States every year, and approximately 60% of the more than 10 million images to be vectorized are clipart images such as logos and other rasterized vector art. As further evidence of the large demand for clipart image vectorization, there is also a large market for online services that specialize in tracing clipart images. The conversion can be manually performed, but this may require a substantial amount of time and effort, particularly for those users who are not proficient in tracing images. This situation provides strong motivation for the development of an automated algorithm for precise vectorization.
Notably, most modern methods that are appropriate for vectorizing clipart images [2, 3, 1, 4] use bezigons to represent the resultant vector contours, which has become the standard because of the compactness and editability of bezigons.
However, almost no existing methods are specialized for directly obtaining bezigons. Such methods typically direct most of their effort toward the generation of intermediate polygons (Figure 1b) and consequently estimate bezigons (Figure 1c) that reproduce these polygons rather than the original image [2, 1, 4]. Among these methods, [1] (also known as Vector Magic [5]) generally produces the most accurate bezigon boundaries.
To solve the problems summarized above while retaining the advantages of the state-of-the-art method [1], an intuitive approach is to devise an effective optimization mechanism that is specific to bezigons.
However, establishing such a framework is nontrivial. In general, a direct optimization of bezigons would necessitate an appropriate rasterization function specialized for bezigons because such a function defines the bezigons’ fidelity to the raster image and serves as a fundamental basis for the entire bezigon-specific optimization mechanism. However, most available rasterization functions are not suited to this purpose because commonly used bezigon rasterization methods are typically based on sampling subpixel locations of the pixel grid.
Even if this first challenge is overcome, the solution space might remain large and contain many unreasonable bezigons that give rise to nearly the same raster image (Figure 3 illustrates examples of such illegal cases). We observe that reasonable bezigons, when serving as vector primitives, occupy only a small fraction of the parameter space of general bezigons. There should be specific prior knowledge available regarding the bezigons in typical vector images, and it is essential to incorporate such prior knowledge to resolve the ambiguities and further constrain the solution space. Unfortunately, little academic attention has been directly focused on such prior knowledge; the available curve priors suggested in the literature either cannot be directly applied for bezigons [1] or are not specialized for vectorization [6]. Therefore, studying the characteristics of both reasonable and unreasonable bezigons for image vectorization, and incorporating closely related prior knowledge into our bezigon optimization, is another challenge to be addressed.
In this paper, we present solutions to the above challenges and propose a bezigon-specific optimization framework for more precise clipart vectorization. Our main contributions are as follows:

By analyzing several rasterization approaches, we identify an appropriate rasterization function, theoretically prove certain analytic properties thereof that facilitate effective optimization for our purposes (using the theory of generalized functions [7]), and experimentally validate its compatibility and robustness for vectorizing various types of clipart images. Thus, we establish a framework for clipart vectorization via the direct optimization of bezigons. Meanwhile, we provide some approximate criteria for determining whether a rasterization function is suitable for optimization-based image vectorization.

Based on an intensive study of reasonable bezigons in typical vector images as well as unreasonable bezigons arising from experiments, we classify the common illegal cases of bezigon primitives into four categories: self-intersections, false corners with small angle variations, short handles, and twisted sections (Figure 3). To address these illegal cases, we suggest a self-intersection prior term, an angle-variation prior term, a Bézier-handle prior term, and a curve-length prior term. All these terms are incorporated into our framework to further constrain the solution space and to provide broadly reasonable guidance for bezigon optimization. Moreover, errors in the curve boundaries, if any remain, become visually insignificant because the resultant bezigons are more reasonable and aesthetically pleasing in general.

By taking full advantage of the local control property of Bézier curves, we propose a piecewise optimization strategy to effectively solve the problem of bezigon optimization. This strategy considerably reduces the computational cost and makes our vectorization method more practical.

Based on the above techniques, we suggest a new bezigon optimization framework. In such a framework, we can effectively vectorize a clipart image or refine vector results obtained using other approaches. Notably, such a framework is generally capable of incorporating any bezigon rasterization model and additional prior knowledge for the purpose of image vectorization or other applications, such as curve stylization.
The experimental results show that our method outperforms both the current state-of-the-art method and commonly used commercial software in terms of bezigon quality, especially in tough vectorization cases such as smooth boundaries with high curvatures, obtuse corners, and slightly bent edges.
The remainder of this paper is organized as follows: Section 2 briefly reviews existing clipart vectorization approaches. Section 3 formulates the problem of clipart vectorization in terms of bezigon optimization. An overview of the proposed vectorization framework and our points of focus is also provided in this section. Section 4 fully explains our approach to the direct optimization of bezigons for image vectorization. The experimental results and comparisons are presented in Section 5, and the paper concludes with a discussion of further perspectives on this work in Section 6.
2 Related Work
Various other types of image vectorization methods exist that are specific to line drawings [8, 9, 10, 11, 12, 13, 14], natural images [15, 16, 17, 18, 19, 20, 21, 22, 23], and pixel art [24]. However, these methods rarely capture the intrinsic nature of clipart images and are likely to fail in generating precise curve boundaries; thus, they are not well suited for the task considered here.
In the last decade, several methods [25, 26, 1, 4, 2] have been proposed for clipart image vectorization. These methods typically involve segmenting the input image into a set of regions and inferring the color and the boundary location for each region.
To overcome the poor quality of the segmentation that results from general image vectorization, [25] exploited a visual feature of certain types of cartoons, i.e., shapes that are typically bounded by bold dark contours, and succeeded in producing a more precise segmentation technique for clipart images. However, this approach could only address regions enclosed by such strokes, which is not always the case in modern clipart images.
To further improve the segmentation and more semantically infer the shape color, [4] proposed a novel trapped-ball segmentation method that can segment a clipart image more semantically even when some regions are nonuniformly colored. Moreover, this approach considers temporal coherence and is capable of vectorizing cartoon animations. Such progress is impressive, but segmentation, color estimation, and the vectorization of animations are not our topics of focus.
Perhaps the most difficult aspect of image vectorization still lies in the inference of boundary locations. As previously mentioned, [1] is the state-of-the-art vectorization algorithm with respect to its precision of boundary location, especially for the vectorization of uniformly colored shapes. However, the contour optimization of this method, which plays the most important role in the algorithm, is specialized for polygons rather than bezigons and hence occasionally results in inaccurate bezigons. It seems that extending this method’s approach to curve fitting by somehow managing to fully use the information provided by the raster input might solve the problem. However, this is a nontrivial task for the reasons mentioned in Section 1. Moreover, this process would result in a bezigon optimization problem similar to ours.
In addition to the academic literature, there are also a number of related commercial tools, such as Adobe Illustrator [27], Corel CorelDRAW [28] and Vector Magic [5] (a product based on the technology of [1]), as well as open-source projects such as Potrace [2] and AutoTrace [3]. Of these tools, Adobe Illustrator is the most representative, and Vector Magic achieves the best results in terms of bezigon boundary precision. In this paper, we compare our algorithm with these two software packages. Although the technical details of most commercial tools are unavailable, the experimental results indicate that these tools exhibit a problem similar to (or even worse than) that of [1, 5].
In summary, insufficient precision in identifying bezigon boundaries is the most common shortcoming of existing vectorization methods. Therefore, improving the precision of bezigon boundaries, which is important for vectorizing clipart images, is the primary goal of this paper.
3 Problem Formulation and Overview of Our Framework
To facilitate a better understanding of this paper, in this section, we formulate the related problem along with the relevant notation and then provide an overview of the proposed vectorization framework and our topics of interest.
For the sake of simplicity, we consider only a single bezigon. Our work can easily be extended to situations that involve two or more bezigons because each bezigon can be independently vectorized.
3.1 Problem Formulation
Given a raster image, the primary task of clipart image vectorization is to infer a bezigon from the raster input. In a typical vector image, a bezigon can be completely determined by its geometric parameters and its color parameters.
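Geometric parameters. As previously mentioned, a bezigon is simply a series of Bézier curves joined end to end, i.e.,
(1) $\mathcal{B} = \{C_1, C_2, \dots, C_N\}.$
Here, $N$ denotes the number of curves in the bezigon, and $C_i$ represents the $i$th curve, which is assumed without loss of generality to be a 2D cubic Bézier curve with the following parametric form:
(2) $C_i(t) = (1-t)^3 P_{i,0} + 3(1-t)^2 t \, P_{i,1} + 3(1-t) t^2 \, P_{i,2} + t^3 P_{i,3}, \quad t \in [0, 1],$
where the $P_{i,j}$ ($j = 0, \dots, 3$) constitute the four control points of the $i$th Bézier curve. The last control point of one curve coincides with the starting point of the next curve, i.e., $P_{i,3} = P_{i+1,0}$ (with $P_{N,3} = P_{1,0}$, because the path is closed). Thus, all geometric parameters of a bezigon can be represented as
(3) $\mathbf{P} = \{P_{i,j} \mid i = 1, \dots, N; \ j = 0, 1, 2\}.$
Color parameters. Without loss of generality, we consider that the color of the bezigon at pixel $p = (x, y)$ is represented by the function $f(p)$. If the region color is assumed to be uniform, then $f(p) = c_0$, and the color parameter is $\mathbf{c} = \{c_0\}$. If a quadratic color model is assumed, then $f(p) = c_0 + c_1 x + c_2 y + c_3 x^2 + c_4 x y + c_5 y^2$; thus, the color parameters are $\mathbf{c} = \{c_0, c_1, \dots, c_5\}$.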
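The parameterization above can be illustrated with a short Python sketch that evaluates a cubic Bézier curve in Bernstein form and assembles a closed bezigon from a flat list of control points. The flat storage order (three stored points per curve, with the fourth shared with the next curve) and the function names are assumptions made for this example only.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cubic_bezier(ctrl: Sequence[Point], t: float) -> Point:
    """Evaluate a 2D cubic Bezier curve at t in [0, 1] (Bernstein form)."""
    p0, p1, p2, p3 = ctrl
    s = 1.0 - t
    b0, b1, b2, b3 = s ** 3, 3 * s * s * t, 3 * s * t * t, t ** 3
    return (b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0],
            b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1])


def bezigon_curves(points: Sequence[Point]) -> List[List[Point]]:
    """Split a flat list of 3N control points into N end-to-end curves.

    Assumed (hypothetical) layout: three stored points per curve; the last
    control point of each curve is the first stored point of the next one,
    and the final curve wraps around to close the path.
    """
    if not points or len(points) % 3 != 0:
        raise ValueError("expected a non-empty list of 3N control points")
    n = len(points) // 3
    return [[points[3 * i], points[3 * i + 1], points[3 * i + 2],
             points[(3 * i + 3) % (3 * n)]]
            for i in range(n)]
```

Storing only three points per curve mirrors the fact that shared endpoints are not independent parameters, so a bezigon with N curves has 3N free control points.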
Now, for a given raster input image $I$, our objective can be considered to be the inference of the parameters
(4) $W = \{\mathbf{P}, \mathbf{c}\}$
from $I$ such that the bezigon that is defined by $W$, i.e., by its geometric parameters $\mathbf{P}$ and its color parameters $\mathbf{c}$, can explain the input image $I$. In other words, the raster image of the bezigon should be similar to the input image. The problem is obviously a typical nonlinear and ill-posed problem: many solutions are possible because of uncertainties in the imaging process and ambiguities of visual interpretation. To resolve the intrinsic ill-posedness of the problem, we must further constrain the solution space by introducing additional prior knowledge regarding bezigons in vector images.
Based on the above discussion, we will adopt an energy minimization approach that is widely used in many computer vision algorithms [29].
We first define our energy function as
(5) $E(W) = E_{\mathrm{data}}(W) + E_{\mathrm{prior}}(W),$
where $E_{\mathrm{data}}$ is the so-called data energy, which measures the fidelity of a vector solution to the observed raster image, and $E_{\mathrm{prior}}$ is the so-called prior energy, which is the formulation of our constraints or prior knowledge regarding reasonable bezigons for the above-mentioned vector images.
Consequently, the problem of this paper will be formulated in terms of identifying the optimal bezigon $W^*$ such that
(6) $W^* = \arg\min_{W} E(W).$
3.2 Overview of Our Framework for Optimization
Once our energy function is fully specified, the entire energy minimization framework can be divided into two phases: a bezigon initialization phase and a bezigonspecific optimization phase (Figure 4).
Bezigon initialization phase. The initialization phase takes a raster image (Figure 4a) as input and outputs a set of initial bezigons (Figure 4c). These bezigons can be either obtained using other existing vectorization methods or extracted from the input image. A simple, fully automated method of accomplishing this extraction consists of two steps: a segmentation step that is used to segment the input image into a set of regions [30] (Figure 4b) and a boundary-fitting step to fit a piecewise cubic Bézier curve to the boundary of each region [31] (Figure 4c). As another option, the initial bezigons can also be manually drawn or interactively refined by the user. Regardless of which method is used, the obtained bezigons serve only as initial parameters in the next phase; hence, they need not be highly accurate. The technical details of this phase are outside of the scope of this paper.
Bezigon optimization phase. The optimization phase is the primary concern of this paper. This phase includes direct bezigon optimization, which is the task that we are emphasizing. This process takes the initial bezigons from the first phase as input and outputs the optimal bezigons as the final vector result. In contrast to other existing vectorization approaches, this phase of our framework consists of neither simply applying a curve-fitting algorithm (e.g., [31]) nor indirectly optimizing bezigons according to an intermediate representation (e.g., polygons in [1]). Instead, we optimize the bezigon parameters by directly observing the raster input and incorporating both the image-tracing experience of experts and prior knowledge from existing hand-drawn vector images. The bezigon optimization and these sources of information are simultaneously bridged by our data energy and prior energy. In this way, unnecessary accumulated errors introduced by the intermediate process can be avoided, and hence, the quality of the resultant bezigon can be improved. However, as stated previously, there are several as-yet-unresolved challenges arising from such an optimization approach. Therefore, our paper will emphasize these issues. Section 4 presents a discussion of our solution method and explains our main contributions.
There are three major advantages to our framework. First, the error arising from the entire vectorization pipeline can be minimized. Second, any bezigon-based priors can be conveniently incorporated to generate even more reasonable results, once we have a better understanding of bezigons in typical vector images, or to cause the resultant bezigons to satisfy certain constraints of other specific applications. Third, our vectorization approach behaves similarly to bezigon evolution, which is particularly well suited to the vectorization of clipart animations, and facilitates the further refinement of inaccurate bezigons resulting from other vectorization approaches.
4 Approach for Directly Optimizing Bezigons
In this section, we solve some key issues related to bezigon optimization. The optimization involves specifying the data energy with the proper rasterization model (Section 4.1) and several bezigon-specific prior terms (Section 4.2). To more efficiently solve Equation 6, we also explore the nature of bezigon parameters and propose a piecewise optimization strategy (Section 4.3). In the following, we will use the same notation as in Section 3.
4.1 Data Energy
To fully utilize the information provided by the input image, we define the data energy as the distance between the input image $I$ and the image generated by rasterizing a vector solution $W$:
(7) $E_{\mathrm{data}}(W) = \frac{1}{\ell_0} \, \| R(W) - I \|^2$
(8) $= \frac{1}{\ell_0} \sum_{p \in \Omega} \big( R(W)(p) - I(p) \big)^2.$
Here, the function $R(\cdot)$ models a specific bezigon rasterization process. The function takes the parameters $W$ of a bezigon as input and produces a raster image $R(W)$ of the same size as the input image $I$. $R(W)(p)$ and $I(p)$ denote the values at pixel $p$ in the rasterized image given by $R(W)$ and in the input image, respectively. $\Omega$ is the lattice of the input image $I$. The denominator $\ell_0$ represents the arc length of the initial bezigon. This denominator is fixed during the optimization and can be easily estimated from the geometric parameters of the initial bezigon, i.e., $\ell_0 = \sum_{i=1}^{N} \int_0^1 \| C_i'(t) \| \, dt$, evaluated at the initial control points.
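A minimal Python sketch of this data energy is given here, under stated assumptions: the distance is taken to be the sum of squared pixel differences, images are plain row-major lists of grayscale values, and the rasterized image is assumed to come from some external rasterizer that is not shown. The helper names are hypothetical. The arc length of the initial bezigon is estimated by sampling each curve as a polyline.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Image = List[List[float]]  # row-major grid of pixel values


def bezier_point(ctrl: Sequence[Point], t: float) -> Point:
    """Evaluate a cubic Bezier curve at t in [0, 1]."""
    s = 1.0 - t
    w = (s ** 3, 3 * s * s * t, 3 * s * t * t, t ** 3)
    return (sum(wi * p[0] for wi, p in zip(w, ctrl)),
            sum(wi * p[1] for wi, p in zip(w, ctrl)))


def initial_arc_length(curves: Sequence[Sequence[Point]],
                       samples: int = 64) -> float:
    """Estimate the bezigon arc length by summing sampled chord lengths."""
    total = 0.0
    for ctrl in curves:
        prev = bezier_point(ctrl, 0.0)
        for k in range(1, samples + 1):
            cur = bezier_point(ctrl, k / samples)
            total += math.dist(prev, cur)
            prev = cur
    return total


def data_energy(rasterized: Image, target: Image, arc_length: float) -> float:
    """Sum of squared pixel differences, normalized by a fixed arc length.

    Assumption: squared differences are used as the image distance.
    """
    if arc_length <= 0.0:
        raise ValueError("arc length must be positive")
    sq_error = sum((r - i) ** 2
                   for row_r, row_i in zip(rasterized, target)
                   for r, i in zip(row_r, row_i))
    return sq_error / arc_length
```

Because the normalizing length is computed once from the initial bezigon, it acts as a constant scale factor and does not change during optimization.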
Two issues now arise for consideration. First, a bezigon rasterization function $R$ should be specified because such a function is essential to make Equation 8 suitable for optimization. It is also one of the most challenging aspects of direct bezigon optimization. As previously mentioned, the most important contribution of the current state-of-the-art approach [1] also lies in finding an appropriate rasterization function, but one that is specific to polygon optimization. For bezigon optimization, however, research concerning suitable rasterization functions is still lacking in the existing literature. Second, we must address the case in which the input image is not generated by the specified rasterization function used for the data energy because the rasterization method that generates the given input image is generally unknown and most likely not the same as our function.
Regarding the first issue, several methods exist for directly or indirectly rasterizing bezigons [32, 33, 34, 35, 36, 37, 38, 39], each of which corresponds to a candidate rasterization function $R$. However, we find that nearly all such functions yield poor results when a typical solver for nonlinear optimization (such as conjugate gradient, L-BFGS, or NEWUOA) is applied. This is because most available rasterization functions are either piecewise flat or discontinuous almost everywhere with respect to the bezigon parameters (as shown in Figure 2b). Although such discontinuities pose no problems for common rasterization tasks, they can strongly degrade the effectiveness or efficiency of optimization. Various specific optimization approaches (such as [40]) can be applied in the case of discontinuous functions. However, our experimental results indicate that such approaches often fail to produce satisfactory bezigons. Moreover, these solvers are relatively slow, which limits their use in image vectorization. Based on the above experiments and analysis, we recognize that an appropriate rasterization function should exhibit certain properties, such as continuity with respect to the bezigon parameters. Moreover, if the rasterization function is also differentiable with respect to those parameters, more efficient and effective solvers can be adopted to optimize our energy function to obtain better results.
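The effect of a piecewise flat rasterization function can be seen in a one-dimensional toy example. The following Python sketch, which is an illustration and not one of the rasterizers cited above, compares the coverage of a single pixel spanning [0, 1] by a region that extends up to an edge position: once estimated from subpixel samples and once computed analytically. All names are hypothetical.

```python
from typing import Callable


def sampled_coverage(edge: float, n_samples: int = 4) -> float:
    """Coverage estimated by counting subpixel sample centers left of the edge."""
    centers = [(k + 0.5) / n_samples for k in range(n_samples)]
    return sum(1 for c in centers if c < edge) / n_samples


def exact_coverage(edge: float) -> float:
    """Analytic (box-filtered) coverage of the pixel [0, 1] by the region x < edge."""
    return min(1.0, max(0.0, edge))


def finite_difference(f: Callable[[float], float], x: float,
                      h: float = 1e-4) -> float:
    """Central-difference estimate of df/dx."""
    return (f(x + h) - f(x - h)) / (2.0 * h)
```

With four samples, moving the edge from 0.30 to 0.35 leaves the sampled coverage unchanged, so a gradient-based solver receives no signal, whereas the analytic coverage varies linearly with the edge position.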
In the search for proper rasterization approaches, the approach presented in [35] came to our attention. This approach uses a hierarchical Haar wavelet representation to analytically calculate an antialiased raster image of bezigons. According to [35], for a bezigon with parameters $W$, the pixel color value at $p$ in the resultant raster image can be calculated as follows:
(9) $R_H(W)(p) = \sum_{s} \sum_{k \in K_s} c_{s,k}(W) \, \psi_{s,k}(p).$
Here, $s$ represents a specific scaling from the original resolution to the pixel resolution, and $k = (k_x, k_y)$, $k \in K_s$, represents a specific translation in the finite set $K_s$ corresponding to all possible translations in the current scaling. $\psi_{s,k}$ and $c_{s,k}$ are a two-dimensional Haar wavelet basis function and its coefficient, respectively. The definitions of these two functions can be found in Appendix 7.1.
Although [35] provides a closed-form solution for rasterizing bezigons, the continuity and differentiability of the rasterization function of Equation 9, which we denote by $R_H$, are not obvious because of the discontinuity of the Haar wavelet basis functions. One of the most important tasks of this section is to present the proofs of the continuity and differentiability of this rasterization function. The latter is not straightforward. To obtain the proof, we must rely on several properties and operations from the theory of generalized functions [7].
Note that for any given coordinate $p$, $R_H(W)(p)$ is a function of the bezigon parameters $W$. To establish the function’s continuity and differentiability, we present the following theorems.
Theorem 1 (continuity)
The rasterization function $R_H(W)(p)$ of Equation 9 is a continuous function with respect to all bezigon parameters $W$.
As previously stated, the bezigon parameters $W$ consist of the color parameters $\mathbf{c}$ and the geometric parameters $\mathbf{P}$. According to Equation 9, $R_H(W)(p)$ is continuous with respect to the color parameters $\mathbf{c}$ as long as the assumed color model is continuous with respect to them, which is often the case. With respect to the geometric parameters $\mathbf{P}$, $R_H(W)(p)$ is also continuous. A detailed analysis can be found in Appendix 7.2.
Obviously, if $R_H$ serves as our rasterization function $R$, then the resultant data energy is also a continuous function. The smooth curve that corresponds to our data energy in Figure 2b reflects such a property as well. The continuity of the data energy not only enables us to apply a common solver for the nonlinear optimization but also facilitates the resolution of any ambiguity that arises from the observation of the input data.
Theorem 2 (differentiability)
The rasterization function $R_H(W)(p)$ of Equation 9 is differentiable with respect to the bezigon parameters $W$.
Most color models are differentiable with respect to the color parameters $\mathbf{c}$. In such cases, $R_H(W)(p)$ is obviously differentiable with respect to the color parameters, according to Equation 9. However, the differentiability of $R_H(W)(p)$ with respect to the geometric parameters $\mathbf{P}$ is not obvious. We use the theory of generalized functions to analyze this matter. Because of space limitations and the complexity of the discussion, the proof and the derivatives are presented in Appendix 7.3.
Based on the above analysis and theorems, we can conclude that $R_H$ is a suitable candidate for the rasterization function $R$ in Equation 8. Therefore, we adopt this rasterization function in the proposed framework. Then, our final data energy can be rewritten as
(10) $E_{\mathrm{data}}(W) = \frac{1}{\ell_0} \sum_{p \in \Omega} \big( R_H(W)(p) - I(p) \big)^2,$
where, as in Equation 8, $\ell_0$ is the arc length of the initial bezigon and $\Omega$ is the lattice of the input image $I$.
Now, we consider the second issue. Because there are many commonly used rasterization methods, it is often the case that the input raster image is not generated by the rasterization method used in our data energy term. This could be an issue if there are significant differences in the rasterization results between our chosen method and the method used to generate the input image. Therefore, to ensure the practical utility of the proposed vectorization method, we must investigate whether the selected rasterization function can closely approximate the rendering results of other commonly used rasterization approaches.
Fortunately, our selected rasterizer is still a suitable choice in this context. To support this claim, we perform the following experiment: We collect a set of real-world vector images. All these vector images are rasterized by each of the commonly used antialiased rasterizers, by the recently proposed methods, and by our wavelet rasterization function $R_H$ (Equation 9). Note that the only possible differences in images produced by different rasterizers lie in pixels that intersect the bezigon boundary. To further clarify the comparison, we consider only the differences among such pixels in the resultant images. Histograms of these differences are presented in Figure 5. It is readily apparent that a large proportion of the “boundary” pixels that are rendered by any other rasterizer remain identical to those produced by $R_H$. Moreover, all distributions have means of zero and small variances. Therefore, the pixel values generated by our rasterization function can be safely assumed to be a good approximation to those generated by other commonly used rasterization methods, and hence, our rasterization function can still accurately model the original rasterization process of most clipart images.
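The comparison described above can be sketched in Python as follows. The sketch assumes grayscale coverage values in [0, 1] and treats a pixel as a boundary pixel when either rasterizer reports partial coverage; this criterion and the function name are assumptions made for illustration, not the procedure used to produce Figure 5.

```python
from statistics import mean, pvariance
from typing import List

Image = List[List[float]]  # row-major grid of coverage values in [0, 1]


def boundary_differences(ours: Image, other: Image,
                         eps: float = 1e-9) -> List[float]:
    """Collect per-pixel differences restricted to partially covered pixels."""
    diffs = []
    for row_a, row_b in zip(ours, other):
        for a, b in zip(row_a, row_b):
            # Assumed criterion: a pixel lies on the boundary if either
            # rasterizer assigns it a value strictly between 0 and 1.
            if (eps < a < 1.0 - eps) or (eps < b < 1.0 - eps):
                diffs.append(b - a)
    return diffs
```

The mean and variance of the collected differences (for example via `statistics.mean` and `statistics.pvariance`) then summarize how closely two rasterizers agree along the boundary.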
In summary, we have proven the suitability of our bezigon rasterization function for optimization as well as its compatibility with various clipart raster inputs, and we have presented the definition of our data energy. Notably, for any other rasterization function that is a candidate for application to vectorize a certain type of image, a similar procedure should be followed to evaluate the suitability and compatibility of that function.
4.2 Prior Energy
After the data energy has been carefully selected, various simple cases (e.g., the vectorization of a simple bezigon in a high-resolution raster image) can already be effectively addressed when there is adequate information implicit in the observed raster data. However, it is more often the case that the bezigons are relatively complex and that the information available in the raster input is inadequate. In such a case, profound uncertainty regarding the correct solution may remain if the data energy alone is considered. Therefore, the optimization may result in unreasonable bezigons that can be easily identified by the human eye.
Indeed, our intensive experiments provide evidence of such issues. More specifically, the failure cases of direct bezigon optimization using only data energy generally fall into four categories: (a) self-intersections, (b) false corners with small angle variations, (c) short handles, and (d) twisted sections (Figure 3).
All these bezigons are considered to be unreasonable because they are aesthetically unpleasing and, according to expert opinion, are unlikely to be drawn or traced by a professional illustrator. These types of bezigons are also rare in typical vector images. (Taking self-intersection as an example, we find that very few bezigons in vector images from the Open Clipart library [41] intersect with themselves. Most bezigons that exhibit self-intersection are believed to have been created by an amateur or automatically traced from a raster image.) The reason for the occurrence of such illegal bezigons is that their corresponding raster images are quite similar to the input images (compare Figures 6f, 8f, 9f and 10f with 6h, 8h, 9h and 10h, respectively), although their vector forms are significantly different from the ground-truth images (compare Figures 6b, 8b, 9b and 10b with 6d, 8d, 9d and 10d, respectively). This situation results in low data energy, especially when the resolution of the input image is relatively low.
Our prior energy is designed precisely to solve the above-mentioned problems and to ensure that the resultant bezigons are reasonable and aesthetically pleasing. More specifically, we construct a prior functional to reduce the likelihood of each type of failure case. Thus, our prior energy has the following form:
(11) $E_{\mathrm{prior}}(W) = \lambda_S E_S(W) + \lambda_A E_A(W) + \lambda_H E_H(W) + \lambda_L E_L(W),$
where $E_S$, $E_A$, $E_H$ and $E_L$ represent the self-intersection prior term (SPT), the angle-variation prior term (APT), the Bézier-handle prior term (HPT) and the curve-length prior term (LPT), respectively, and $\lambda_S$, $\lambda_A$, $\lambda_H$ and $\lambda_L$ are their respective weights. Each of the prior terms is specifically defined and explained in the following subsections.
Elimination of Self-Intersection
Certain approaches are seemingly capable of avoiding self-intersection but are not feasible in practice. One intuitive method is to enforce a set of highly coupled nonlinear inequality constraints and use a primal-dual interior point method [42] for optimization. However, this approach is not suitable in our case because of its computational complexity. As another naïve method, we could assign a large constant energy to a bezigon that is detected as exhibiting self-intersection. However, this provides almost no guidance for a bezigon that has already manifested self-intersection during optimization.
Instead, we attempt to analytically measure the extent of self-intersection and provide an effective regularization to automatically avoid bezigons with self-intersection. The primary advantage of our method is that it not only is capable of preventing self-intersection but also provides effective guidance to eliminate self-intersection that has already occurred. Moreover, it does not require expensive computation.
The procedure is illustrated in Figure 7. We first estimate all intersection points (indicated by red dots), if any. Each such point divides the bezigon outline into two parts. We consider the shorter of these parts (shown as red curves) and measure the extent of self-intersection by summing over their lengths. More formally, the measurement can be written as
(12) $E_S(W) = \sum_{(t_a, t_b) \in T} \ell(t_a, t_b).$
Here, $T$ is the set of partitions corresponding to all intersection points (red dots in Figure 7), and $\ell(t_a, t_b)$ represents the arc length along the curve from $t_a$ to $t_b$, i.e.,
(13) $\ell(t_a, t_b) = \int_{t_a}^{t_b} \| \mathcal{B}'(t) \| \, dt,$
where $\mathcal{B}(t)$ denotes the bezigon outline traversed by a single curve parameter $t$ and each pair $(t_a, t_b)$ delimits the shorter of the two parts.
The energy term $E_S$ penalizes significant self-intersection. The more severe an intersection is, the more closely the length of a shorter part approaches the length of a longer part, and hence, the larger $E_S$ will be. When there is no self-intersection, $E_S$ is equal to zero. Our experiments demonstrate that optimization using the SPT results in bezigons that contain very little self-intersection and are likely to be close to the ground-truth image in terms of topology (see Figure 6c).
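The measurement can be sketched in Python on a closed polyline that approximates the bezigon outline. Working on a polyline, the brute-force pairwise segment test, and the function names are simplifying assumptions of this example; the text above does not specify how the intersection points are estimated.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def _segment_hit(a: Point, b: Point, c: Point, d: Point
                 ) -> Optional[Tuple[float, float]]:
    """Return (s, u) with a + s(b - a) == c + u(d - c) for a proper crossing."""
    rx, ry = b[0] - a[0], b[1] - a[1]
    qx, qy = d[0] - c[0], d[1] - c[1]
    denom = rx * qy - ry * qx
    if abs(denom) < 1e-12:
        return None  # parallel or degenerate segments
    s = ((c[0] - a[0]) * qy - (c[1] - a[1]) * qx) / denom
    u = ((c[0] - a[0]) * ry - (c[1] - a[1]) * rx) / denom
    if 0.0 < s < 1.0 and 0.0 < u < 1.0:
        return s, u
    return None


def self_intersection_energy(polyline: Sequence[Point]) -> float:
    """Sum, over all crossings, the length of the shorter outline part."""
    n = len(polyline)
    seg_len = [math.dist(polyline[i], polyline[(i + 1) % n]) for i in range(n)]
    total = sum(seg_len)
    prefix = [0.0]
    for length in seg_len:
        prefix.append(prefix[-1] + length)
    energy = 0.0
    for i in range(n):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # these two segments share the closing vertex
            hit = _segment_hit(polyline[i], polyline[(i + 1) % n],
                               polyline[j], polyline[(j + 1) % n])
            if hit is None:
                continue
            s, u = hit
            # Arc-length positions of the crossing on both segments.
            loop = (prefix[j] + u * seg_len[j]) - (prefix[i] + s * seg_len[i])
            energy += min(loop, total - loop)
    return energy
```

A simple polygon yields zero energy, whereas a figure-eight outline yields the length of its smaller loop, so the value decreases smoothly as a loop shrinks toward untangling.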
Regularization for Angle Variations
Although a simple curvesmoothing algorithm may remove small angle variations, such a method will most likely fail to preserve other visually significant corners. Moreover, it may not always be possible to identify the saliency of the corners using a fixed threshold for angle variations. Consequently, we must develop a more sophisticated method of smoothing out insignificant corners while preserving the small number of significant corners.
For this purpose, we penalize the sum of all angle variations. As a result, the optimized bezigon will consist of predominantly zeroangle variations and a small number of nonzero angle variations. This is important because it incorporates corner detection into the bezigon optimization.
More formally, we denote the two tangent vectors of the th endpoint by and . Then, the APT can be written as follows:
(14) 
The experimental results demonstrate that optimization with the APT can retain the smoothness of the bezigon while preserving visually significant corners (Figure 8c).
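To make the idea of summing angle variations concrete, the Python sketch below computes the unsigned angle between the incoming and outgoing tangent vectors at each endpoint and adds the results. The use of the plain unsigned angle and the function names are assumptions for illustration; the exact functional form of Equation 14 may differ.

```python
import math

def angle_variation(t_in, t_out):
    """Unsigned angle (radians) between the incoming and outgoing tangents."""
    cross = t_in[0] * t_out[1] - t_in[1] * t_out[0]
    dot = t_in[0] * t_out[0] + t_in[1] * t_out[1]
    return abs(math.atan2(cross, dot))

def angle_prior(tangent_pairs):
    """Sum of angle variations over all endpoints of a bezigon.

    tangent_pairs holds one (incoming, outgoing) tangent pair per endpoint.
    """
    return sum(angle_variation(t_in, t_out) for t_in, t_out in tangent_pairs)
```

A smooth joint (parallel tangents) contributes nothing, so a penalty of this kind favors bezigons in which only a few endpoints carry a nonzero angle.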
Avoidance of Short Bézier Handles
To guard against the possibility of short handles, we penalize any short handle using an inverse barrier function
(15) 
This term imposes a large penalty on short handles because it tends toward infinity as any handle length tends toward zero. When the length of each handle is sufficiently large, this energy will be very small and hence will not engender serious side effects.
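The behavior of such a barrier can be sketched in a few lines of Python. The simple reciprocal form, the `eps` guard, and the function name are assumptions chosen for illustration; Equation 15 may use a different inverse barrier.

```python
import math

def handle_prior(handles, eps=1e-12):
    """Inverse-barrier penalty: sum of 1 / length over all Bézier handles.

    Each handle is an (endpoint, control_point) pair of 2D points.
    eps only guards against division by zero for degenerate handles.
    """
    total = 0.0
    for endpoint, control in handles:
        length = math.hypot(control[0] - endpoint[0], control[1] - endpoint[1])
        total += 1.0 / max(length, eps)
    return total
```

For example, a handle of length 2 contributes 0.5, whereas a handle of length 0.01 contributes 100, so the penalty is negligible for ordinary handles and dominant for very short ones.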
It can be experimentally demonstrated that optimization using the HPT considerably reduces the occurrence of unnatural bezigons, as illustrated in Figure 9b. Although the resultant handle may sometimes be slightly longer than it should be (compare the locations of the control points in Figure 9c with those in Figure 9d), such a result generally has only a minor effect on the quality or editability of the vector results.
Curve-Length Prior
Based on the experience of experts who are proficient in image tracing, a traced curve tends to be stretched as much as possible unless there is strong evidence that the curve should shrink or twist. This prior knowledge can be used to eliminate invalid bezigons of this type because the occurrence of a highly twisted curve without strong evidence in support of such twisting from the raster input can be easily identified.
Based on such prior knowledge, we penalize the curve length to avoid invalid bezigons of this type. Thus, the LPT can be defined as follows:
(16) 
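A curve-length penalty of this kind requires the arc length of each cubic Bézier section. The Python sketch below approximates it by sampling the curve and summing chord lengths; the sampling scheme, the sample count, and the function names are assumptions for illustration, and any weighting applied in Equation 16 is omitted.

```python
import math

def cubic_point(p0, p1, p2, p3, t):
    """Point on a cubic Bézier curve at parameter t (Bernstein form)."""
    s = 1.0 - t
    return tuple(
        s ** 3 * a + 3 * s * s * t * b + 3 * s * t * t * c + t ** 3 * d
        for a, b, c, d in zip(p0, p1, p2, p3)
    )

def cubic_length(p0, p1, p2, p3, samples=64):
    """Approximate arc length as the length of a sampled polyline."""
    pts = [cubic_point(p0, p1, p2, p3, k / samples) for k in range(samples + 1)]
    return sum(math.hypot(b[0] - a[0], b[1] - a[1]) for a, b in zip(pts, pts[1:]))

def length_prior(sections, samples=64):
    """Total length of a bezigon given as a list of (p0, p1, p2, p3) sections."""
    return sum(cubic_length(*section, samples=samples) for section in sections)
```

Because the chord sum never exceeds the true arc length, the approximation is slightly conservative; a higher sample count or a quadrature rule could be used where more accuracy is needed.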
4.3 Piecewise Bezigon Optimization
Once we have obtained the energy function developed in the previous sections, it can, in many cases, be minimized using a general nonlinear optimization method. However, because a typical bezigon has a large number of parameters to be optimized and because the valid range for each parameter is large, the efficiency and even the convergence of the optimization process might be an issue. Fortunately, bezigon parameters possess a strong local control property, and we can reduce the number of redundant calculations by fully exploiting this property.
In this section, we explore the nature of bezigon parameters and propose a piecewise optimization strategy that allows our high-dimensional problem to be decomposed into several subcomponents that may be individually solved.
The fundamental concept of piecewise optimization is to optimize only a subset of the geometric parameters of each bezigon at any given time. This task is feasible because the effect of varying any given control point is limited to a local region of the bezigon.
More specifically, we regard two consecutive Bézier curve sections as one curve piece. Therefore, a bezigon with a given number of curve sections also consists of the same number of overlapped curve pieces. All curve pieces will be successively optimized. When optimizing a curve piece, we fix the first and last endpoints of the curve piece and determine the optimal solution for the four intermediate control points and the middle endpoint. We first optimize the five active control points (the red points in Figure 11a) of one curve piece and subsequently optimize the corresponding points (the red points in Figure 11b) of the next curve piece. It should be noted that the two consecutive pieces overlap and that two of the intermediate control points (e.g., those shown in red in both Figure 11a and Figure 11b) are shared. Therefore, every intermediate control point is optimized twice in each iteration. The process iteratively progresses from the first curve piece to the last.
Formally, we represent the geometric parameters of the th piece to be optimized as follows:
(17) 
All remaining geometric parameters of the bezigon are held fixed during the present optimization. Therefore, optimizing this curve piece amounts to identifying the optimal configuration that minimizes a function composed of the local energies, i.e.,
(18) 
Here, is the space of all possible geometric parameters for the th piece.
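The bookkeeping behind the overlapped pieces can be sketched as follows. The Python code assumes a hypothetical storage layout in which the control points of a bezigon with n cubic sections are kept in one cyclic list of 3n points, with endpoints at indices that are multiples of 3; this layout, the function names, and the `optimize_piece` callback are assumptions for illustration only, and n is assumed to be at least 3.

```python
def piece_indices(i, n):
    """Fixed and active control-point indices of piece i (sections i and i+1).

    Assumed layout: 3 * n control points in cyclic order, endpoints at
    indices 0, 3, 6, ... and the two handles of each section in between.
    """
    m = 3 * n
    start = 3 * i
    fixed = (start % m, (start + 6) % m)              # first and last endpoints
    active = [(start + k) % m for k in range(1, 6)]   # 4 handles + middle endpoint
    return fixed, active

def piecewise_sweeps(params, n, optimize_piece, sweeps=2):
    """Visit every overlapped piece in order, repeating the pass `sweeps` times.

    optimize_piece(params, fixed, active) is a user-supplied local solver
    that returns the updated parameters.
    """
    for _ in range(sweeps):
        for i in range(n):
            fixed, active = piece_indices(i, n)
            params = optimize_piece(params, fixed, active)
    return params
```

With this indexing, consecutive pieces share exactly two handles, so each handle is updated twice per pass and each endpoint once, which matches the overlap described above.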
Such a strategy substantially increases the efficiency of the entire optimization. Although the objective function in Equation 18 is quite similar to Equation 6, its solution space is much smaller than that of the full problem. Therefore, the original high-dimensional problem can be decomposed into a set of lower-dimensional problems. Moreover, all prior energy terms, except the SPT, are simply the sum of the corresponding energy of each curve section. Therefore, we need to consider only the two related sections when calculating these terms. In this manner, a large number of redundant computations can be eliminated.
Piecewise optimization is not only fast but also provides satisfactory bezigons with almost no decrease in accuracy. The experimental results indicate that, in most cases, the resultant bezigon is nearly perfect after all curve pieces have been traversed two or three times.
As an optional step, we can jointly optimize all geometric parameters once more to further improve the result. Because our piecewise procedure can provide substantially more accurate input for further optimization, this subsequent global optimization can be significantly more efficient than it would be without piecewise optimization. The entire process of bezigon optimization is summarized in Algorithm 1.
The overlapped piecewise optimization strategy provides a fast yet accurate method of bezigon optimization, which is an important prerequisite for the practical application of our vectorization approach.
5 Experiments
To demonstrate the effectiveness of the approach developed in this paper, in this section, we quantitatively and qualitatively compare our method with other vectorization methods. As stated in Section 2, many vectorization algorithms and software packages exist. However, most academic work on vectorization is not relevant for comparisons because it is primarily focused on photographs or other types of vectorization. We restrict our comparisons to two approaches that are specialized for the vectorization of clipart images. One method is Vector Magic [5], which was developed on the basis of the state-of-the-art method proposed by [1]. The other is Adobe Illustrator, a representative commercial tool.
5.1 Implementation
To evaluate the effectiveness of the proposed bezigon optimization method, we have implemented a prototype image vectorization system.
The core of our system is bezigon optimization. Because of the continuity and differentiability of our energy function, either a curve piece or a global bezigon can be effectively optimized using many available optimization algorithms (such as NEWUOA [43], L-BFGS [44], and the conjugate gradient method [45]). Note that there are four tuning parameters in our objective function, namely, the weights of the four prior terms. Empirically, we set (to strongly penalize self-intersection), , and . Although there may be other weight settings that would yield better performance, we did not perform a thorough search for the optimal weights. According to our experimental results, the proposed method is generally insensitive to these parameters. Our preset weights should yield satisfactory results. However, as the quality of the raster input decreases (e.g., low-resolution input), fine tuning may become necessary to generate perfect bezigons. Although our framework does not intrinsically rely on any assumption regarding the color model, for simplicity of implementation and convenience of fair comparisons with the most commonly used clipart image vectorization methods, our prototype system currently assumes that the color in each bezigon is uniform.
The initial bezigons can be either extracted from the input image or obtained using other vectorization methods. They are not required to be highly accurate. Most initial bezigons in our experiments are far from perfect. Of course, if the initial pose drifts too far from the optimal pose (e.g., approaches random bezigons), our bezigon optimization may become trapped in a local optimum and output imperfect results. However, in practice, this rarely occurs because it is not very difficult to estimate a bezigon that is sufficient to serve as an initial solution. The real difficulties lie in the subsequent optimization, i.e., achieving bezigons of even higher precision, which is the key issue addressed in this paper.
Note that because this prototype system was primarily developed as a proof of concept, speed is not a priority at the moment. Our implementation is currently written in Python, a dynamically typed and interpreted language. The code is run on a laptop with an Intel Core i5-2410M @ 2.53 GHz processor with 4 GB of memory. The total execution time varies from 10 seconds to 10 minutes as a function of the complexity of the shapes to be vectorized. It should be much faster when implemented in a statically typed, compiled language. Moreover, our method can be highly parallelized by virtue of the nature of wavelet rasterization, which may also considerably improve the efficiency.
5.2 Quantitative Comparisons
We use a fidelity metric to quantitatively compare our results with those of other methods. The fidelity metric generally provides a good indication of the characteristics that define a good vectorization algorithm. To further evaluate the proposed method based on human aesthetic judgment, we also present a user study.
For both comparisons, we collected a set of clipart images available in both raster and vector formats. All the raster images served as inputs to our algorithm and to the other vectorization methods. Some methods considered in the comparison require parameter tuning. To perform a fair comparison, the dominant parameters of these methods were adjusted until the number of bezigons produced as output was approximately equal to the number of bezigons in the ground-truth vector. Then, we compared the vector images resulting from the different methods with respect to fidelity and user satisfaction. The details of both comparisons are presented below.
Comparison via peak signal-to-noise ratio measurement. The quality of a vectorization is often evaluated in terms of the PSNR (peak signal-to-noise ratio) or RMSE (root-mean-square error) [1, 4]. Before evaluation, both the resultant vector image and the ground-truth image were rasterized at a specific resolution.
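For reference, the standard PSNR computation between two rasterized images can be sketched in Python as follows. The grayscale list-of-rows image format, the default peak value of 255, and the function name are assumptions for this example; the paper does not state the resolution or color handling used in its evaluation.

```python
import math

def psnr(image_a, image_b, peak=255.0):
    """PSNR in dB between two equally sized grayscale images (lists of rows)."""
    squared_errors = [
        (a - b) ** 2
        for row_a, row_b in zip(image_a, image_b)
        for a, b in zip(row_a, row_b)
    ]
    mse = sum(squared_errors) / len(squared_errors)  # mean squared error
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)
```

The RMSE mentioned above is the square root of the same mean squared error, so the two metrics rank results identically for a fixed peak value.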
Figure 12 presents the histograms of the increases in the PSNR that were achieved by our method with respect to Vector Magic (Figure 12a) and Adobe Illustrator (Figure 12b). It is evident that our method consistently yields a higher PSNR compared with competing methods. More specifically, our results reveal an increase in the PSNR of 0 to 5 dB with respect to Vector Magic and an increase of 10 to 20 dB with respect to Adobe Illustrator.
Comparison via a user study. We also performed a user study to obtain a further evaluation based on human aesthetic judgment. For this purpose, a pairwise comparison test was created. We prepared 120 pairs of vector results. Each pair consisted of a vector image generated by our method and a vector image generated by another method (either Vector Magic or Adobe Illustrator). We constructed a web interface to show each pair of vector images, including their control points, without the name of the vectorizer that produced them. Several participants with graphic design backgrounds were then asked to determine whether one image was much better than, better than, almost the same as, a little worse than, or much worse than the other image in comparison with the ground truth. The statistical results are presented in Figure 13. This figure indicates that our results were considered to be superior to those of the current state-of-the-art method (Vector Magic) in nearly 80% of the pairwise comparisons. Approximately one quarter of the images were deemed to be much better (Figure 13a). Compared with the representative commercial software (Adobe Illustrator), almost all of our results were considered to be better, and half of them were judged to be much better.
To summarize the quantitative comparisons, our approach is found to be superior to both the state-of-the-art algorithm and the representative commercial tool in terms of both fidelity and user satisfaction.
5.3 Qualitative Comparisons
For qualitative comparison of our method with the other methods, we provide a few results (Figures 14-16) obtained using our approach and the competing methods. Because of space limitations, we highlight only one local patch for each image (shown in the even rows of Figures 14-16). From the comparison, we observe that our results, in general, are more faithful to the raster input and that the shapes of the resultant bezigons are more reasonable and visually pleasing. More specifically, our strengths lie in the following cases.
Case 1: smooth boundary with high curvature. In a typical clipart image, smooth boundaries with high curvature are often found in the round corners of a shape (Figure 14j and Figure 14t). Traditional methods typically use a chain of densely sampled points to represent such structures and subsequently fit Bézier curves to the chain. The problem with this approach is that reconstruction of curves from such an intermediate representation can be excessively ambiguous in such regions. Therefore, the resultant bezigons exhibit false corners (see the redundant corners in Figures 14g, 14h, and 14r). Instead, our method directly infers the bezigons from the raster input to reduce such ambiguities to the greatest possible extent. Consequently, our bezigons contain fewer false sharp corners (Figure 14i and 14s).
Case 2: obtuse corners. Many shapes to be vectorized contain obtuse corners (Figures 15j and 15t). Preserving such corners is very important, even when a smoothed result is visually satisfactory, because shapes that lack these corners can be difficult to edit subsequently. However, for the same reason discussed in Case 1, traditional methods tend to smooth out such corners or yield a curve endpoint with an incorrect location (e.g., the overly smoothed boundaries in Figures 15h, 15q and 15r). Our method benefits from the direct optimization of the bezigons and avoids errors introduced by fitting sampled points that cannot accurately indicate the correct location of an obtuse corner. Therefore, our results typically preserve more obtuse corners (Figures 15i and 15s).
Case 3: slightly bent edges. Vectorizing various detailed structures, such as slightly bent edges (Figures 16j and 16t), is also difficult for traditional methods. Because the error associated with the generation of an intermediate representation is unavoidable, small perturbations of the point chain are often considered to be noise rather than signal. Therefore, certain slightly bent edges in the resultant bezigons are straightened (Figures 16g, 16h and 16r). In our framework, we trust only the original raster input and any prior knowledge regarding the curves. Although not every type of structure can be preserved (e.g., large perturbations or zigzag-like structures may be suppressed based on a priori knowledge of typical vector images), small shapes and slightly bent edges are more likely to be preserved in our results (Figures 16i and 16s).
From the analysis presented above, we can conclude that our direct bezigon optimization for image vectorization produces more convincing vector results in most cases.
6 Conclusion and Future Work
We have presented a novel framework for clipart image vectorization. In contrast to other methods, the proposed approach optimizes bezigons by directly observing the raster input and incorporating bezigon-based priors to minimize the errors introduced by other intermediate procedures. Both quantitative and qualitative comparisons demonstrate that the quality of the bezigons generated by our approach is typically higher than that of the bezigons generated by the current state-of-the-art method and by commonly used commercial software.
Of course, certain types of clipart images (e.g., noisy images or low-resolution images that contain complex structures) exist that are too ambiguous to be precisely vectorized by any automated approach, including our method (Figure 17 shows such cases). Perhaps the best way to address these images is to incorporate a small amount of user intervention. For this purpose, our system provides a friendly graphical interface for user refinement during the course of bezigon optimization. The evolving bezigons are presented in the interface. The user is allowed to modify the location of any control point by dragging the mouse cursor. Our system takes the modified bezigon as a new initial bezigon and performs the subsequent optimization.
As future research, we will resolve additional ambiguities by incorporating more prior knowledge regarding vector images for bezigon optimization. Because we directly optimize the bezigons, it is trivial to incorporate such prior information into our framework.
We also plan to develop a commercial software package based on the proposed method. To make the software as efficient as possible, we will optimize the code of the current implementation and consider parallelization of the proposed approach. Remarkably, many components of our framework, ranging from the wavelet rasterization to the optimization of local structures that are completely irrelevant to each other, can be highly parallelized.
Acknowledgment
This research project has been underway for nearly three years. We would like to thank all the participants, especially two professional graphical designers, Beck Yang and Fanco Ke, for their helpful suggestions and valuable comments.
7
In this appendix we will introduce the complete definition of the rasterization function , where are the parameter set of a bezigon, and give a proof to illustrate the continuity and differentiability of this function with respect to the geometrical parameters.
7.1 Basic Definitions
Before describing the rasterization function, we introduce some basic definitions that will be needed throughout this section.
Based on [35], the rasterization function uses a hierarchical Haar wavelet representation to analytically calculate an antialiased raster image of a bezigon. The Haar wavelet, as is well known, is represented by its mother wavelet function
(19) 
and its scaling function
(20) 
Based on the above two functions, the 1D Haar basis with a scaling parameter and a translating parameter could be formally defined as
(21)  
(22) 
Now let , the 2D Haar basis defined as following will be used later:
(23)  
(24)  
(25)  
(26) 
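For readers who prefer code to formulas, the standard Haar functions can be written in Python as below. The mother wavelet and scaling function follow their textbook definitions; the unnormalized dilation, the `kind` labels, and the way the 2D basis is formed as a product of 1D functions are assumptions for illustration and may differ in normalization or indexing from Equations 21 to 26.

```python
def haar_mother(t):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0

def haar_scaling(t):
    """Haar scaling function: 1 on [0, 1), 0 elsewhere."""
    return 1.0 if 0.0 <= t < 1.0 else 0.0

def haar_1d(kind, j, k, t):
    """Dilated and translated 1D function with scale j and shift k.

    kind is "psi" (wavelet) or "phi" (scaling); no normalization factor
    is applied, which is an assumption of this sketch.
    """
    base = haar_mother if kind == "psi" else haar_scaling
    return base((2 ** j) * t - k)

def haar_2d(kind_x, kind_y, j, kx, ky, x, y):
    """Separable 2D basis function: product of two 1D functions."""
    return haar_1d(kind_x, j, kx, x) * haar_1d(kind_y, j, ky, y)
```

The step discontinuities at 0, 1/2, and 1 are what make the differentiability argument later in this appendix non-trivial.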
7.2 Rasterization Function And Its Continuity
According to [35], the value of pixel in the raster image of a given 2D bezigon, indicated by the parameters , takes the form
(27) 
Here, correspond to the wavelet coefficients contributed by the th Bézier curve segment:
(28) 
The notations and are the same as Equation 2 and 3 in Section 3. Note that given the bezigon parameters , both and are functions of one variable , while both and are firstorder derivatives with respect to . For all and ,
(29)  
(30) 
It is obvious that both and are continuous with respect to the variable respectively. Also, if is a continuous function of any parameters of , both and are too. From Equation 2, it is easy to see that both and are continuous with respect to any parameters of and . Therefore, are also continuous with respect to . Thus the continuity of with respect to geometrical parameters is totally determined by above discussion and its formula 27. Such property is also reflected in Figure 2, where the data energy function using is continuous with respect to an arbitrary geometrical parameter.
7.3 Derivatives of the rasterization function with respect to geometrical parameters
We will show that the rasterization function is differentiable with respect to the geometrical parameters B, which verifies Theorem 2 in Section 4. Because the Haar functions are discontinuous, the conclusion of Theorem 2 is not obvious. To establish it, we use the theory of generalized functions and generalized derivatives [7]. The following deductions are all in the sense of generalized functions and generalized derivatives.
We first express formally such derivatives as
(31) 
and
(32) 
for all , , and .
Then the remaining problem is to discuss the differentiability of Haar basis coefficients with respect to geometrical parameters, i.e., the existence of and for all , , , and .
Generalized Derivatives of Haar Basis Functions. It is well known that the generalized derivative of :
(33) 
Here is an impulse function satisfying:
(34) 
Here is an arbitrary continuous function. Note that when composed with a continuous function , holds the following property [7]:
(35) 
Here is the set of the real roots of . Similarly,
(36) 
Therefore, for all , ,
(37) 
Similarly, for all , ,
(38) 
Derivatives of Haar Basis Coefficients with Respect to Geometrical Parameters. We first calculate . According to the generalized functions theory [7], for all , , , and ,
(39) 
Since has nothing to do with the parameter according to Equation 2, we have
(40) 