Robot Calligraphy using Pseudospectral Optimal Control
in Conjunction with a Simulated Brush Model
Chinese calligraphy is a unique form of art that has great artistic value but is difficult to master. In this paper, we make robots write calligraphy. Learning methods could teach robots to write, but may not be able to generalize to new characters. As such, we formulate the calligraphy writing problem as a trajectory optimization problem, and propose a new virtual brush model for simulating the real dynamic writing process. Our optimization approach is taken from pseudospectral optimal control, where the proposed dynamic virtual brush model plays a key role in formulating the objective function to be optimized. We also propose a stroke-level optimization to achieve better performance compared to the character-level optimization proposed in previous work. Our methodology shows good performance in drawing aesthetically pleasing characters.
Motion Control of Manipulators, Optimization and Optimal Control, Manipulation Planning
Making robots write beautiful calligraphy would be an exceptional feat since learning and mastering this art form takes humans years of practice. Chinese characters are complex and a calligraphic brush is difficult to manipulate properly. In this paper we aim to make a robot write Chinese characters by using a simulated brush model and pseudospectral optimal control methods to optimize for a trajectory.
Most relevant research on making actual robots create art adopts either a learning-based approach or a trajectory optimization-based approach. The former often comes down to teaching by demonstration [Sun14iros_robot-learn-from-demo, Kotani19icra_TeachingRT] or self-correction [Mueller13iros_robotic_calligraphy]. Learning lets one skip the difficulty of modeling the behavior of a real calligraphy brush. However, learning methods also have a large training cost and may not generalize well to unseen characters. Trajectory optimization-based methods do not face these problems: they simulate the writing behavior of an actual brush and then search for an optimal trajectory for the robot to execute [Kwok06icec_Brush_stroke_generation, Lam09iros_Stroke_Trajectory]. However, most simulated brush models [Kwok06icase_robot_drawing, Lam09iros_Stroke_Trajectory] do not account for the complex ways a brush deforms during the writing process, and capturing this complexity has an important influence on the final performance.
We propose a trajectory optimization-based method built on principles from pseudospectral optimal control [Elnagar98coa_Chebyshev, Fahroo00acc_DirectTO], and also introduce a new dynamic virtual brush model, to achieve fully automatic writing of Chinese characters given the Unicode code point of the desired character. Pseudospectral methods are generally used for optimizing continuous trajectories and controls, but we assume the control is realized by the low-level inverse kinematics solvers on the robot. Unlike previous work, which optimizes the trajectory for the whole character at once [Kwok06icec_Brush_stroke_generation, Lam09iros_Stroke_Trajectory], we decompose the character into strokes and perform stroke-based optimization, since optimizing a full character at once can be computationally expensive and prone to local minima. We extract strokes and create initial trajectory estimates by leveraging the properties of vector-based character databases. The proposed virtual brush captures the dynamic mechanisms of an actual calligraphic brush but has a simpler structure than previous models [Xu09book_Calligraphy, Chu04icga_realtime_virtualBrush]. As a baseline, we compare with a virtual brush model similar to that of Kwok et al. [Kwok06icec_Brush_stroke_generation].
The primary contributions of this paper are:
We use pseudospectral methods to search for optimal writing trajectories to apply to calligraphy robots. Pseudospectral methods have natural modeling abilities for continuous trajectory optimization.
We design a new virtual brush model. Such a model is also able to simulate the real brush dynamics with higher accuracy, which leads to better optimized trajectories.
We exploit vector-based character libraries for easy stroke extraction and initialization for the stroke-level trajectory optimization.
II Related Work
This section concentrates on calligraphy robots, although many other art forms incorporate robotics [Kudoh07iros_painter-robot, Lu09icaim_PreliminarySO, Scalera18jirs_watercolor], e.g., sculpture [Niu07robio_Robot3S] and graffiti [Jun16iros_Humanoid_Graffiti]. Most algorithms for calligraphy robots that use a brush pen can be categorized as learning-based methods or trajectory optimization-based methods.
II-A Calligraphy robots using learning-based methods
Sun et al. [Sun13robio_callibot] [Sun14iros_robot-learn-from-demo] propose to learn from demonstration. They invite calligraphers to write characters while holding the robot arm, and record the robot joint positions to establish a mapping model for robot control. Mueller et al. [Mueller13iros_robotic_calligraphy] propose an iterative trial-and-learn method. More advanced learning algorithms, such as recurrent neural networks (RNNs) [Sasaki16ras_visual_motor], generative adversarial networks [Chao18icra_calligraphy_gan], deep reinforcement learning [Wu18robio_DRL_calligraphy], and local and global learning models [Kotani19icra_TeachingRT], have also been explored. These methods usually require many training iterations to achieve good performance, which is inconvenient, and they have difficulty generalizing to new and complicated characters.
II-B Virtual brush models
Virtual brush models are mainly used in trajectory optimization-based algorithms. We can divide virtual brush models into two categories: physics-based models and data-driven models.
Physics-based virtual brush models strive to simulate the physical dynamics of a real brush from experimental observation [Xu02cgf_hairy_brush, Xu03_CGF_virtual_brush, Saito93book_DPB, Lee99book_BSO] or physical laws [Chu02pccga_3dpainting, Chu04icga_realtime_virtualBrush]. Strassmann [Strassmann86siggraph_hairy-brush] proposes an initial design featuring four basic parameters of a hairy brush. Wong et al. [Wong00cg_virtual_brush] propose to use a cone to represent the bundle of the brush and the cross-section of the cone, an ellipse, to represent the footprint. Xu et al. [Xu09book_Calligraphy] propose a highly detailed virtual brush model with complex mechanisms derived from approximations and assumptions. However, obtaining and fitting good parameters for the complicated virtual brush models mentioned above is difficult. We therefore propose a virtual brush with a simple structure that is easy to fit and implement, which even makes real-time trajectory optimization possible.
Data-driven virtual brush models are created by measuring and recording actual brush footprints. Kwok et al. propose a very simple virtual brush that draws droplet shapes whose size is proportional to the writing height [Kwok06icec_Brush_stroke_generation]. In their later work [Kwok06icase_robot_drawing], they use a camera placed below the writing plane to collect footprints during the writing process. Lam et al. [Lam09iros_Stroke_Trajectory] define their writing mark as a polygon connected by eight points and fit its position parameters to the collected footprints. To reduce the large computational cost, Baxter et al. [Baxter10book_SimpleDM] build a deformation table, which makes the computation much faster while preserving complex simulation effects.
II-C Stroke extraction
Stroke extraction separates a character into its constituent strokes, which is difficult to do accurately when analyzing only the pixels of an image. There are three main categories of stroke extraction methods: skeleton-based [Liu01pr_ModelbasedSE, Zeng10is_cmrf_stroke], region-based [Cao00icpr_stroke_extraction], and contour-based [Lee98pr_stroke_extraction, Sun14icra_brush-stroke-extraction, Chao19_HMS_calli_style]. Most of these methods are complicated and cannot guarantee good agreement between the extracted strokes and the actual strokes. We therefore propose to use vector-based images as our character dataset, which instead provide a quick and accurate way to extract strokes.
II-D Optimization methods
As mentioned above, Kwok et al. [Kwok06icec_Brush_stroke_generation, Kwok06icase_robot_drawing] propose to use Bezier curves to represent stroke trajectories and find the optimized trajectory by minimizing differences between simulated images and desired character images. But their algorithms have not yet been applied to actual robots. Furthermore, their algorithms perform character-level optimization, which is computationally expensive and cannot handle complicated characters without needing human intervention. Lam et al. [Lam09iros_Stroke_Trajectory] propose to minimize the width difference of the strokes between reference images and a simulated image written by the virtual brush as it moves along the middle axis of each stroke. However, their method is sensitive to small variations in stroke images, and so the results suffer from a loss of smoothness.
III Methodology Overview
Fig. 2 gives an overview of our methodology. Throughout, the control trajectory denotes the open-loop trajectory given to the real brush, and the end-effector is assumed to track it with high accuracy.
IV Virtual Brush
This paper develops two virtual brush models for simulating the writing process: a simple virtual brush and a dynamic virtual brush. Both generate good results, but the dynamic virtual brush is more sophisticated and better at closing the “sim2real” gap. The simple virtual brush is similar to the model of Kwok et al. [Kwok06icec_Brush_stroke_generation] and is mainly used as a baseline.
IV-A Simple virtual brush model
Given the height of the brush, the simple virtual brush simulates drawing a circle with radius proportional to that height. One important feature that distinguishes our simple virtual brush from others is that it establishes a continuous mapping from the trajectory to a continuously-valued image rather than a discrete one, so that gradient information is not lost. The circle drawn by our simple virtual brush has a radial distribution following a sigmoid function, which gives the pixel value at each coordinate of the generated image; a coefficient in this function is chosen to restrict the output pixel range.
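The simple brush can be sketched in a few lines. This is a minimal NumPy sketch, not the paper's implementation: the radius rule r = k * h, the edge-sharpness coefficient `alpha`, and the function name `simple_brush_image` are our assumptions, and the exact sigmoid used in the paper may differ. What it shows is why the mapping is continuous: every pixel value changes smoothly with the brush position and height, so small changes in the trajectory always change the image slightly.

```python
import numpy as np


def simple_brush_image(x, y, h, shape=(64, 64), k=0.5, alpha=4.0):
    """Render one circular footprint as a continuous-valued image (sketch).

    Assumptions (hypothetical, for illustration): the radius is k * h, i.e.
    proportional to the brush height variable h, and each pixel value is a
    sigmoid of how far inside the circle the pixel lies. alpha sets how
    sharp the circle's edge is.
    """
    rows, cols = np.indices(shape)          # pixel row/column grids
    dist = np.hypot(cols - x, rows - y)     # distance of each pixel to the brush center
    radius = k * h
    # Close to 1 deep inside the circle, close to 0 far outside, 0.5 on the boundary.
    return 1.0 / (1.0 + np.exp(-alpha * (radius - dist)))
```

For example, `simple_brush_image(32, 32, 20)` gives a soft disk of radius 10 centered in a 64 x 64 image.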
IV-B Dynamic virtual brush model
The dynamic virtual brush has two parts: a part that draws, and a part that updates the parameters of the brush. The drawing part describes how the brush leaves a mark on paper depending on its parameters. The updating part updates the brush parameters to account for the deformations that occur while executing the control commands.
IV-B1 Drawing Part
The dynamic virtual brush model has four state parameters describing its internal behavior: width, drag, offset, and orientation. As shown in Fig. 3, width and drag define the size of a brush mark; offset simulates the deviation of the brush mark from the center of the vertical brush handle, which is caused by the bending of the brush hairs under pressure; and orientation describes the direction of the writing mark (0 degrees means it points toward the right of the paper). The shape of the brush mark is defined by a quadratic curve. Special brush characteristics such as hair-splitting are generally avoided in real calligraphy, so we do not model them.
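To make the four parameters concrete, here is one possible membership test for a single mark. The text only states that the outline is a quadratic curve; the specific geometry below (a rounded head of diameter `width`, shifted by `offset` along the orientation, and a tail of length `drag` whose half-width shrinks quadratically to zero) is our assumption for illustration, and `in_brush_mark` is a hypothetical name.

```python
import math


def in_brush_mark(px, py, cx, cy, width, drag, offset, orientation):
    """Return True if point (px, py) lies inside one dynamic-brush mark.

    (cx, cy) is the brush-handle center; orientation is in radians, with 0
    pointing toward the right of the paper. Geometry is hypothetical: a
    rounded head of diameter `width`, shifted by `offset` along the
    orientation, plus a tail of length `drag` whose half-width shrinks
    quadratically to zero at the tip.
    """
    ux, uy = math.cos(orientation), math.sin(orientation)
    hx, hy = cx + offset * ux, cy + offset * uy   # head center after hair bending
    dx, dy = px - hx, py - hy
    u = dx * ux + dy * uy        # coordinate along the orientation (toward the tip)
    v = -dx * uy + dy * ux       # coordinate across the mark
    half_w = width / 2.0
    if u <= 0.0 or drag <= 0.0:
        return u * u + v * v <= half_w * half_w       # rounded head (or a plain disk)
    if u > drag:
        return False                                  # beyond the tip
    return abs(v) <= half_w * (1.0 - (u / drag) ** 2)  # quadratic taper
```

Rasterizing a mark then amounts to evaluating this test (or a smoothed version of it) on every pixel near the brush.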
IV-B2 Updating Brush Parameters Part
The updating process is based on the assumption that, given one control command, we can update the four parameters of the brush. In our experiments, we fit the relationships between the brush height and the parameters width, drag, and offset by collecting brush footprint data at varying writing heights. The orientation, however, depends on the direction in which the brush moves: we observed that moving the brush in one direction causes the tip of the brush to gradually turn toward the opposite direction.
Fitting the parameters to brush height We measure the width, drag, and offset of writing marks left by a real calligraphy brush, then fit a linear relationship between the parameters, width , drag , and offset , and the change in height of the brush, .
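The height fit described above amounts to an ordinary least-squares line per parameter. A minimal sketch with `numpy.polyfit`, assuming the measurements are already available as arrays; the helper names `fit_linear_params` and `predict_param` are hypothetical, and the numbers in the usage lines are synthetic, not measured data.

```python
import numpy as np


def fit_linear_params(dh, widths, drags, offsets):
    """Fit width, drag and offset as linear functions of the height change dh.

    Returns {name: (slope, intercept)}; np.polyfit with deg=1 returns the
    coefficients highest power first, i.e. [slope, intercept].
    """
    fits = {}
    for name, values in (("width", widths), ("drag", drags), ("offset", offsets)):
        slope, intercept = np.polyfit(dh, values, deg=1)
        fits[name] = (float(slope), float(intercept))
    return fits


def predict_param(fits, name, dh):
    """Evaluate a fitted line at a (scalar or array) height change."""
    slope, intercept = fits[name]
    return slope * np.asarray(dh) + intercept


# Usage with synthetic, noise-free measurements
dh = np.linspace(0.0, 5.0, 10)
fits = fit_linear_params(dh, 2.0 * dh + 1.0, 0.5 * dh + 3.0, -0.1 * dh)
```

A linear model keeps each parameter cheap to evaluate inside the simulation loop, which matters because the brush is simulated many times per optimization step.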
Adding inertia to parameters We introduce inertia to the parameter updates because, in reality, the brush parameters are not updated fully or instantaneously; brush deformations happen gradually and steadily. Given two adjacent control commands, the parameters are updated while taking inertia into account: the functions Width, Drag, and Offset respectively calculate a new width, drag, and offset from the current control command, and an inertia value determines how strongly the previous parameter values persist.
The inertia value is inversely proportional to the distance between the two adjacent points; otherwise, it would lose its effect when the trajectory is densely sampled.
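One update rule consistent with this description is sketched below for a single scalar parameter. The blending form new = (eta * prev + target) / (eta + 1) with eta = c / d, where d is the distance between adjacent points and c a tuning constant, is our assumption; the text only says the inertia is inversely proportional to d. The helpers `inertial_update` and `follow` are hypothetical, and orientation (an angle that wraps around) is not handled here.

```python
import math


def inertial_update(prev, target, step_dist, c=0.5):
    """Move a brush parameter from prev toward target with inertia.

    Hypothetical rule: eta = c / step_dist (inversely proportional to the
    distance between adjacent trajectory points) and
    new = (eta * prev + target) / (eta + 1).
    `target` plays the role of Width/Drag/Offset evaluated at the new command.
    """
    if step_dist <= 0.0:
        return prev              # no motion: the brush keeps its shape
    eta = c / step_dist
    return (eta * prev + target) / (eta + 1.0)


def follow(prev, target, total_dist, n_steps, c=0.5):
    """Apply the update over n_steps equal steps covering total_dist."""
    value = prev
    for _ in range(n_steps):
        value = inertial_update(value, target, total_dist / n_steps, c)
    return value
```

With this choice, splitting a distance D into n equal steps leaves a fraction (1 + D/(c n))^(-n) of the initial gap, which tends to exp(-D/c) as n grows, so the lag measured in distance stays roughly the same under dense sampling. A fixed per-step inertia would instead close the gap faster as points get denser, which is the loss of effect the text warns about.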
V Trajectory Optimization
V-A Review of Pseudospectral Optimal Control
The calligraphy problem in this paper is formulated as a trajectory optimization problem, and in particular we adopt some of the machinery from pseudospectral optimal control methods. As such, we briefly review pseudospectral optimal control (PSOC) methods in this section, following the exposition from Fahroo et al. [Fahroo00acc_DirectTO].
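A simplified version of an optimal control problem is formulated as follows. Given a cost function $J$ and a model of the system dynamics

$$\dot{\mathbf{x}}(t) = f\big(\mathbf{x}(t), \mathbf{u}(t)\big),$$

the objective is to find the optimal control trajectory $\mathbf{u}^*(t)$ that minimizes $J$. Here $\mathbf{x}(t)$ represents the system's state trajectory.
The basic idea in PSOC is to approximate the control trajectory and the state trajectory by polynomial curves with unknown parameters, thereby transforming the original problem into a nonlinear programming problem. To this end, pseudospectral methods choose a specific set of points on the curve for interpolation. For example, in Chebyshev pseudospectral methods, with time mapped to the normalized interval $[-1, 1]$, the interpolation points are the Chebyshev-Gauss-Lobatto (CGL) points [Furgale12icra_ContinuoustimeBE]:

$$\tau_k = -\cos\left(\frac{\pi k}{N}\right), \quad k = 0, 1, \dots, N.$$

To recover the state at an arbitrary time $\tau$ from its values $\mathbf{x}_k = \mathbf{x}(\tau_k)$ at the CGL points, one can use barycentric interpolation, i.e.,

$$\mathbf{x}(\tau) \approx \frac{\sum_{k=0}^{N} \frac{w_k}{\tau - \tau_k}\,\mathbf{x}_k}{\sum_{k=0}^{N} \frac{w_k}{\tau - \tau_k}}, \qquad w_k = (-1)^k \delta_k, \quad \delta_k = \begin{cases} 1/2, & k = 0 \text{ or } k = N, \\ 1, & \text{otherwise}. \end{cases}$$

We can now express the original dynamics equation with this approximation, and the objective function Eq. 6 can additionally be discretized if necessary. The original problem is thus transformed into minimizing the cost with respect to the two coefficient vectors

$$\mathbf{X} = [\mathbf{x}_0, \dots, \mathbf{x}_N], \qquad \mathbf{U} = [\mathbf{u}_0, \dots, \mathbf{u}_N],$$

representing the values of the state and controls, respectively, at the CGL points.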
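The CGL points and barycentric interpolation described above translate directly into code. A minimal NumPy sketch (function names hypothetical) using the ordering tau_k = -cos(pi k / N) and weights (-1)^k halved at both endpoints; because the interpolant through N + 1 points is exact for polynomials of degree at most N, it reproduces a cubic exactly from six CGL values.

```python
import numpy as np


def cgl_points(n):
    """CGL points tau_k = -cos(pi k / n), k = 0..n, ordered from -1 to 1."""
    k = np.arange(n + 1)
    return -np.cos(np.pi * k / n)


def cgl_weights(n):
    """Barycentric weights for CGL points: (-1)^k, halved at both ends."""
    w = (-1.0) ** np.arange(n + 1)
    w[[0, -1]] *= 0.5
    return w


def barycentric_interp(nodes, values, weights, t):
    """Evaluate the interpolating polynomial through (nodes, values) at t."""
    diff = t - nodes
    hit = np.isclose(diff, 0.0, atol=1e-14)
    if hit.any():                      # t is (numerically) a node: return its value
        return float(values[np.argmax(hit)])
    terms = weights / diff
    return float(np.dot(terms, values) / np.sum(terms))
```

Usage: with `nodes, w = cgl_points(5), cgl_weights(5)` and `vals = nodes**3 - 2*nodes + 0.5`, calling `barycentric_interp(nodes, vals, w, 0.3)` recovers the cubic's value at 0.3 to machine precision.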
V-B PSOC for Calligraphy
Below we apply these methods to trajectory optimization for open-loop control trajectories of a robot end-effector, with the goal of faithfully reproducing Chinese characters. PSOC methods are generally used for collocated optimal control where the system dynamics are enforced through specialized components of the cost function. However, here we assume that the control is realized by the low-level inverse kinematics solvers on the robot.
The optimization for a character is decomposed into a series of trajectory optimization problems, one for each stroke of the character. This greatly simplifies the process and is also more computationally efficient. The decomposition is made easy by vector-based character databases, which store each character as its individual strokes.
V-C Stroke extraction
To obtain reference images for each stroke, as well as to initialize the nonlinear optimization, we exploit vector-based character databases. Vector-based images are advantageous because they store individual strokes as Bezier curves, which are continuous polynomial curves defined by a set of control points. Because each stroke is stored separately, stroke extraction is trivial. In our case, we use the Scalable Vector Graphics (SVG) dataset from Make Me a Hanzi [makeahanzi]; an example of extracted strokes can be seen in Fig. 4.
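Because the dataset already separates strokes, extraction reduces to reading records. The sketch below assumes the Make Me a Hanzi `graphics.txt` layout as we understand it: one JSON object per line with a `character` field, a `strokes` list of SVG path strings, and a `medians` list of point lists (which we take to be the per-stroke paths used for animation). These field names should be checked against the dataset before relying on them; `index_strokes` is a hypothetical helper. Keys are Unicode code points, matching the input described in the introduction.

```python
import json


def index_strokes(lines):
    """Map each character's Unicode code point to its strokes and medians.

    Assumes one JSON record per line with 'character', 'strokes' (SVG path
    strings, one per stroke) and 'medians' (one point list per stroke).
    """
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        table[ord(record["character"])] = {
            "strokes": record["strokes"],
            "medians": record["medians"],
        }
    return table
```

With a local copy of the file, `index_strokes(open("graphics.txt", encoding="utf-8"))` would build the whole table; each SVG path can then be rasterized into a per-stroke reference image.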
V-D Stroke trajectory representation
The stroke trajectories serve as the open-loop control trajectories the robot uses to draw strokes, and we represent them as three-dimensional trajectories of the end-effector. Each of the three components is represented separately as a one-dimensional Chebyshev polynomial curve, obtained by interpolating its values at the Chebyshev-Gauss-Lobatto (CGL) points given by Eq. 8 and Eq. 10.
Hence, the decision variables are the three sets of trajectory values at the CGL points, one set for each of the x, y, and z dimensions.
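Concretely, the decision vector can be laid out as the x, y and z values at the N + 1 CGL points, concatenated, and a dense trajectory recovered by barycentric interpolation in each dimension separately. The sketch assumes normalized time tau in [-1, 1] and the CGL ordering tau_k = -cos(pi k / N); the layout and the helper names (`unpack`, `eval_dense`, `dense_trajectory`) are illustrative, not the paper's code.

```python
import numpy as np


def cgl_points(n):
    """CGL points tau_k = -cos(pi k / n), k = 0..n, ordered from -1 to 1."""
    return -np.cos(np.pi * np.arange(n + 1) / n)


def cgl_weights(n):
    """Barycentric weights for CGL points: (-1)^k, halved at both ends."""
    w = (-1.0) ** np.arange(n + 1)
    w[[0, -1]] *= 0.5
    return w


def eval_dense(values, m):
    """Interpolate one coordinate from its CGL values at m evenly spaced times."""
    values = np.asarray(values, dtype=float)
    n = values.size - 1
    nodes, w = cgl_points(n), cgl_weights(n)
    t = np.linspace(-1.0, 1.0, m)
    diff = t[:, None] - nodes[None, :]
    hit = np.isclose(diff, 0.0, atol=1e-14)
    diff[hit] = 1.0                      # avoid dividing by zero; fixed up below
    terms = w / diff
    out = (terms @ values) / terms.sum(axis=1)
    rows, cols = np.nonzero(hit)
    out[rows] = values[cols]             # samples that land on a node take its value
    return out


def unpack(theta, n):
    """Split the decision vector [x_0..x_n, y_0..y_n, z_0..z_n]."""
    k = n + 1
    return theta[:k], theta[k:2 * k], theta[2 * k:3 * k]


def dense_trajectory(theta, n, m=200):
    """Return an (m, 3) array of densely sampled end-effector positions."""
    return np.stack([eval_dense(c, m) for c in unpack(theta, n)], axis=1)
```

Keeping the three dimensions as separate one-dimensional curves means the optimizer sees a flat vector of 3(N + 1) numbers, which is the form nonlinear least-squares solvers expect.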
V-E Optimization for the stroke trajectories
The objective function for the stroke optimization is the sum-squared pixel difference between a reference image of the stroke and an image produced by simulating the drawing process.
The simulated image is generated by a function that represents the virtual brush simulation: it takes the stroke's trajectory, parameterized by its pseudospectral values, and draws an image along that trajectory. The function is also given the initial brush state at the start of the stroke; in particular, the direction parameter left after writing the previous stroke is assumed to be the initial direction parameter for the current stroke. This direction term is not included for the simple virtual brush. The reference image of the stroke is rasterized from the vector-based stroke. Within the simulation, the trajectory is sampled evenly and much more densely than the CGL points so that the drawn stroke looks continuous. Finally, a scale parameter in the simulation brings the z coordinate to the same magnitude as x and y, for the convenience of optimization.
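A sketch of the stroke objective, using a soft version of the simple brush so that the rendered image stays continuous. Assumptions: the footprints at densely sampled points are combined by a soft union 1 - prod(1 - p_i) (our choice, not stated in the text), each footprint is a sigmoid disk with radius k * h, and the dynamic brush state is omitted. `footprint`, `simulate_stroke`, `stroke_residual` and `stroke_cost` are hypothetical names.

```python
import numpy as np


def footprint(shape, x, y, h, k=0.5, alpha=4.0):
    """Sigmoid disk of radius k * h centered at (x, y) (assumed brush model)."""
    rows, cols = np.indices(shape)
    dist = np.hypot(cols - x, rows - y)
    return 1.0 / (1.0 + np.exp(-alpha * (k * h - dist)))


def simulate_stroke(points, shape):
    """Soft union of footprints at densely sampled (x, y, h) points."""
    unpainted = np.ones(shape)
    for x, y, h in points:
        unpainted *= 1.0 - footprint(shape, x, y, h)   # chance a pixel stays blank
    return 1.0 - unpainted


def stroke_residual(points, reference):
    """Per-pixel difference between the simulated and reference images."""
    return (simulate_stroke(points, reference.shape) - reference).ravel()


def stroke_cost(points, reference):
    """Sum-squared pixel difference: the stroke objective in this sketch."""
    r = stroke_residual(points, reference)
    return float(r @ r)
```

`stroke_residual` is the vector a least-squares solver would square and sum; in the pipeline described here, the points would come from the densely sampled pseudospectral trajectory and the reference from the rasterized vector stroke.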
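Many nonlinear optimization methods can be used to minimize Eq. 6. In our case, we use the Levenberg-Marquardt (LM) algorithm, a trust-region-style method that interpolates between gradient descent and the Gauss-Newton update. At each iteration, the step $\boldsymbol{\delta}$ solves the damped normal equations

$$\left(\mathbf{J}^\top \mathbf{J} + \lambda \mathbf{I}\right)\boldsymbol{\delta} = -\mathbf{J}^\top \mathbf{r},$$

where $\mathbf{r}$ is the vector of pixel residuals, $\mathbf{J}$ is its Jacobian with respect to the decision variables, and the damping factor $\lambda$ is increased when a step fails to reduce the cost (moving toward gradient descent) and decreased when it succeeds (moving toward Gauss-Newton).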
The Jacobian matrix is calculated using numerical differentiation at each iteration. We utilize the GTSAM library [Dellaert12book_FactorGA, Dellaert17FactorGF] to perform the optimization. GTSAM was originally created to solve simultaneous localization and mapping problems, but has been used in many different contexts since, including motion planning [Mukadam18ijrr_gpmp, Mukadam18ar_steap].
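The LM iteration itself is short to write down. The paper delegates it to GTSAM; the NumPy sketch below is a from-scratch stand-in (not GTSAM's API) with a central-difference Jacobian, the update (J^T J + lambda I) delta = -J^T r, and a simple multiply/divide-by-10 damping schedule. It is demonstrated on a toy curve fit rather than on images so that it runs instantly; a stroke residual vector would be plugged in the same way.

```python
import numpy as np


def numerical_jacobian(residual, theta, eps=1e-6):
    """Central-difference Jacobian of the residual vector w.r.t. theta."""
    r0 = residual(theta)
    jac = np.empty((r0.size, theta.size))
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        jac[:, j] = (residual(theta + step) - residual(theta - step)) / (2.0 * eps)
    return jac


def levenberg_marquardt(residual, theta0, lam=1e-3, iters=100, tol=1e-20):
    """Minimize sum(residual(theta)**2) with a basic LM loop."""
    theta = np.asarray(theta0, dtype=float).copy()
    r = residual(theta)
    cost = r @ r
    for _ in range(iters):
        if cost < tol:
            break
        jac = numerical_jacobian(residual, theta)
        lhs = jac.T @ jac + lam * np.eye(theta.size)   # damped normal equations
        delta = np.linalg.solve(lhs, -jac.T @ r)
        r_new = residual(theta + delta)
        cost_new = r_new @ r_new
        if cost_new < cost:
            # Step accepted: trust the Gauss-Newton model more.
            theta, r, cost = theta + delta, r_new, cost_new
            lam *= 0.1
        else:
            # Step rejected: lean toward small gradient-descent-like steps.
            lam *= 10.0
    return theta


# Toy usage: fit y = a * exp(b * t) to noise-free samples of 2 * exp(-t)
t = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-t)
theta_hat = levenberg_marquardt(lambda th: th[0] * np.exp(th[1] * t) - y, [1.0, 0.0])
```

Each Jacobian column costs two residual evaluations, i.e. two full brush simulations per decision variable, which is one practical reason to keep the per-stroke polynomial degree small.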
V-F Trajectory Initialization Estimates
We propose to use the skeleton of a stroke to initialize the x and y coordinates of our trajectory, while the initial z coordinates are set to a fixed value. From our observations, when people write calligraphy, they generally make the brush approximately follow the skeleton of the stroke while varying the height of the brush. Because we start from a vector-based representation, extracting an initial trajectory for each stroke is much simpler. Even when starting from images, there are many good image-based skeleton extraction algorithms, e.g., the Chordal Axis Transform [Zhang96icpr_thinning, Lo06iros_BrushFA], an example result of which is shown in Fig. 4(a). In our case, for simplicity, we use the “animation path” provided by the dataset as the initial estimate of the 2D x and y sequences. CGL points are sampled on each skeleton path to obtain the pseudospectral representation. Generating a good initial estimate for z is not intuitive, so we simply initialize it with a constant sequence.
Given this initial trajectory, we can simulate the image formation process using both the simple virtual brush and the dynamic virtual brush, as illustrated in Fig. 4(b) and Fig. 4(c), respectively. The mark written by the dynamic virtual brush deviates from its given trajectory, an effect ignored by most previous research.
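Initialization can be sketched as resampling the skeleton (for example, the dataset's animation path for the stroke) at the CGL points by normalized arc length. Assumptions: CGL points tau_k = -cos(pi k / N) are mapped to arc length (tau + 1) / 2, the skeleton has nonzero length, and `z0` is an arbitrary constant height; `init_from_skeleton` is a hypothetical helper.

```python
import numpy as np


def init_from_skeleton(skeleton, n, z0=1.0):
    """Initial CGL values for x, y (from a skeleton polyline) and z (constant).

    skeleton: sequence of (x, y) points along the stroke, e.g. its animation path.
    Assumes the polyline has nonzero total length.
    """
    pts = np.asarray(skeleton, dtype=float)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)   # segment lengths
    s = np.concatenate([[0.0], np.cumsum(seg)])
    s /= s[-1]                                            # normalized arc length
    tau = -np.cos(np.pi * np.arange(n + 1) / n)           # CGL points on [-1, 1]
    u = (tau + 1.0) / 2.0                                 # mapped to [0, 1]
    x = np.interp(u, s, pts[:, 0])
    y = np.interp(u, s, pts[:, 1])
    z = np.full(n + 1, z0)                                # constant height guess
    return x, y, z
```

Because CGL points cluster toward both ends of the interval, this places more initial samples near the start and end of the stroke, where calligraphic strokes typically change shape most.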
The dynamic virtual brush model yields better written results than the simple virtual brush model, although the latter can produce very high-quality simulated images. Fig. 6 shows a comparison between the dynamic and the simple virtual brush model. In panel (b), the simulated image produced by the simple model is very good. However, because the deformation of the real brush is not modeled, the results on the real robot are of lower quality.
Hence, although the objective function minimizes the difference between the simulated images and the original character images, driving the simulation error too low leads to over-fitting, and this is true for both models. In other words, the ability of the virtual brush model to capture the behavior of the actual brush sets the performance limit of the approach, and using optimization to push beyond this limit may lead to bad results. From our experience, an average pixel error between and of all pixels is usually enough to generate good written results.
Another possible cause of over-fitting is the degree of the polynomial parameterization used to represent the character strokes. Currently we choose a degree for every character once the simulation error of Eq. 14 falls into the range . Such a method is not guaranteed to find the global optimum, but it can provide a more practical and robust solution for the robot to execute. In future work, the degree could be tuned to better fit each stroke.
In Fig. 7, we present the results of our approach, including photographs of characters drawn by a Fetch robot in our lab. We show results for four different Chinese characters. Both the simulated and written images before and after optimization are shown for easy comparison. The figure shows that the optimization achieves good performance on the simulated images. However, because of the “sim2real” gap in the virtual brush model, the written characters still fall short of the smoothness and detail of the reference images rendered from the vector-based character database.
VII Discussion and Future Work
In this paper, we presented a trajectory optimization method to make robots write calligraphy, which searches for open-loop control trajectories by minimizing the difference between a simulated image and a reference image. The proposed dynamic virtual brush simulation yields good results in simulation and in turn produces reasonable open-loop control trajectories for a real robot. However, the results make clear that the sim2real gap has not been fully closed, and it is an open question whether a closed-loop strategy will ever yield master-level calligraphy. Hence, in future work we plan to pursue both better brush models and feedback control around optimized trajectories.