Deep Parametric Shape Predictions using Distance Fields
Abstract
Many tasks in graphics and vision demand machinery for converting shapes into representations with sparse sets of parameters; these representations facilitate rendering, editing, and storage. When the source data is noisy or ambiguous, however, artists and engineers often manually construct such representations, a tedious and potentially time-consuming process. While advances in deep learning have been successfully applied to noisy geometric data, the task of generating parametric shapes has so far been difficult for these methods. Hence, we propose a new framework for predicting parametric shape primitives using deep learning. We use distance fields to transition between shape parameters like control points and input data on a raster grid. We demonstrate efficacy on 2D and 3D tasks, including font vectorization and surface abstraction.
1 Introduction
The creation, modification, and rendering of vector graphics and parametric shapes is a fundamental problem of interest to engineers, artists, animators, and designers. Such representations offer distinct advantages over other models. By expressing a shape as a collection of primitives, we are able to apply transformations easily, to identify correspondences, and to render at arbitrary resolution, all while having to store only a sparse representation.
It is often useful to generate a parametric model from data that does not directly correspond to the target geometry and contains imperfections or missing parts. This can be an artifact of noise, corruption, or human-generated input; often, an artist intends to create a precise geometric object but produces one that is “sketchy” and ambiguous. Hence, we turn to machine learning methods, which have shown success in inferring structure from noisy data.
Convolutional neural networks (CNNs) achieve state-of-the-art results in vision tasks such as image classification [23], segmentation [27], and image-to-image translation [20]. CNNs, however, operate on raster representations. Grid structure is fundamentally built into convolution as a mechanism for information to travel between layers of a deep network. This structure is leveraged during training to optimize performance on a GPU. Recent deep learning pipelines that output vector shape primitives [40] have been significantly less successful than pipelines for analogous tasks on raster images or voxelized volumes.
A challenge when applying deep learning to parametric geometry is the combination of Eulerian and Lagrangian representations. CNNs process data in an Eulerian fashion in that they apply fixed operations to a dense grid of values; Eulerian shape representations like indicator functions come as values on a fixed grid. Parametric shapes, on the other hand, use sparse sets of parameters like control points to express geometry. In contrast to stationary Eulerian grid points, this Lagrangian representation moves with the shape. Navigating the transition from Eulerian to Lagrangian geometry is a key step in any learning pipeline for the problems above, a task we consider in detail.
We propose a deep learning framework for predicting parametric shapes, addressing the aforementioned issues. By analytically computing a distance field to the primitives at each training iteration, we formulate an Eulerian version of the Chamfer distance, a common metric for geometric similarity. Our metric can be computed efficiently and does not require sampling points from the predicted or target shapes. Beyond accelerating evaluation of existing loss functions, our distance field enables alternative loss functions that are sensitive to specific geometric qualities like alignment.
We apply our new framework in the 2D context to a diverse dataset of fonts. We train a network that takes in a raster image of a glyph and outputs a representation as a collection of Bézier curves. This maps glyphs onto a common set of parameters that can be traversed intuitively. We use this embedding for font exploration and retrieval, correspondence, and unsupervised interpolation.
We also show that our approach works in 3D. With surface primitives in place of curves, we perform volumetric abstraction on ShapeNet [8], inputting an image or a distance field and outputting parametric primitives that approximate the model. This output can be rendered at any resolution or converted to a mesh; it also can be used for segmentation.
Contributions. We present a technique for predicting parametric shapes from 2D and 3D raster data, including:

a general distance field loss function allowing definition of several losses based on a common formulation;

application to 2D font glyph vectorization, with applications to correspondence, exploration, retrieval, and repair;

application to 3D surface abstraction, with results for different primitives and constructive solid geometry (CSG) as well as application to segmentation.
2 Related Work
Font exploration and manipulation.
Designing or even finding a font can be tedious using generic vector graphics tools. Certain geometric features distinguish letters from one another across fonts, while others distinguish fonts from one another. Due to these difficulties and the presence of large font datasets, font exploration, design, and retrieval have emerged as challenging problems in graphics and learning.
Previous exploration methods categorize and organize fonts via crowdsourced attributes [29] or embed fonts on a manifold using purely geometric features [7, 3]. Instead, we leverage deep vectorization to automatically generate a sparse representation for each glyph. This enables exploration on the basis of general shape rather than fine detail.
Automatic font generation methods usually fall into two categories. Rule-based methods [39, 32] use engineered decomposition and reassembly of glyphs into parts. Deep learning approaches [1, 42] produce raster images with limited resolution and potential for image-based artifacts, making them unfit for use as glyphs. We apply our method to edit existing fonts while retaining vector structure and demonstrate vectorization of glyphs from partial and noisy data, i.e., raster images from a generative model.
Parametric shape collections.
As the number of publicly available 3D models grows, methods for organizing, classifying, and exploring models become crucial. Many approaches decompose models into modular parametric components, commonly relying on pre-specified templates or labeled collections of specific parts [21, 36, 30]. Such shape collections prove useful in domain-specific applications in design and manufacturing [34, 41]. Our deep learning pipeline allows generation of parametric shapes to perform these tasks. It works quickly on new inputs at test time and is generic, handling a variety of modalities without supervision and producing different output types.
Deep 3D reconstruction.
Combining multiple viewpoints to reconstruct 3D geometry is crucial in applications like robotics and autonomous driving [15, 35, 38]. Even more challenging is inference of 3D structure from one input. Recent deep networks can produce point clouds or voxel occupancy grids given a single image [14, 10], but their output suffers from fixed resolution.
Learning signed distance fields defined on a voxel grid [12, 37] or directly [31] allows high-resolution rendering but requires surface extraction; this representation is neither sparse nor modular. Liao et al. [25] address the rendering issue by incorporating marching cubes into a differentiable pipeline, but the lack of sparsity remains problematic, and predicted shapes are still tied to a voxel grid.
Parametric 3D shapes offer a sparse, non-voxelized solution. Methods for converting point clouds to geometric primitives achieve high-quality results but require supervision, either relying on existing data labelled with primitives [24, 28] or on prescribed templates [16]. Tulsiani et al. [40] output cuboids but are limited in output type. Groueix et al. [17] output primitives at any resolution, but their primitives are not naturally parameterized or sparsely represented.
3 Preliminaries
Let $A, B \subseteq \mathbb{R}^n$ be two smooth (measurable) shapes. Let $\mathcal{A}$ and $\mathcal{B}$ be two point sets sampled uniformly from $A$ and $B$. The directed Chamfer distance between $\mathcal{A}$ and $\mathcal{B}$ is
$$\mathrm{Ch}_{\mathrm{dir}}(\mathcal{A}, \mathcal{B}) = \frac{1}{|\mathcal{A}|} \sum_{a \in \mathcal{A}} \min_{b \in \mathcal{B}} \|a - b\|_2, \tag{1}$$
and the symmetric Chamfer distance is defined as
$$\mathrm{Ch}(\mathcal{A}, \mathcal{B}) = \mathrm{Ch}_{\mathrm{dir}}(\mathcal{A}, \mathcal{B}) + \mathrm{Ch}_{\mathrm{dir}}(\mathcal{B}, \mathcal{A}). \tag{2}$$
It was proposed for computational applications in [6] and has been used as a loss function assessing similarity of a learned shape to ground truth in deep learning [40, 14, 26, 17].
We also define the variational directed Chamfer distance
$$\mathrm{Ch}^{\mathrm{var}}_{\mathrm{dir}}(A, B) = \frac{1}{\mathrm{Vol}(A)} \int_A \inf_{b \in B} \|x - b\|_2 \, dx, \tag{3}$$
with the variational symmetric Chamfer distance defined analogously, extending (1) and (2) to smooth objects. We use these to relate our proposed loss to Chamfer distance.
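As a concrete reference, the sampled Chamfer distances in (1) and (2) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's implementation; the naïve broadcast computes all pairwise distances, making the quadratic cost explicit.

```python
import numpy as np

def chamfer_directed(A, B):
    """Directed Chamfer distance (Eq. 1): mean distance from each
    point in A to its nearest neighbor in B."""
    # Pairwise distance matrix of shape |A| x |B| via broadcasting.
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return D.min(axis=1).mean()

def chamfer_symmetric(A, B):
    """Symmetric Chamfer distance (Eq. 2)."""
    return chamfer_directed(A, B) + chamfer_directed(B, A)
```

Materializing the full pairwise matrix is exactly what limits sample counts on a GPU, as discussed below in §5.2.5.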
If points are sampled uniformly, then under relatively weak assumptions about $A$ and $B$, $\mathrm{Ch}(\mathcal{A}, \mathcal{B}) \to 0$ if and only if $A = B$ as the sizes of the sampled point sets grow. Thus, Chamfer distance is a reasonable shape-matching metric. It has, however, fundamental drawbacks:

It is highly dependent on the sampled points and sensitive to nonuniform sampling, as in Figure 1(a).

It is slow to compute. For each point sampled from one shape, it is necessary to find the closest point sampled from the other, a quadratic-time operation when implemented naïvely. Efficient structures like k-d trees are not well-suited to GPUs.

It is agnostic to normal alignment. As in Figure 1(b), the Chamfer distance between a dense set of vertical lines and a dense set of horizontal lines approaches zero.
Our method does not suffer from these disadvantages.
4 Method
Beyond architectures for particular tasks, we introduce a framework for formulating loss functions suitable for learning placement of parametric shapes in 2D and 3D; our formulation not only encapsulates Chamfer distance—and suggests a means of accelerating its computation—but also leads to stronger loss functions that improve performance on a variety of tasks. We start by defining a general loss on distance fields and propose three specific losses.
4.1 General Distance Field Loss
Given $\Gamma, \Gamma' \subseteq \mathbb{R}^n$, let $d_\Gamma, d_{\Gamma'} : \mathbb{R}^n \to \mathbb{R}_{\ge 0}$ measure distance from each point in $\mathbb{R}^n$ to $\Gamma$ or $\Gamma'$, respectively; in our experiments, $d_\Gamma(x) = \inf_{y \in \Gamma} \|x - y\|_2$, the Euclidean distance field. Let $V \subseteq \mathbb{R}^n$ be a bounded set with $\Gamma, \Gamma' \subseteq V$. We define the general distance field loss as
$$L[\Gamma, \Gamma'] = \int_V \mathcal{L}(d_\Gamma, d_{\Gamma'})(x) \, dx \tag{4}$$
for some measure of discrepancy $\mathcal{L}$. Note that we represent $\Gamma$ and $\Gamma'$ only by their respective distance functions, and the loss is computed over all of $V$.
Let $\Phi$ be a collection of parameters defining a shape $\Gamma_\Phi$. For instance, a parametric shape may consist of Bézier curves, in which case $\Phi$ contains a list of control points. Let $d_\Phi$ be the distance to the shape defined by $\Phi$. Given a target object with distance function $d_T$, we formulate fitting a parametric shape to approximate the target w.r.t. $\mathcal{L}$ as minimizing
$$L(\Phi) = \int_V \mathcal{L}(d_\Phi, d_T)(x) \, dx. \tag{5}$$
For optimal shape parameters, $L(\Phi) = 0$. We propose three discrepancy measures, providing loss functions that capture different geometric features.
4.2 Surface Loss
We define the surface discrepancy to be
$$\mathcal{L}_{\mathrm{surf}}(d_\Gamma, d_{\Gamma'})(x) = \delta_{\partial\Gamma}(x)\, d_{\Gamma'}(x) + \delta_{\partial\Gamma'}(x)\, d_\Gamma(x), \tag{6}$$
where $\delta_{\partial\Gamma}$ is the Dirac delta defined uniformly on $\partial\Gamma$, the zero level set of $d_\Gamma$. $\mathcal{L}_{\mathrm{surf}}$ is nonzero only where the shapes do not match, making it sensitive to fine details in the matching:
The symmetric variational Chamfer distance between $\Gamma$ and $\Gamma'$ equals the corresponding surface loss between their distance fields, i.e., $\mathrm{Ch}^{\mathrm{var}}(\Gamma, \Gamma') = L_{\mathrm{surf}}[\Gamma, \Gamma']$.
Unlike the Chamfer distance, the discrete version of our surface loss can be approximated efficiently on GPU without sampling points from either the parametric or target shape.
4.3 Global Loss
We define the global discrepancy to be
$$\mathcal{L}_{\mathrm{glob}}(d_\Gamma, d_{\Gamma'})(x) = \left(d_\Gamma(x) - d_{\Gamma'}(x)\right)^2. \tag{7}$$
Minimizing $L_{\mathrm{glob}}$ is equivalent to minimizing the $L^2$ distance between $d_\Gamma$ and $d_{\Gamma'}$. $\mathcal{L}_{\mathrm{glob}}$ grows quadratically with the distance between the shape defined by $\Phi$ and the target shape. Thus, minimizing $L_{\mathrm{glob}}$ encourages global alignment, quickly placing the parametric primitives close to the target and accelerating convergence.
4.4 Normal Alignment Loss
We define the normal alignment discrepancy to be
$$\mathcal{L}_{\mathrm{align}}(d_\Gamma, d_{\Gamma'})(x) = \delta_{\partial\Gamma}(x)\left(1 - \langle \nabla d_\Gamma(x), \nabla d_{\Gamma'}(x) \rangle^2\right). \tag{8}$$
Since the gradient of a distance field is a unit vector normal to its level sets, minimizing $L_{\mathrm{align}}$ aligns the normals of the predicted primitives to those of the target. Following Figure 1(b), if $\Gamma$ contains dense vertical lines and $\Gamma'$ contains dense horizontal lines, $L_{\mathrm{align}}[\Gamma, \Gamma']$ is large while $\mathrm{Ch}(\Gamma, \Gamma') \approx 0$.
4.5 Final Loss Function
The general distance field loss and the specific discrepancy measures proposed thus far are differentiable w.r.t. the shape parameters $\Phi$, as long as the distance function $d_\Phi$ is differentiable w.r.t. $\Phi$. Thus, they are well-suited for optimization by a deep network that predicts parametric shapes. To discretize, we simply redefine (4) as
$$\tilde{L}[\Gamma, \Gamma'] = \sum_{x \in G} \mathcal{L}(d_\Gamma, d_{\Gamma'})(x), \tag{9}$$
where $G$ is a regular 2D or 3D grid. Thus, we minimize a weighted sum of the discretized losses corresponding to the discrepancies defined in §4:
$$L(\Phi) = \tilde{L}_{\mathrm{surf}} + c_{\mathrm{glob}}\, \tilde{L}_{\mathrm{glob}} + c_{\mathrm{align}}\, \tilde{L}_{\mathrm{align}}. \tag{10}$$
We fix the weights $c_{\mathrm{glob}}$ and $c_{\mathrm{align}}$ across experiments. In 3D experiments, we decay the global loss term exponentially by a factor of 0.5 every 500 iterations.
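A minimal NumPy sketch of the discretized losses (9)–(10), assuming a squared-difference global discrepancy, a gradient-based alignment discrepancy, and a hard-thresholded smoothed delta of width `eps`; the weight names `c_glob` and `c_align`, the finite-difference gradients, and the thresholding are all conveniences of this sketch rather than the paper's exact choices.

```python
import numpy as np

def grid_losses(d_pred, d_target, h=1.0, eps=1.0, c_glob=1.0, c_align=1.0):
    """Discretized distance-field losses (Eq. 9) on a regular grid.
    d_pred, d_target: unsigned distance fields sampled on the same grid;
    h is the grid spacing and eps the width of the smoothed surface delta."""
    # Smoothed Dirac deltas on the two zero level sets.
    delta_pred = (np.abs(d_pred) < eps).astype(float)
    delta_target = (np.abs(d_target) < eps).astype(float)
    # Surface term: each field evaluated on the other shape's boundary.
    surf = float((delta_pred * d_target + delta_target * d_pred).sum())
    # Global term: squared difference of the fields over the whole grid.
    glob = float(((d_pred - d_target) ** 2).sum())
    # Alignment term: distance-field gradients approximate unit normals.
    g = np.gradient(d_pred, h)
    gt = np.gradient(d_target, h)
    g = np.stack(g if isinstance(g, list) else [g])
    gt = np.stack(gt if isinstance(gt, list) else [gt])
    dot = (g * gt).sum(axis=0)
    align = float((delta_pred * (1.0 - dot ** 2)).sum())
    return {"surface": surf, "global": glob, "align": align,
            "total": surf + c_glob * glob + c_align * align}
```

Because every term is a sum of pointwise operations over the grid, the loss parallelizes trivially on a GPU, in contrast to nearest-neighbor searches over sampled points.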
4.6 Network architecture
The network takes a raster image or a distance field as input and outputs a parametric shape. We use the same architecture for 2D and 3D, following recent advances in network architecture design. Let c5s2-64 denote a convolutional layer with 64 filters of size 5 evaluated at stride 2, Rx7 denote 7 residual blocks [19, 18] with filters of size 3 (keeping the filter count constant), and fc512 denote a fully-connected layer with 512 outputs. To increase the receptive field without dramatically increasing the parameter count, we also use dilated convolution [43, 9] in the residual blocks. We use ELU [11] after all linear layers except the last, and LayerNorm [2] after each convolutional and residual layer except the first. Our encoder architecture is: c5s1-32, c3s2-64, c3s1-64, c3s2-128, c3s1-128, c3s1-128, c3s2-256, Rx7, c3s2-256, Rx1, c3s2-256, Rx1, fc512, fc-N, where N is the dimension of our target parameterization. Our pipeline is illustrated in Figure 2. We train each network on a single Tesla K80 GPU, using Adam [22] with a fixed learning rate and batch size 16.
5 2D: Font Exploration and Manipulation
We demonstrate our method in 2D for font glyph vectorization. Given a raster image of a glyph, our network outputs control points that form a collection of quadratic Bézier curves approximating its outline. When used on a glyph of a simple font (non-decorative, e.g., sans-serif), our method recovers nearly the exact original vector representation. From a decorative glyph with fine-grained detail, however, we recover a good approximation of the glyph’s shape using relatively few Bézier primitives and a consistent structure. This process can be interpreted as projection onto a common sparse latent space of control points.
We first describe our choice of primitives as well as the computation of their distance fields. We introduce a templatebased approach to allow our network to better handle multimodal data (different letters) and test several applications.
5.1 Approach
5.1.1 Primitives
We wish to use a 2D parametric shape primitive that is sparse and expressive and admits an analytic distance field. Our choice satisfying these requirements is the quadratic Bézier curve, which we will refer to as a curve, parameterized by control points $p_0, p_1, p_2 \in \mathbb{R}^2$ and defined by $\gamma(t) = (1 - t)^2 p_0 + 2t(1 - t)p_1 + t^2 p_2$ for $t \in [0, 1]$. We represent 2D shapes as the union of $n$ curves parameterized by $\Phi$, the concatenation of their control points.
Given a curve parameterized by $(p_0, p_1, p_2)$ and a point $p \in \mathbb{R}^2$, the parameter $t^*$ such that $\gamma(t^*)$ is the closest point on the curve to $p$ satisfies the following cubic equation:
$$(b \cdot b)\, t^3 + 3(a \cdot b)\, t^2 + \left(2\|a\|_2^2 + (p_0 - p) \cdot b\right) t + (p_0 - p) \cdot a = 0, \tag{11}$$
where $a = p_1 - p_0$ and $b = p_2 - 2p_1 + p_0$.
Thus, evaluating the distance to a single curve requires finding the roots of a cubic [33], which we can do analytically in constant time. To compute the distance to the union of the $n$ curves, we take a minimum over the per-curve distances $d_i$:
$$d_\Phi(p) = \min_{i = 1, \dots, n} d_i(p). \tag{12}$$
In addition to the control points, we predict a stroke thickness parameter for each curve. We use this parameter when computing the loss by “lifting” the predicted distance field and, consequently, thickening the curve—if a curve has stroke thickness $s$, we set the lifted distance to $\max(d(p) - s, 0)$. While we do not visualize stroke thickness in our experiments, this approach allows the network to thicken curves to better match fine-grained decorative details (Figure 4). This thickening is a simple and natural operation in our distance field representation; note that sampling-based methods do not provide a natural way to “thicken” surfaces.
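To make the closest-point computation in (11)–(12) concrete, here is a NumPy sketch; for brevity it uses NumPy's generic polynomial root finder rather than a closed-form cubic solver, and the `stroke` parameter implements the thickening described above.

```python
import numpy as np

def quadratic_bezier_distance(p, p0, p1, p2, stroke=0.0):
    """Unsigned distance from point p to the quadratic Bezier curve with
    control points p0, p1, p2, via the closest-point cubic (Eq. 11)."""
    p, p0, p1, p2 = map(np.asarray, (p, p0, p1, p2))
    a = p1 - p0
    b = p2 - 2.0 * p1 + p0
    c = p0 - p
    # Cubic coefficients of <gamma(t) - p, gamma'(t)> = 0 in t.
    coeffs = [np.dot(b, b), 3.0 * np.dot(a, b),
              2.0 * np.dot(a, a) + np.dot(c, b), np.dot(c, a)]
    roots = np.roots(coeffs)  # trims leading zeros for degenerate curves
    # Candidates: real roots clamped to [0, 1], plus the endpoints.
    ts = [0.0, 1.0] + [float(np.clip(r.real, 0.0, 1.0))
                       for r in roots if abs(r.imag) < 1e-9]
    def gamma(t):
        return (1 - t) ** 2 * p0 + 2 * t * (1 - t) * p1 + t ** 2 * p2
    d = min(np.linalg.norm(gamma(t) - p) for t in ts)
    # "Lift" the field to thicken the curve by the stroke parameter.
    return max(d - stroke, 0.0)

def shape_distance(p, curves):
    """Distance to a union of curves (Eq. 12): min over per-curve distances."""
    return min(quadratic_bezier_distance(p, *c) for c in curves)
```

Evaluating this at every grid point of $G$ yields the predicted distance field used in the losses of §4.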
5.1.2 Templates
Our training procedure is unsupervised, as we do not have ground truth curve annotations. To better handle the multimodal nature of our data without a separate network for each letter, we label each training example with its letter, passed as additional input to our network. This allows us to condition on input class by concatenating a 26-dimensional one-hot vector to the input of each convolutional layer (after replicating to the appropriate spatial dimensions), a common technique for conditioning [44].
We choose a “standard” Bézier curve representation for each letter, which captures that letter’s distinctive geometric and topological features, by designing 26 templates built from a shared set of control points output by our network. A template of class $\ell$ is a collection of points $P_\ell$ with corresponding connectivity determining how the points in $P_\ell$ are used to define curves. Since we predict glyph boundaries, our final curves form closed loops, allowing us to reuse endpoints.
For extracting glyph boundaries from uppercase English letters, there are three connectivity types—one loop (e.g., C), two loops (e.g., A), and three loops (B). We design templates such that the first loop has 15 curves and the other loops have 4 curves each. Our templates are shown in Figure 5. We will show that while the letter templates (a) are better able to specialize to the boundaries of each glyph, we still achieve good results for most letters with the simple templates (b), which also allow for establishing cross-glyph correspondences.
We use predefined templates together with our labeling of each training example for two purposes. First, the template connectivity is used to compute curve control points from the network output. Second, the templates provide a template loss
$$L_{\mathrm{template}}(\Phi) = c\, e^{-n/\sigma} \left\| \Phi - \Phi_\ell \right\|_2^2, \tag{13}$$
where $\Phi_\ell$ denotes the control points of the template for class $\ell$, $c$ and $\sigma$ are fixed constants, and $n$ is the current training iteration. This serves to initialize the network output, such that a training example of class $\ell$ initially maps to the template for letter $\ell$; as the loss decays during training, it acts as a regularizer. We fix $c$ and $\sigma$ across experiments.
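One plausible instantiation of the decaying template pull (13), sketched under the assumption of an exponential decay schedule and a squared-L2 penalty; the constants `c` and `sigma` are placeholders, as the paper's values are elided here.

```python
import numpy as np

def template_loss(phi, phi_template, n, c=1.0, sigma=1000.0):
    """Exponentially decayed pull of predicted parameters phi toward the
    class template phi_template; n is the training iteration. The constants
    c and sigma stand in for unstated hyperparameters."""
    return c * np.exp(-n / sigma) * float(np.sum((phi - phi_template) ** 2))
```

Early in training the pull dominates, snapping outputs onto the template; it then fades so that the distance field losses take over.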
5.2 Experiments
We train our network on the 26 uppercase English letters extracted from nearly 10,000 fonts. The input is a raster image of a letter; the target distance field, measured to the boundary of the original vector representation, is precomputed.
5.2.1 Vectorization
For any font glyph, our method generates a sparse vector representation, which robustly and accurately describes the glyph’s structure while ignoring decorative and noisy details. For simple fonts composed of few strokes, our representation is a nearly perfect vectorization, as in Figure 3(a).
For glyphs from decorative fonts, our method produces a meaningful representation. In such cases, a true vectorization would contain many curves with a large number of connected components. Our network instead places the sparse curve template to best capture the glyph’s structure, as in Figure 3(b).
Our method preserves semantic correspondences in our templates. The same curve is consistently used for the boundary of, e.g., the top of an I. These correspondences persist across letters for both letter templates and simple templates—see, for example, the E and F glyphs in Figures 3(a) and 3(b) and “simple templates” in Figure 12.
We demonstrate robustness in Figure 8 by quantizing our loss values and visualizing the number of examples for each value. Outliers corresponding to higher losses are generally caused by noisy data—they are either not uppercase English letters or have fundamentally uncommon structure.
5.2.2 Retrieval and Exploration
Our sparse representation can be used to explore the space of glyphs, useful for artists and designers. By treating the control points as a metric space, we can perform nearest-neighbor lookups to retrieve fonts using Euclidean distance.
In Figure 6, for each query, we compute its curve representation and retrieve seven nearest neighbors in curve space. Because our representation uses geometric structure, we find fonts that are similar structurally, despite decorative and stylistic differences.
We can also consider a path in curve space starting at the curves for one glyph and ending at those for another. By sampling nearest neighbors along this trajectory, we interpolate between glyphs. As in Figure 7, this produces meaningful collections of fonts for the same letter and reasonable results when the start and end glyphs are different letters. Additional results are available in the supplementary material.
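Retrieval and path-based interpolation in curve space both reduce to nearest-neighbor queries on the flattened parameter vectors. A hypothetical sketch (the `database` array of per-glyph parameter vectors is an assumption of this example, not an artifact shipped with the paper):

```python
import numpy as np

def nearest_glyphs(query, database, k=7):
    """Indices of the k nearest glyphs to `query` in curve-parameter space,
    using Euclidean distance on flattened control-point vectors."""
    dists = np.linalg.norm(database - query[None, :], axis=1)
    return np.argsort(dists)[:k]

def interpolate_path(phi_start, phi_end, database, steps=5):
    """Nearest database glyph at evenly spaced points on the segment
    between two parameter vectors, giving a discrete interpolation."""
    ts = np.linspace(0.0, 1.0, steps)
    return [int(nearest_glyphs((1 - t) * phi_start + t * phi_end,
                               database, k=1)[0]) for t in ts]
```

For large font collections, the brute-force scan could be swapped for an approximate nearest-neighbor index without changing the interface.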
Nearest-neighbor lookups in curve space can also help find a font matching desired geometric characteristics. A possible workflow is shown in Figure 9—through incremental refinements of the curves, the user can quickly find a font.
5.2.3 Style and Structure Mixing
Our sparse curve representation describes geometric structure, ignoring stylistic elements (e.g., texture, decorative details). We leverage this to warp a glyph with a desired style so that it takes on the structure of another glyph (Figure 10).
We first generate the sparse curve representation for the source and target glyphs. Since our representation uses the same set of curves for both, we can estimate dense correspondences and use them to warp the original vectors of the source glyph to conform to the shape of the target. For each point $p$ on the source, we find the closest point $q$ on its sparse curve representation. We then compute the translation from $q$ to the corresponding point on the target glyph’s sparse representation and apply this translation to $p$.
5.2.4 Repair
Our system introduces a strong prior on glyph shape, allowing us to robustly handle noisy input. In [1], a generative adversarial network (GAN) generates new glyphs based on samples. The outputs, however, are raster images, often with noise and missing parts. Figure 11 shows how our method can simultaneously vectorize and repair GAN-generated glyphs. Compared to a vectorization tool like Adobe Illustrator Live Trace, we infer missing data to reasonably fit the template. This post-processing step makes glyphs from generative models usable starting points for font design.
5.2.5 Comparison and Ablation Study
We present a series of experiments demonstrating the advantages of our loss over standard Chamfer distance as well as the contribution of each term in our loss. We also demonstrate that while having 26 unique templates helps achieve better results, it is not crucial—we evaluate a network trained with three “simple templates” (Figure 5(b)), which capture the three topology classes of our data.
Loss type                 Sec/iter   Average error
Full model (ours)         1.119      0.748
No global term (ours)     1.111      0.817
No surface term (ours)    1.109      0.761
No alignment term (ours)  1.118      0.786
Simple templates (ours)   1.127      0.789
Chamfer                   1.950      0.910
Table 1 reports seconds per iteration for our full distance field loss, for our loss with each of its three terms ablated, for simple templates, and for Chamfer loss. Training using our loss is nearly twice as fast as Chamfer per iteration. Moreover, each of our loss terms adds little per-iteration overhead. In these experiments, training with the distance field loss uses the same parameters as above; for Chamfer loss, we use the same training procedure (hyperparameters, templates, batch size), sampling 5,000 points from the source and target geometry. This is the largest sample count for which the pairwise distance matrix between sampled points fits in GPU memory.
We also evaluate on 20 sans-serif fonts, computing Chamfer distance between our predicted curves and ground truth geometry with uniform sampling (average error in Table 1). Uniform sampling is a computationally expensive and non-differentiable procedure used only for a posteriori evaluation—not suitable for training. While it does not correct all of the Chamfer distance’s shortcomings, we use it as a baseline to evaluate quality. We limit evaluation to sans-serif fonts since we do not expect to faithfully recover the local geometry of decorative fonts. Our full loss outperforms Chamfer loss, and all three loss terms are necessary. Here, all models are trained for 35,000 iterations. Figure 12 shows qualitative results on test set glyphs; see the supplementary material for additional results.
The global loss term strongly favors rough similarity between predicted and targeted geometry, helping training converge more quickly (see supplementary material Figure 18).
6 3D: Volumetric Primitive Prediction
We reconstruct 3D surfaces out of various primitives, which allow our model to be expressive, sparse, and abstract.
6.1 Approach
Our first primitive is a cuboid, parameterized by $\Phi = (b, t, q)$, where $b \in \mathbb{R}^3$ gives its dimensions, $t \in \mathbb{R}^3$ a translation, and $q$ a rotation quaternion; i.e., an origin-centered (hollow) rectangular prism with dimensions $b$ to which we apply the rotation $q$ and then the translation $t$.

Proposition. Let $C$ be a cuboid with parameters $\Phi$ and $p \in \mathbb{R}^3$ a point. Let $p' = \bar{q}(p - t)q$ using the Hamilton product. Then, the signed distance between $p$ and $C$ is
$$d(p) = \left\| \max(\bar{d}, 0) \right\|_2 + \min\left(\max(\bar{d}_x, \bar{d}_y, \bar{d}_z), 0\right), \tag{14}$$
where $\bar{d} = |p'| - b$ componentwise, and $\bar{d}_x$, $\bar{d}_y$, $\bar{d}_z$ denote the $x$, $y$, and $z$ coordinates, respectively, of the vector $\bar{d}$.

Our second primitive is a sphere, parameterized by its center $c \in \mathbb{R}^3$ and radius $r$. We compute the signed distance from $p$ to the sphere via the expression $\|p - c\|_2 - r$.
Since these distances are signed, we can compute the distance to the union of primitives as in the 2D case, by taking a minimum over the individual primitive distances. Similarly, we can perform other boolean operations, e.g., difference. The result is, again, a signed distance function, and so we take its absolute value before computing losses.
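A NumPy sketch of the signed distances and boolean operations in this section. The quaternion convention (w, x, y, z) and the interpretation of the cuboid dimensions as half-extents are assumptions of this sketch, not choices confirmed by the paper.

```python
import numpy as np

def quat_rotate_inverse(q, v):
    """Apply the inverse rotation of unit quaternion q = (w, x, y, z) to v,
    i.e., conjugate v by q's conjugate via the Hamilton product."""
    w, x, y, z = q
    u = -np.array([x, y, z])  # conjugate = inverse for a unit quaternion
    v = np.asarray(v, dtype=float)
    return v + 2.0 * np.cross(u, np.cross(u, v) + w * v)

def cuboid_sdf(p, b, t, q):
    """Signed distance (Eq. 14) from p to a rotated, translated cuboid.
    b holds the half-extents along each axis (an assumption of this sketch)."""
    p_local = quat_rotate_inverse(q, np.asarray(p, dtype=float) - np.asarray(t))
    d = np.abs(p_local) - np.asarray(b, dtype=float)
    return float(np.linalg.norm(np.maximum(d, 0.0))
                 + min(max(d[0], d[1], d[2]), 0.0))

def sphere_sdf(p, c, r):
    """Signed distance from p to a sphere with center c and radius r."""
    return float(np.linalg.norm(np.asarray(p, dtype=float) - np.asarray(c)) - r)

def union(*ds):
    """Signed distance to a union of shapes: pointwise minimum."""
    return min(ds)

def difference(d_a, d_b):
    """Signed distance to A minus B, via max(d_A, -d_B)."""
    return max(d_a, -d_b)
```

Composing `union` and `difference` over per-primitive signed distances and taking the absolute value yields the unsigned field consumed by the losses of §4.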
6.2 Experiments
We train on the airplane and chair categories of ShapeNet [8] using distance fields from [12]. The input is a distance field. Hence, our method is self-supervised: the distance field is the only information needed to compute the loss. Additional results are available in the supplementary material.
Surface abstraction.
Figure 13 shows results of training shape-specific networks to build abstractions over chairs and airplanes from distance fields. Each network outputs 16 cuboids. We discard small cuboids with high overlap as in [40]. The resulting abstractions capture high-level structures of the input.
Segmentation.
Because we place cuboids in a consistent way, we can use them for segmentation. Following [40], we demonstrate on the COSEG chair dataset. We first label each cuboid predicted by our network (trained on ShapeNet chairs) with one of the three segmentation classes in COSEG (seat, back, legs). Then, we generate a cuboid decomposition of each chair mesh in the dataset and segment by labelling each face according to its nearest cuboid (Figure 13(c)). We achieve a mean accuracy of 94.6%, exceeding the 89.0% accuracy reported by [40].
Single view reconstruction.
In Figure 14, we show results of a network that takes as input a ShapeNet model rendering and outputs parameters for the union of four cuboids minus the union of four spheres. We see that for inputs that align well with this template, the network produces good results. It is not straightforward to achieve unsupervised CSG-style predictions using a sampling-based Chamfer distance loss.
7 Conclusion
Representation is a key theme in deep learning—and machine learning more broadly—applied to geometry. Assorted means of communicating a shape to and from a deep network present varying tradeoffs between efficiency, quality, and applicability. While considerable effort has been put into choosing representations for certain tasks, the tasks we consider have fixed representations for the input and output: they take in a shape as a function on a grid and output a sparse set of parameters. Using distance fields and derived functions as intermediate representations is natural and effective, not only performing well empirically but also providing a simple way to describe geometric loss functions through different discrepancy measures $\mathcal{L}$.
Our learning procedure is applicable to many additional tasks. A natural next step is to incorporate our network into more complex pipelines for tasks like rasterization of complex drawings [4], for which the output of a learning procedure needs to be combined with classical techniques to ensure smooth, topologically valid output. A challenging direction might be to incorporate user guidance into training or evaluation, developing the algorithm as a partner in shape prediction or reconstruction rather than generating a deterministic output.
Our experiments suggest several extensions for future work. The key drawback of our approach is the requirement of closed-form distances for the primitives. While there are many primitives that could be incorporated this way, a fruitful direction might be to alleviate this requirement, e.g., by including flexible implicit primitives like metaballs [5]. We could also incorporate more boolean operations into our pipeline, which easily supports them via algebraic operations on signed distances, in analogy to the CAD pipeline, to generate complex topologies and geometries with few primitives. The combinatorial problem of determining the best sequence of boolean operations for a given input would be particularly challenging even for clean data [13]. Finally, it may be possible to incorporate our network into generative algorithms to create new unseen shapes.
References
 [1] S. Azadi, M. Fisher, V. Kim, Z. Wang, E. Shechtman, and T. Darrell. Multi-content GAN for few-shot font style transfer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, volume 11, page 13, 2018.
 [2] J. L. Ba, J. R. Kiros, and G. E. Hinton. Layer normalization. arXiv preprint arXiv:1607.06450, 2016.
 [3] E. Balashova, A. Bermano, V. G. Kim, S. DiVerdi, A. Hertzmann, and T. Funkhouser. Learning a stroke-based representation for fonts. CGF, 2018.
 [4] M. Bessmeltsev and J. Solomon. Vectorization of line drawings via polyvector fields. ACM Transactions on Graphics (TOG), 2019.
 [5] J. F. Blinn. A generalization of algebraic surface drawing. ACM Transactions on Graphics (TOG), 1(3):235–256, 1982.
 [6] G. Borgefors. Distance transformations in arbitrary dimensions. Computer vision, graphics, and image processing, 27(3):321–345, 1984.
 [7] N. D. Campbell and J. Kautz. Learning a manifold of fonts. ACM Transactions on Graphics (TOG), 33(4):91, 2014.
 [8] A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, J. Xiao, L. Yi, and F. Yu. ShapeNet: An Information-Rich 3D Model Repository. Technical Report arXiv:1512.03012 [cs.GR], Stanford University — Princeton University — Toyota Technological Institute at Chicago, 2015.
 [9] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4):834–848, 2018.
 [10] C. B. Choy, D. Xu, J. Gwak, K. Chen, and S. Savarese. 3D-R2N2: A unified approach for single and multi-view 3D object reconstruction. In European Conference on Computer Vision, pages 628–644. Springer, 2016.
 [11] D.-A. Clevert, T. Unterthiner, and S. Hochreiter. Fast and accurate deep network learning by exponential linear units (ELUs). arXiv preprint arXiv:1511.07289, 2015.
 [12] A. Dai, C. R. Qi, and M. Nießner. Shape completion using 3D-encoder-predictor CNNs and shape synthesis. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), volume 3, 2017.
 [13] T. Du, J. P. Inala, Y. Pu, A. Spielberg, A. Schulz, D. Rus, A. Solar-Lezama, and W. Matusik. InverseCSG: Automatic conversion of 3D models to CSG trees. In SIGGRAPH Asia 2018 Technical Papers, page 213. ACM, 2018.
 [14] H. Fan, H. Su, and L. J. Guibas. A point set generation network for 3d object reconstruction from a single image. In CVPR, volume 2, page 6, 2017.
 [15] J. Fuentes-Pacheco, J. Ruiz-Ascencio, and J. M. Rendón-Mancha. Visual simultaneous localization and mapping: a survey. Artificial Intelligence Review, 43(1):55–81, 2015.
 [16] V. Ganapathi-Subramanian, O. Diamanti, S. Pirk, C. Tang, M. Niessner, and L. Guibas. Parsing geometry using structure-aware shape templates. In 2018 International Conference on 3D Vision (3DV), pages 672–681. IEEE, 2018.
 [17] T. Groueix, M. Fisher, V. G. Kim, B. C. Russell, and M. Aubry. AtlasNet: A papier-mâché approach to learning 3D surface generation. arXiv preprint arXiv:1802.05384, 2018.
 [18] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
 [19] K. He, X. Zhang, S. Ren, and J. Sun. Identity mappings in deep residual networks. In European conference on computer vision, pages 630–645. Springer, 2016.
 [20] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros. Image-to-image translation with conditional adversarial networks. arXiv preprint, 2017.
 [21] V. G. Kim, W. Li, N. J. Mitra, S. Chaudhuri, S. DiVerdi, and T. Funkhouser. Learning part-based templates from large collections of 3D shapes. ACM Transactions on Graphics (TOG), 32(4):70, 2013.
 [22] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
 [23] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
 [24] L. Li, M. Sung, A. Dubrovina, L. Yi, and L. Guibas. Supervised fitting of geometric primitives to 3D point clouds. arXiv preprint arXiv:1811.08988, 2018.
 [25] Y. Liao, S. Donné, and A. Geiger. Deep marching cubes: Learning explicit surface representations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2916–2925, 2018.
 [26] X. Liu and K. Fujimura. Hand gesture recognition using depth data. In Proc. 6th IEEE Int. Conf. Automatic Face Gesture Recog., page 529. IEEE, 2004.
 [27] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3431–3440, 2015.
 [28] C. Niu, J. Li, and K. Xu. Im2Struct: Recovering 3D shape structure from a single RGB image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4521–4529, 2018.
 [29] P. O’Donovan, J. Lībeks, A. Agarwala, and A. Hertzmann. Exploratory font selection using crowdsourced attributes. ACM Transactions on Graphics (TOG), 33(4):92, 2014.
 [30] M. Ovsjanikov, W. Li, L. Guibas, and N. J. Mitra. Exploration of continuous variability in collections of 3d shapes. In ACM Transactions on Graphics (TOG), volume 30, page 33. ACM, 2011.
 [31] J. J. Park, P. Florence, J. Straub, R. Newcombe, and S. Lovegrove. DeepSDF: Learning continuous signed distance functions for shape representation. arXiv preprint arXiv:1901.05103, 2019.
 [32] H. Q. Phan, H. Fu, and A. B. Chan. FlexyFont: Learning transferring rules for flexible typeface synthesis. In Computer Graphics Forum, volume 34, pages 245–256. Wiley Online Library, 2015.
 [33] Z. Qin, M. D. McCool, and C. S. Kaplan. Real-time texture-mapped vector glyphs. In Proceedings of the 2006 Symposium on Interactive 3D Graphics and Games, pages 125–132. ACM, 2006.
 [34] A. Schulz, A. Shamir, I. Baran, D. I. Levin, P. Sitthi-Amorn, and W. Matusik. Retrieval on parametric shape collections. ACM Transactions on Graphics (TOG), 36(1):11, 2017.
 [35] S. M. Seitz, B. Curless, J. Diebel, D. Scharstein, and R. Szeliski. A comparison and evaluation of multi-view stereo reconstruction algorithms. In CVPR, 2006.
 [36] C.-H. Shen, H. Fu, K. Chen, and S.-M. Hu. Structure recovery by part assembly. ACM Transactions on Graphics (TOG), 31(6):180, 2012.
 [37] D. Stutz and A. Geiger. Learning 3D shape completion under weak supervision. International Journal of Computer Vision, pages 1–20, 2018.
 [38] H. Su, S. Maji, E. Kalogerakis, and E. Learned-Miller. Multi-view convolutional neural networks for 3D shape recognition. In Proceedings of the IEEE International Conference on Computer Vision, pages 945–953, 2015.
 [39] R. Suveeranont and T. Igarashi. Example-based automatic font generation. In International Symposium on Smart Graphics, pages 127–138. Springer, 2010.
 [40] S. Tulsiani, H. Su, L. J. Guibas, A. A. Efros, and J. Malik. Learning shape abstractions by assembling volumetric primitives. In Proc. CVPR, volume 2, 2017.
 [41] N. Umetani, T. Igarashi, and N. J. Mitra. Guided exploration of physically valid shapes for furniture design. ACM Trans. Graph., 31(4):86–1, 2012.
 [42] P. Upchurch, N. Snavely, and K. Bala. From A to Z: Supervised transfer of style and content using deep neural network generators. arXiv preprint arXiv:1603.02003, 2016.
 [43] F. Yu and V. Koltun. Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122, 2015.
 [44] J.-Y. Zhu, R. Zhang, D. Pathak, T. Darrell, A. A. Efros, O. Wang, and E. Shechtman. Toward multimodal image-to-image translation. In Advances in Neural Information Processing Systems, pages 465–476, 2017.