Learning Part Generation and Assembly for Structure-Aware Shape Synthesis
Abstract
Learning deep generative models for 3D shape synthesis is largely limited by the difficulty of generating plausible shapes with correct topology and reasonable geometry. Indeed, learning the distribution of plausible 3D shapes seems a daunting task for most existing holistic shape representations, given the significant topological variations of 3D objects even within the same shape category. Enlightened by the common view that 3D shape structure is characterized by part composition and placement, we propose to model 3D shape variations with a part-aware deep generative network, which we call PAGENet. The network is composed of an array of per-part VAE-GANs, generating the semantic parts composing a complete shape, followed by a part assembly module that estimates a transformation for each part to correlate and assemble the parts into a plausible structure. By splitting the generation of part composition and part relations into separate networks, the difficulty of modeling structural variations of 3D shapes is greatly reduced. We demonstrate through extensive experiments that PAGENet generates 3D shapes with plausible, diverse and detailed structure, and show two prototype applications: semantic shape segmentation and shape set evolution.
1 Introduction
Learning deep generative models has been one of the most exciting and active research areas in deep learning. Following that trend, learning deep generative models for 3D shape synthesis has been increasingly studied lately. Despite the notable success of several recent works [42], one major difficulty that has seldom been touched is how to ensure the structural correctness of the generated shapes.
Existing generative models are mostly structure-oblivious. These models tend to generate 3D shapes in a holistic manner, without explicitly comprehending their compositional parts. Consequently, when generating 3D shapes with detailed structures, the details of complicated part structures are often blurred or even messed up (see Figure 1, the right column). To alleviate this issue, one common practice is to increase the shape resolution to better capture fine-grained details, at the cost of increased learning time and more training examples. The main reason behind this is that the 3D shape representations employed by those models, e.g., volumetric grids or point clouds, are oblivious to shape structures. With such representations, part information is not encoded and thus cannot be decoded during the generation process.
In 3D shape analysis, a common view is that the “structure” of a shape is the combination of the part composition and the mutual positioning between parts [27]. Following this insight, we approach the modeling of structural variations of 3D shapes by learning a part-aware generative network, named PAGENet. The model is thus aware of what parts it is generating, through a semantic-part-wise generation process. On the other hand, the model should be able to preserve the mutual spatial placement between the generated parts, according to learned part assembly priors. The mutual placement determines how two adjacent semantic parts assemble together to form a structurally valid 3D shape.
PAGENet is composed of an array of part generators, each of which is a combination of a variational autoencoder (VAE) and a generative adversarial network (GAN) trained to generate a specific semantic part of the target shape category, followed by a part assembly module that estimates a transformation for each part used to assemble the parts into a valid shape structure (Figure 1). In our work, PAGENet is realized in the volumetric setting, although it can be easily extended to support other basic representations such as point clouds. Being part-aware, PAGENet generates quality 3D shapes with detailed, plausible and diverse structure.
Our model splits the generation of parts and relations into two separate networks, thus greatly reducing the difficulty of modeling structural variations of 3D shapes. In our model, 3D structure variation is modeled by the concatenation of the latent vectors of all part generators, forming a structured latent space [10] of 3D shapes in which different dimensions control the generation of different parts. This facilitates part-level user control or editing of the generated shapes. Moreover, the concatenated latent codes can be used as a “shape DNA”, over which “genetic operations” (mutation and crossover) can be performed to achieve shape set evolution [45]. By mapping a 3D shape into the structured latent space, the part generators together can produce a semantic segmentation of the shape.
Our main contributions include:

A divide-and-conquer framework for learning a deep generative model for structure-aware shape generation.

A part assembly module to relate the generated semantic parts with their assembly transformations.

Two prototype applications including semantic segmentation and set evolution of 3D shapes.
2 Related work
Modeling 3D shape variations.
The study of modeling 3D shape variability dates back to statistical learning of parametric models of faces [6] and bodies [2]. The task of modeling the structural variation of the 3D shapes of man-made objects is much harder. Most existing works learn one or multiple parametric templates of part arrangement from collections of training shapes [30, 23, 14]. These methods often require part correspondence across the training shapes. Probabilistic graphical models can be used to model shape variability as the causal relations between shape parts [22]; pre-segmented and part-labeled shapes are required for learning such models. Entering the era of deep learning, deep generative models have been utilized to learn the shape space of 3D objects in an unsupervised manner.
Deep generative models of 3D shapes.
Deep generative models for 3D shape generation have been developed based on various 3D representations, such as volumetric grids [42, 17, 41, 32], point clouds [13, 1], surface meshes [18, 40], implicit functions [11, 31], and multi-view images [35]. Common to these works is that shape variability is modeled in a holistic, structure-oblivious fashion, mainly due to the limited options of deep-learning-friendly 3D shape representations.
Structure-aware 3D shape synthesis.
Since the seminal work of “Modeling by Example” [15], much research effort has been devoted to part-based, data-driven 3D shape synthesis (e.g., [9, 22, 45, 4]); a comprehensive survey is available [44]. Part-based methods are inherently structure-aware: shapes are generated in parts, and part relations are preserved to form a valid structure. In the traditional approaches, however, parts are retrieved from a shape database (e.g., part suggestion [9, 38, 37]) instead of being generated from scratch. Meanwhile, part assembly relies on part correspondence [45, 4] or part labeling [9, 22].
Apart from the traditional approaches, research on deep generative models for structure-aware shape synthesis has recently started to gain attention. Huang et al. [20] propose a deep generative model based on part-based templates learned a priori, which is not end-to-end trainable. Li et al. [25] propose the first deep generative model of 3D shape structures. They employ a recursive neural network to achieve hierarchical encoding and decoding of parts and relations. However, this model does not explicitly ensure a quality part assembly as our method does; see Section 5 for a comparison. Zou et al. [52] propose to learn sequential part generation with recurrent neural networks, which, however, produces only cuboids and no detailed geometry.
Nash and Williams [29] propose ShapeVAE to generate part-segmented 3D objects. Their model is trained using shapes with dense point correspondence; our model, in contrast, requires only part-level correspondence. Wu et al. [43] couple the synthesis of intra-part geometry and inter-part structure. A similar idea is proposed in [5], where landmark-based structure priors are used for structure-aware shape generation. Wang et al. [39] propose to generate 3D shapes with part labeling using a carefully designed GAN, and then pass the shape to a pretrained part refiner to obtain a higher-quality shape volume. Our method reverses this process: we first generate parts and then their assembling transformations.
The most closely related are the two concurrent works on part-based shape synthesis in [34] and [12]. Schor et al. [34] train part-wise generators and a part composition network for the generation of 3D point clouds. Dubrovina et al. [12] propose a decomposer-composer network to learn a factorized shape embedding space for part-based 3D shape modeling. Unlike our work, however, their model is not a generative one: novel shapes are synthesized by randomly sampling and assembling the pre-existing parts embedded in the factorized latent space.
Structure-aware 3D shape deformation.
Deformation is another common way of generating shape variations. For man-made shapes, structure-aware deformation has been a central research goal [16, 50, 7]. Existing deep models for 3D shape deformation have so far mainly focused on free-form deformation [48, 24, 26, 21], which is not designed for global structure preservation. The part composition network in [51] performs structure-preserving deformation at the substructure level. Our part assembly module achieves multi-part joint deformation by learning inter-part assembling transformations.
3 Method
3.1 Network architecture
Our network architecture (Figure 2) is straightforward. It is composed of two modules: a part-wise generative network and a part assembly module. The part-wise generative network contains part generators, one for each of the predefined semantic part labels (e.g., back, seat, leg and armrest for a chair). Each part generator is trained to generate a volume of a specific part from a random vector. Taking the generated volumes for all the parts as input, the part assembly module predicts a transformation (scaling + translation) for each part, to assemble the parts into a complete shape with proper part scaling and inter-part connection.
3.2 Part-wise generative network
The part-wise generative network is simply a collection of part generators. For each semantic part, we train a generative network of 3D volumes, which is a combination of a variational autoencoder (VAE) and a generative adversarial network (GAN), i.e., a VAE-GAN. The VAE comprises an encoder and a decoder of 3D volumes of fixed resolution, with a fixed latent-vector dimension. Similar to [42], the encoder consists of five volumetric fully convolutional layers. Batch normalization and ReLU layers are inserted between the convolutional layers. The decoder / generator simply mirrors the encoder, except that a nonlinearity is used in the last layer. The encoder architecture is also reused to learn a discriminator that tells whether a given part volume is real (the voxelization of a real shape part) or fake (produced by the generator).
Therefore, the loss function for a part generator consists of three terms: a part volume reconstruction loss L_recon, a Kullback-Leibler divergence loss L_KL, and an adversarial loss L_adv. In addition, we introduce a reflective symmetry loss L_sym to penalize the generation of asymmetric parts. This loss helps regularize the part generation, since most parts are reflectively symmetric; for asymmetric parts, the weight of L_sym is set to zero. In summary, the loss is defined as:
L = L_recon + α1 L_KL + α2 L_adv + δ_p L_sym    (1)
where L_recon measures the mean-squared-error (MSE) loss between the input volume and the output volume; L_sym is the MSE loss between a volume and its reflection about the reflection plane of the input shape; and δ_p is a Kronecker delta function indicating whether part p shares reflective symmetry with the full shape. In training, we can easily detect reflective symmetry for a training shape and its semantic parts using the method in [28], so as to evaluate δ_p for each part.
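As a concrete sketch, the gated symmetry loss can be computed on a voxel volume as follows (numpy; the choice of reflection axis and the mid-plane location are assumptions, justified by the training volumes being centered):

```python
import numpy as np

def symmetry_loss(vol, is_symmetric):
    """Sketch of the reflective-symmetry loss: MSE between a part volume
    and its reflection about the volume's vertical mid-plane. The Kronecker
    delta gate zeroes the loss for asymmetric parts."""
    if not is_symmetric:            # delta_p = 0 for asymmetric parts
        return 0.0
    reflected = vol[::-1, :, :]     # assumed reflection about the x mid-plane
    return float(np.mean((vol - reflected) ** 2))
```

In training, the `is_symmetric` flag would come from the symmetry detection of [28] applied to each ground-truth part.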
For the adversarial training, we follow WGAN-GP [19], which improves the Wasserstein GAN [3] with a gradient penalty, to train our generative model:
L_D = E_{x~P_g}[D(x)] − E_{x~P_r}[D(x)] + λ E_{x̂~P_x̂}[(‖∇_x̂ D(x̂)‖_2 − 1)^2]    (2)
where D is the discriminator, and P_g and P_r are the distributions of generated part volumes and real part volumes, respectively. The last term is the gradient penalty, in which x̂ is sampled uniformly along straight lines between pairs of points drawn from the data distribution P_r and the generator distribution P_g. The discriminator attempts to minimize L_D while the generator maximizes the first term of Equation (2).
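The two WGAN-GP ingredients can be sketched as follows (numpy). In a real implementation the gradient norms come from automatic differentiation; the default penalty weight of 10 shown here is the WGAN-GP default, an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_interpolates(real, fake):
    """Sample x_hat uniformly along straight lines between paired real and
    generated volumes, as in WGAN-GP (one epsilon per batch element)."""
    eps = rng.uniform(size=(real.shape[0],) + (1,) * (real.ndim - 1))
    return eps * real + (1.0 - eps) * fake

def gradient_penalty(grad_norms, lam=10.0):
    """Penalty lam * E[(||grad_x_hat D(x_hat)||_2 - 1)^2]. The gradient
    norms would be obtained by autodiff in an actual training loop."""
    return lam * float(np.mean((grad_norms - 1.0) ** 2))
```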
3.3 Part assembly module
Since the part volumes are generated independently, their scales may not match each other and their positions may leave adjacent parts disconnected. Taking the part volumes produced by the part-wise generative network, the part assembly module regresses a transformation, comprising a scaling and a translation, for each part. It relates the different semantic parts, with proper resizing and repositioning, to assemble them into a valid and complete shape volume. Essentially, it learns the spatial relations between semantic parts (or part arrangements [49]) in terms of relative size and position, as well as the mutual connections between different parts.
The part assembly module takes the part volumes as input, stacked into a single input tensor. The input tensor is passed through five volumetric fully convolutional layers. As in the part encoders, batch normalization and ReLU layers are used between the convolutional layers. In the last layer, a sigmoid layer is added to regress the scaling and translation parameters. To ease the training, we normalize all scaling and translation parameters into [0, 1] (the sigmoid's output range), based on the allowed ranges of scaling and translation (the latter in voxel units). The actual values of the scaling and translation parameters are recovered when they are applied.
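The normalization and recovery of the regression targets amount to a linear map between each parameter's allowed range and the sigmoid's [0, 1] output. A minimal sketch (the concrete ranges below are illustrative stand-ins, not the paper's values):

```python
# Illustrative stand-in ranges; the actual allowed ranges are set elsewhere.
SCALE_RANGE = (0.5, 1.5)     # assumed allowed scaling range
TRANS_RANGE = (-8.0, 8.0)    # assumed allowed translation range, in voxels

def normalize(value, lo, hi):
    """Map a parameter from [lo, hi] to [0, 1] for the sigmoid output layer."""
    return (value - lo) / (hi - lo)

def denormalize(value, lo, hi):
    """Recover the actual parameter from the network's [0, 1] output."""
    return lo + value * (hi - lo)
```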
Anchored transformation.
Given the part volumes, the transformations assembling them together are not unique. Taking the chair model in Figure 3 as an example, the chair seat can be stretched to match the back, while the back can also be shrunk to conform to the seat; both result in a valid shape structure. This also adds some diversity to the generation. To make the transformation estimation determined and the assembly network easier to train, we introduce an extra input to the part assembly module to indicate an anchor part. When estimating part transformations, the anchor part is kept fixed (with an identity transformation) while all the other parts are transformed to match the anchor. One option for doing this is to input an indicator vector (a one-hot vector over the parts). However, the dimension of this indicator vector is too small, so its information is easily overwhelmed by the large tensor of part volumes. Therefore, we opt to infuse the anchor information by setting the occupied voxels in the anchor part volume to a distinct constant, to strongly contrast against the 1's in the volumes of the free parts. During testing, the anchor part can be randomly selected or user-specified; see Figure 4.
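A minimal sketch of this anchor-infusion trick, assuming binary part volumes and a hypothetical contrast value for the anchor's occupied voxels (the exact value is not specified here):

```python
import numpy as np

ANCHOR_VALUE = 2.0   # hypothetical contrast value for the anchor's voxels

def build_assembly_input(part_vols, anchor_idx):
    """Stack per-part binary volumes into one input tensor and mark the
    anchor part by rescaling its occupied voxels so that they stand out
    against the 1's of the free parts."""
    tensor = np.stack([v.astype(np.float32) for v in part_vols], axis=0)
    tensor[anchor_idx] *= ANCHOR_VALUE
    return tensor
```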
3.4 Training details
We train and test our part-wise VAE-GANs and part assembly module on a subset of ShapeNet [8]. This subset, proposed in [46], provides consistent alignment and semantic labeling for all shapes. We select four representative categories exhibiting rich part structure variation: chairs (3746), airplanes (2690), lamps (1546) and motorbikes (202). In the dataset, each object category has a fixed number of semantic parts: a chair contains a back, a seat, a leg and an armrest; an airplane consists of a body, a wing, a tail and an engine; a lamp has a base, a shade and a tube; a motorbike is composed of a light, a gas tank, a seat, a body, a wheel and a handle. Note that a shape may not contain all semantic parts of the corresponding category. The dataset is divided into two parts, according to the official training/test split, to train and test our part-wise generative network and part assembly module. To enlarge the training set, we employ the structure-aware deformation technique in [50] to deform each shape, generating several variations per shape. Finally, each shape and its semantic parts are voxelized to form our training set.
The part-wise VAE-GANs are trained with part volumes. We augment the dataset of part volumes by randomly scaling and translating the parts within bounded ranges (the translations measured in voxels). To train the part assembly module, we generate a large set of training pairs of messed-up part arrangements and ground-truth assemblies, each with a randomly selected anchor part. The messed-up arrangements are generated by randomly scaling and translating the semantic parts of a shape; the inverses of the messing-up transformations are used as the ground-truth assembling transformations. Besides that, we also add some random noise to the training part volumes, to accommodate the imperfect volume generation encountered during testing.
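The training-pair construction can be sketched at the level of transformation parameters. The perturbation ranges below are illustrative; the key fact used is that after applying a mess-up (s, t) to a part, the ground-truth assembling transformation is its inverse (1/s, -t/s), which restores the original placement:

```python
import numpy as np

rng = np.random.default_rng(1)

def random_mess_up():
    """Draw a random per-part perturbation (scale s, translation t).
    Ranges here are illustrative stand-ins for the actual training ranges."""
    s = rng.uniform(0.75, 1.25)
    t = rng.uniform(-4.0, 4.0, size=3)
    return s, t

def ground_truth_assembly(s, t):
    """Inverse of the mess-up: composing x -> s*x + t with
    x -> x/s - t/s gives the identity."""
    return 1.0 / s, -t / s
```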
Wasserstein GANs usually have large gradients, which might result in unstable training. We therefore opt to first pretrain the VAEs and then fine-tune them via joint training with the discriminators. For both the part-wise VAE-GANs and the part assembly module, we use ADAM for network optimization, with a fixed initial learning rate and batch size. For the parameters of the loss in Equation (1), α1 and α2 are fixed for all experiments, and λ is set as in [19].
Note that the part generation and assembly networks are not trained jointly in an end-to-end fashion, since there is no ground-truth assembly for parts generated by the VAEs (random generation). It would, however, be possible to make the whole pipeline end-to-end trainable if a discriminator network could be devised to judge whether the final assembled shape is reasonable. We leave this for future work.
4 Results and Evaluations
[Table 1: average symmetry measures of randomly generated shapes, comparing PAGENet with and without the symmetry loss against 3DGAN, G2L and the ground truth (GT). Results are reported per semantic part (chair: back, seat, leg (sym.), leg (asym.), armrest; airplane: body, wing, tail, engine) and for full shapes; the numeric values are missing in this copy.]
Table 2: part assembly quality (IoU) per anchor part.

Category          | Chair                     | Plane                    | Motorbike                                   | Lamp
Anchor part       | seat  back  leg  armrest  | body  wing  tail  engine | light  gas tank  seat  handle  wheel  body  | base  shade  tube
One-hot vector    | 0.77  0.78  0.79  0.72    | 0.76  0.75  0.73  0.74   | 0.72   0.76      0.74  0.73    0.71   0.76  | 0.79  0.77   0.70
Ours              | 0.83  0.81  0.82  0.79    | 0.80  0.82  0.76  0.82   | 0.80   0.79      0.82  0.79    0.78   0.82  | 0.81  0.83   0.77
Ours (training)   | 0.89  0.91  0.90  0.88    | 0.89  0.87  0.90  0.88   | 0.87   0.92      0.86  0.86    0.85   0.87  | 0.91  0.89   0.81

(The template-based baseline is reported per category: Chair 0.60, Plane 0.65, Motorbike 0.56, Lamp 0.52.)
Part-wise generation.
Symmetry preservation is especially useful for generating man-made shapes. By imposing the reflective-symmetry regularization on parts that are reflectively symmetric, our model is able to produce structurally more plausible shapes. To evaluate symmetry preservation in shape generation, we define a symmetry measure for generated shapes. Given a generated shape volume, the reflective plane is the vertical bisector plane of the volume, since all training data were globally aligned and centered before voxelization. The symmetry measure is obtained simply by reflecting the left half of the shape volume and computing the IoU against the right half.
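The symmetry measure can be sketched directly on a voxel grid; here axis 0 is assumed to be the left-right axis, with the bisector plane at its midpoint:

```python
import numpy as np

def symmetry_measure(vol, thresh=0.5):
    """Reflect the left half of an occupancy volume about the vertical
    bisector plane and compute the IoU against the right half."""
    occ = vol > thresh
    w = occ.shape[0]
    left = occ[: w // 2]
    right_mirrored = occ[w - w // 2 :][::-1]   # index j pairs with w-1-j
    inter = np.logical_and(left, right_mirrored).sum()
    union = np.logical_or(left, right_mirrored).sum()
    return float(inter) / union if union > 0 else 1.0
```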
Table 1 shows the average symmetry measures over randomly generated shapes from our method. The results are reported both for the full shape and for individual semantic parts. We also compare against a baseline model trained without the symmetry loss, as well as the 3DGAN model proposed in [42] and G2L [39]. G2L [39] trains an extra part refinement network by minimizing the average reconstruction loss against the three nearest neighbors retrieved from the training set. While achieving a higher symmetry score, such refinement also limits the diversity of the generated shapes, as indicated by its inception score in Table 3 being lower than ours.
An interesting feature of our part generators is that they learn when to impose the symmetry constraint on the generated parts, judging from the input random vector. This is due to the discriminative treatment of reflectively symmetric and asymmetric parts during generator training (Equation (1)). Taking the leg generator as an example, if a random vector corresponding to a four-legged chair is input, the generator will preserve the reflective symmetry of the leg part. If, on the other hand, the input random vector implies that a swivel chair is to be generated, symmetry preservation is automatically disabled, since the leg part of a swivel chair is mostly not reflectively symmetric in reality. This is reflected in the average symmetry measures of symmetric and asymmetric legs in Table 1.
Part assembly.
To evaluate our part assembly module, we test it on the test set. For each test shape, we perturb each of its semantic parts with a random scaling and translation, and use our network to regress the assembling transformations. The assembly quality is measured by the IoU between the assembled shape volume and the ground truth. In testing, we choose each semantic part in turn as the anchor and report the average IoU as the assembly quality. In Table 2, we compare the assembly performance of three methods. The first is our method. The second is a variant of our method in which the anchor part is indicated by a one-hot vector. The third is a template-based part assembly, where we retrieve a template shape from the training set based on part-wise CNN features and then transform the shape parts according to the corresponding parts of the template; this is possible since part correspondence is available for all shapes in the training set and for the generated shapes. For reference, we also show the performance on the training shapes (the last row).
The results show that our part assembly module generalizes well to unseen shapes. The numbers reported in Table 2 are obtained under messy part arrangements with random scaling and translation. Figure 5 plots the assembly quality over varying amounts of translation and scaling; our method obtains reasonably good assembly results over a wide range of both. Note, however, that the goal of our part assembly module is not to reconstruct an input shape. In fact, there is no unique solution to structurally plausible part assembly. Therefore, this experiment only approximately evaluates the assembly ability of our model.
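The quality measure used here is the standard voxel IoU between the assembled volume and the ground truth, e.g.:

```python
import numpy as np

def volume_iou(a, b, thresh=0.5):
    """Intersection-over-union of two occupancy volumes, after thresholding."""
    a_occ, b_occ = a > thresh, b > thresh
    union = np.logical_or(a_occ, b_occ).sum()
    inter = np.logical_and(a_occ, b_occ).sum()
    return float(inter) / union if union > 0 else 1.0
```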
Random shape generation.
Figure 6 shows a few examples of random generation for all four shape categories. For each shape, both the generated part volumes (overlaid) and the final assembly result are shown. A nice feature of our method is that the generated shapes all possess a semantic segmentation by construction, which can be used for training-data enhancement in shape segmentation. More generation results can be found in the supplemental material.
In Table 3, we compare the diversity of random generation by our method and three alternatives: 3DGAN [42], GRASS [25] and G2L [39]. Similar to [39], we use the inception score [33] to measure the diversity of shape sets. In particular, we first cluster the training shapes and then train a classifier targeting the clusters. The inception score of a given set of shapes is then computed from the confidence and variance of the classification over the set. Our method achieves consistently more diverse generation than the alternatives, thanks to the part-wise shape variation modeling.
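A sketch of the inception score as used here, computed from the classifier posteriors over the learned clusters (confident per-shape predictions plus an even spread over clusters yield a high score):

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """exp( E_x KL( p(y|x) || p(y) ) ), where probs holds one classifier
    posterior p(y|x) per row and p(y) is the marginal over the set."""
    probs = np.asarray(probs, dtype=np.float64)
    marginal = probs.mean(axis=0, keepdims=True)
    kl = np.sum(probs * (np.log(probs + eps) - np.log(marginal + eps)), axis=1)
    return float(np.exp(kl.mean()))
```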
High-resolution and point cloud generation.
The split between part synthesis and part assembly in our approach naturally supports high-resolution shape generation. We first synthesize each part at very high resolution, within a local volume around the part. The synthesized parts are then placed into a low-resolution global volume, in which the part assembly module estimates an assembling transformation for each part. The transformed parts are then unified in a high-resolution volume, resulting in a high-resolution 3D model. Figure 7 (top) shows four examples of high-resolution shape generation. By implementing the part generators with a point cloud representation [1], PAGENet can also support 3D point cloud generation; see Figure 7 (bottom).
Comparison with GRASS [25].
Figure 8 shows a visual comparison to GRASS. Although GRASS can recover part relations (adjacency and symmetry) in the generated shapes, it does not explicitly learn how adjacent parts are connected. Therefore, parts generated by GRASS can sometimes be mistakenly detached. In contrast, PAGENet yields better part connections thanks to the learned part assembly prior. To quantitatively evaluate part assembly quality, we propose two measures, one objective and one subjective. The objective measure simply examines the voxel connectivity of a generated shape volume. In the subjective evaluation, we recruited five human participants to visually inspect and vote on the correctness of the part connections of a generated shape. For both measures, we compute the average success rate of part assembly for each shape category. Table 4 compares the success rates of the two methods over randomly generated shapes for each category.
[Table 4: part-assembly success rates of GRASS vs. PAGENet for chairs, airplanes, motorbikes and lamps, under both the objective (Obj.) and subjective (Subj.) measures; the numeric values are missing in this copy.]
Shape interpolation.
By breaking 3D shape generation down into part-wise generation and part assembly inference, our method is able to model significant variations of 3D shape structures. This can be demonstrated by interpolating between shapes with significantly different structures. Figure 9 shows two such examples. Our method generates high-quality in-between shapes with detailed structures, even though the source and target shapes have considerably different structures. More interpolation results can be found in the supplemental material.
Arithmetic in latent space.
Due to the part-aware representation in our model, the latent space for full shapes is structured by construction. Therefore, our model naturally supports arithmetic modeling of 3D shapes at the granularity of semantic parts. Figure 10 shows two examples of arithmetic modeling. Again, the final shapes possess detailed structures thanks to the part-aware generation. Meanwhile, the overall structures look plausible even though the parts originally come from different shapes.
5 Applications
Shape set evolution.
In [45], a bio-inspired approach to batch generation of 3D shapes is introduced: an initial population of 3D shapes is evolved to produce generations of novel shapes. The core technique supporting 3D shape evolution is the realization of two basic “genetic operators”, mutation and crossover, which are key to maintaining shape diversity from one generation to the next. In [45], shape mutation and crossover are realized as direct part alteration and recombination. There is no notion of a “shape DNA”, where genetic operators are performed on a “shape chromosome” and new shapes are recovered from the resulting chromosomes, analogous to genetic expression.
With the part-aware shape representation in our model, genetic operators on shape chromosomes (latent codes) can be easily defined. Specifically, mutation can be realized by randomly altering the values of some dimensions of the latent code; crossover can be achieved by recombining the code pieces corresponding to semantic parts. See Figure 11 for two examples of the crossover operation. After the genetic operations, new shapes can be recovered by our learned decoders and part assembler. To some extent, our part-aware latent codes can be viewed as a part-level “shape DNA”, which can be used for shape set evolution. Figure 12 shows two generations of shape set evolution starting from an initial population of chair models. See more evolution results in the supplemental material.
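Under the assumption that the full-shape code is the concatenation of per-part latent codes, the two genetic operators can be sketched as follows (the mutation rate and noise scale are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

def mutate(code, rate=0.1, sigma=0.5):
    """Mutation: perturb a random subset of latent dimensions with noise."""
    mask = rng.random(code.shape) < rate
    return code + mask * rng.normal(0.0, sigma, size=code.shape)

def crossover(code_a, code_b, part_dims):
    """Crossover: recombine the per-part pieces of two 'shape DNAs'.
    part_dims lists the latent-code length of each part generator."""
    child, offset = code_a.copy(), 0
    for d in part_dims:
        if rng.random() < 0.5:   # inherit this part's genes from parent B
            child[offset:offset + d] = code_b[offset:offset + d]
        offset += d
    return child
```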
Shape segmentation.
The part-wise generation of our model can also be used to segment a 3D shape. The network is shown in Figure 13 (top). Given a 3D shape in volumetric representation, we first project it into the latent space of PAGENet with a trained shape projection network, which encodes the shape volume with five volumetric convolutional layers and projects it to a latent code. Then, our pretrained PAGENet is used to reconstruct a 3D shape (with semantic segmentation). The projection network is trained by minimizing the reconstruction loss against the input shape volume, while keeping PAGENet fixed. During testing, passing a 3D shape volume through the network results in a reconstructed 3D volume with semantic segmentation. Since the recovered volume is geometrically close to the input volume (due to the self-reconstruction training), we can accurately transfer its voxel labels onto the input volume, thus obtaining a semantic segmentation of the input.
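A minimal sketch of the final label-transfer step, assuming a brute-force nearest-labeled-voxel assignment (the exact transfer rule is not specified here; label 0 denotes empty space):

```python
import numpy as np

def transfer_labels(input_occ, recon_labels):
    """Copy per-voxel semantic labels from the reconstructed (labeled)
    volume onto the occupied voxels of the input volume, using the label
    of the nearest labeled voxel."""
    out = np.zeros_like(recon_labels)
    labeled = np.argwhere(recon_labels > 0)
    if labeled.size == 0:
        return out
    for idx in np.argwhere(input_occ):
        d2 = np.sum((labeled - idx) ** 2, axis=1)
        nearest = labeled[np.argmin(d2)]
        out[tuple(idx)] = recon_labels[tuple(nearest)]
    return out
```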
Figure 13 shows a few examples of such segmentations on the test set. Essentially, we learn a deep model of segmentation transfer. The model integrates a pre-learned part-aware shape manifold with a shape-to-manifold projector: to segment a shape, the projector retrieves the nearest neighbor on the manifold and generates a semantic segmentation for the retrieved shape, which can easily be transferred to the input thanks to the shape resemblance. Table 5 shows quantitative segmentation results in comparison with several state-of-the-art methods; our method achieves comparable performance. Extended segmentation results can be found in the supplemental material.
6 Conclusion
We have proposed a simple and effective generative model for quality 3D shape generation. The model knows what it generates (semantic parts) and how the generated parts correlate with each other (via assembling transformations). This makes the generation part-aware and structure-revealing. Our model adopts a divide-and-conquer scheme and thus greatly reduces the difficulty of modeling full-shape variations. There are two main limitations. First, our model relies on a hard-coded split of semantic part generators, which does not adapt to a different label set. Learning a structure-aware generative model with a built-in shape decomposition module is an interesting future direction. Second, our method currently works with major semantic parts; although it can be extended to synthesize and assemble more fine-grained parts, too many parts would increase the difficulty of part assembly. This could be alleviated with the help of a hierarchical part organization as in [51].
References
 [1] P. Achlioptas, O. Diamanti, I. Mitliagkas, and L. Guibas. Learning representations and generative models for 3d point clouds. arXiv preprint arXiv:1707.02392, 2018.
 [2] B. Allen, B. Curless, and Z. Popović. The space of human body shapes: Reconstruction and parameterization from range scans. ACM Trans. Graph., 22(3), 2003.
 [3] M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein GAN. arXiv preprint arXiv:1701.07875, 2017.
 [4] M. Averkiou, V. G. Kim, Y. Zheng, and N. J. Mitra. ShapeSynth: Parameterizing model collections for coupled shape exploration and synthesis. Computer Graphics Forum, 33(2):125–134, 2014.
 [5] E. Balashova, V. Singh, J. Wang, B. Teixeira, T. Chen, and T. Funkhouser. Structure-aware shape synthesis. In 3D Vision (3DV), 2018.
 [6] V. Blanz and T. Vetter. A morphable model for the synthesis of 3D faces. In Proc. of SIGGRAPH, pages 187–194, 1999.
 [7] M. Bokeloh, M. Wand, V. Koltun, and H.-P. Seidel. Pattern-aware shape deformation using sliding dockers. ACM Trans. on Graph. (SIGGRAPH Asia), 30(6):123, 2011.
 [8] A. X. Chang, T. Funkhouser, L. J. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, J. Xiao, L. Yi, and F. Yu. ShapeNet: An information-rich 3D model repository. arXiv:1512.03012 [cs.GR], 2015.
 [9] S. Chaudhuri, E. Kalogerakis, L. Guibas, and V. Koltun. Probabilistic reasoning for assembly-based 3D modeling. ACM Trans. on Graph. (SIGGRAPH), 30(4):35:1–35:10, 2011.
 [10] X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel. Infogan: Interpretable representation learning by information maximizing generative adversarial nets. In Proc. NIPS, pages 2172–2180, 2016.
 [11] Z. Chen and H. Zhang. Learning implicit fields for generative shape modeling. In CVPR, 2019.
 [12] A. Dubrovina, F. Xia, P. Achlioptas, M. Shalah, and L. Guibas. Composite shape modeling via latent space factorization. arXiv preprint arXiv:1901.02968, 2019.
 [13] H. Fan, H. Su, and L. Guibas. A point set generation network for 3d object reconstruction from a single image. arXiv preprint arXiv:1612.00603, 2016.
 [14] N. Fish, M. Averkiou, O. Van Kaick, O. Sorkine-Hornung, D. Cohen-Or, and N. J. Mitra. Meta-representation of shape families. ACM Transactions on Graphics (TOG), 33(4):34, 2014.
 [15] T. Funkhouser, M. Kazhdan, P. Shilane, P. Min, W. Kiefer, A. Tal, S. Rusinkiewicz, and D. Dobkin. Modeling by example. ACM Transactions on Graphics (Proc. SIGGRAPH), Aug. 2004.
 [16] R. Gal, O. Sorkine, N. J. Mitra, and D. Cohen-Or. iWIRES: An analyze-and-edit approach to shape manipulation. ACM Trans. on Graph. (SIGGRAPH), 28(3):33, 2009.
 [17] R. Girdhar, D. F. Fouhey, M. Rodriguez, and A. Gupta. Learning a predictable and generative vector representation for objects. In European Conference on Computer Vision, pages 484–499. Springer, 2016.
 [18] T. Groueix, M. Fisher, V. G. Kim, B. C. Russell, and M. Aubry. A papier-mâché approach to learning 3d surface generation. In CVPR, pages 216–224, 2018.
 [19] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville. Improved training of Wasserstein GANs. In NIPS, pages 5769–5779, 2017.
 [20] H. Huang, E. Kalogerakis, and B. Marlin. Analysis and synthesis of 3d shape families via deep-learned generative models of surfaces. Computer Graphics Forum, 34(5), 2015.
 [21] D. Jack, J. K. Pontes, S. Sridharan, C. Fookes, S. Shirazi, F. Maire, and A. Eriksson. Learning free-form deformations for 3d object reconstruction. arXiv preprint arXiv:1803.10932, 2018.
 [22] E. Kalogerakis, S. Chaudhuri, D. Koller, and V. Koltun. A Probabilistic Model of Component-Based Shape Synthesis. ACM Transactions on Graphics, 31(4), 2012.
 [23] V. G. Kim, W. Li, N. J. Mitra, S. Chaudhuri, S. DiVerdi, and T. Funkhouser. Learning part-based templates from large collections of 3D shapes. ACM Transactions on Graphics (Proc. SIGGRAPH), 32(4), July 2013.
 [24] A. Kurenkov, J. Ji, A. Garg, V. Mehta, J. Gwak, C. Choy, and S. Savarese. DeformNet: Free-form deformation network for 3d shape reconstruction from a single image. 2018.
 [25] J. Li, K. Xu, S. Chaudhuri, E. Yumer, H. Zhang, and L. Guibas. GRASS: Generative recursive autoencoders for shape structures. arXiv preprint arXiv:1705.02090, 2017.
 [26] K. Li, T. Pham, H. Zhan, and I. Reid. Efficient dense point cloud object reconstruction using deformation vector fields. In ECCV, 2018.
 [27] N. Mitra, M. Wand, H. R. Zhang, D. Cohen-Or, V. Kim, and Q.-X. Huang. Structure-aware shape processing. In SIGGRAPH Asia 2013 Courses, page 1. ACM, 2013.
 [28] N. J. Mitra, L. J. Guibas, and M. Pauly. Partial and approximate symmetry detection for 3d geometry. ACM Trans. on Graph., 25(3):560–568, 2006.
 [29] C. Nash and C. K. Williams. The shape variational autoencoder: A deep generative model of part-segmented 3d objects. Computer Graphics Forum (SGP 2017), 36(5):1–12, 2017.
 [30] M. Ovsjanikov, W. Li, L. Guibas, and N. J. Mitra. Exploration of continuous variability in collections of 3d shapes. ACM Trans. on Graph. (SIGGRAPH), 30(4):33:1–33:10, 2011.
 [31] J. J. Park, P. Florence, J. Straub, R. Newcombe, and S. Lovegrove. DeepSDF: Learning continuous signed distance functions for shape representation. In CVPR, 2019.
 [32] G. Riegler, A. O. Ulusoy, and A. Geiger. OctNet: Learning deep 3d representations at high resolutions. In Proc. CVPR, volume 3, 2017.
 [33] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen. Improved techniques for training gans. In Proc. NIPS, pages 2234–2242, 2016.
 [34] N. Schor, O. Katzir, H. Zhang, and D. Cohen-Or. Learning to generate the "unseen" via part synthesis and composition. In IEEE International Conference on Computer Vision, 2019.
 [35] A. A. Soltani, H. Huang, J. Wu, T. D. Kulkarni, and J. B. Tenenbaum. Synthesizing 3d shapes via modeling multiview depth maps and silhouettes with deep generative networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1511–1519, 2017.
 [36] H. Su, C. R. Qi, K. Mo, and L. J. Guibas. PointNet: Deep learning on point sets for 3d classification and segmentation. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
 [37] M. Sung, A. Dubrovina, V. G. Kim, and L. Guibas. Learning fuzzy set representations of partial shapes on dual embedding spaces. Computer Graphics Forum, 37(5):71–81, 2018.
 [38] M. Sung, H. Su, V. G. Kim, S. Chaudhuri, and L. Guibas. ComplementMe: Weakly-supervised component suggestions for 3D modeling. ACM Trans. on Graph. (SIGGRAPH Asia), 2017.
 [39] H. Wang, N. Schor, R. Hu, H. Huang, D. Cohen-Or, and H. Huang. Global-to-local generative model for 3d shapes. ACM Transactions on Graphics (Proc. SIGGRAPH ASIA), 37(6):214:1–214:10, 2018.
 [40] N. Wang, Y. Zhang, Z. Li, Y. Fu, W. Liu, and Y.-G. Jiang. Pixel2Mesh: Generating 3d mesh models from single RGB images. In ECCV, pages 52–67, 2018.
 [41] P.-S. Wang, Y. Liu, Y.-X. Guo, C.-Y. Sun, and X. Tong. O-CNN: Octree-based Convolutional Neural Networks for 3D Shape Analysis. ACM Transactions on Graphics (SIGGRAPH), 36(4), 2017.
 [42] J. Wu, C. Zhang, T. Xue, B. Freeman, and J. Tenenbaum. Learning a probabilistic latent space of object shapes via 3d generative-adversarial modeling. In Advances in Neural Information Processing Systems, pages 82–90, 2016.
 [43] Z. Wu, X. Wang, D. Lin, D. Lischinski, D. Cohen-Or, and H. Huang. Structure-aware generative network for 3d-shape modeling. arXiv preprint arXiv:1808.03981, 2018.
 [44] K. Xu, V. G. Kim, Q. Huang, N. Mitra, and E. Kalogerakis. Data-driven shape analysis and processing. In SIGGRAPH ASIA 2016 Courses, page 4. ACM, 2016.
 [45] K. Xu, H. Zhang, D. Cohen-Or, and B. Chen. Fit and diverse: Set evolution for inspiring 3d shape galleries. ACM Transactions on Graphics (TOG), 31(4):57, 2012.
 [46] L. Yi, V. G. Kim, D. Ceylan, I.-C. Shen, M. Yan, H. Su, C. Lu, Q. Huang, A. Sheffer, and L. Guibas. A scalable active framework for region annotation in 3d shape collections. SIGGRAPH Asia, 2016.
 [47] L. Yi, H. Su, X. Guo, and L. J. Guibas. SyncSpecCNN: Synchronized spectral CNN for 3d shape segmentation. In CVPR, pages 6584–6592, 2017.
 [48] M. E. Yumer and N. J. Mitra. Learning semantic deformation flows with 3d convolutional networks. In European Conference on Computer Vision (ECCV 2016). Springer, 2016.
 [49] Y. Zheng, X. Chen, M.-M. Cheng, K. Zhou, S.-M. Hu, and N. J. Mitra. Interactive images: Cuboid proxies for smart image manipulation. ACM Transactions on Graphics, 31(4):99:1–99:11, 2012.
 [50] Y. Zheng, H. Fu, D. Cohen-Or, O. K.-C. Au, and C.-L. Tai. Component-wise controllers for structure-preserving shape manipulation. In Computer Graphics Forum, volume 30, pages 563–572. Wiley Online Library, 2011.
 [51] C. Zhu, K. Xu, S. Chaudhuri, R. Yi, and H. Zhang. SCORES: Shape composition with recursive substructure priors. ACM Transactions on Graphics (SIGGRAPH Asia 2018), 37(6), 2018.
 [52] C. Zou, E. Yumer, J. Yang, D. Ceylan, and D. Hoiem. 3D-PRNN: Generating shape primitives with recurrent neural networks. In The IEEE International Conference on Computer Vision (ICCV), 2017.