Meshlet Priors for 3D Mesh Reconstruction
Abstract
Estimating a mesh from an unordered set of sparse, noisy 3D points is a challenging problem that requires carefully selected priors.
Existing handcrafted priors, such as smoothness regularizers, impose an undesirable tradeoff between attenuating noise and preserving local detail.
Recent deep-learning approaches produce impressive results by learning priors directly from the data.
However, the priors are learned at the object level, which makes these algorithms class-specific and even sensitive to the pose of the object.
We introduce meshlets, small patches of mesh that we use to learn local shape priors.
Meshlets act as a dictionary of local features and thus allow us to use learned priors to reconstruct object meshes in any pose and from unseen classes, even when the noise is large and the samples are sparse.
1 Introduction
The ability to capture, represent, and digitally manipulate objects is crucial for a wide range of important applications, from content creation to animation, robotics, and virtual reality. Among the different representations for 3D objects (which also include depth maps, occupancy grids, and point clouds), meshes are particularly appealing.
Estimating meshes of real-world objects, however, is not straightforward, since common capture strategies, such as structured light [30] or multi-view stereo [14, 36], produce point clouds or depth maps instead. These intermediate representations are noisy and sparse and, when used to estimate a continuous surface, they introduce a tradeoff between overfitting to the noise and oversmoothing.
Traditional methods require handcrafted priors (e.g., local smoothness) to balance noise and details, as is the case for Laplacian reconstruction [27]. Figure 1 shows that this balance is difficult to strike when the point cloud is noisy. Recent learning-based methods learn priors directly from a large number of examples [16, 34, 15, 26, 29]. Because of their ability to learn priors directly from the data, these approaches can produce impressive results from both point clouds and a single image. However, they learn priors at the object level, which limits their ability to reconstruct objects from classes not seen during training. They also struggle to disentangle the shape priors from the object pose: state-of-the-art learning methods can fail completely on a rotated point cloud even though they can reconstruct the same point cloud when its pose resembles that of the training set (rows marked as P in Figure 1). Tatarchenko et al. go further and suggest that many of these methods may actually learn a form of classification and nearest-neighbor retrieval from the dataset, rather than a proper 3D reconstruction [33]. In fact, even for rows P and T in Figure 1, a closer look seems to indicate that AtlasNet [11] and OccNet [26] are reconstructing a different couch and chair from the training set.
We present a learning-based method to extract a 3D mesh from a set of sparse, noisy, unordered points that bridges traditional and learning-based approaches. Our key intuition is to learn geometric priors locally while enforcing their consistency globally. To represent shape priors, we introduce meshlets, small patches of mesh that, loosely speaking, serve as a learned dictionary of local features.
Specifically, we use a variational autoencoder (VAE) [20] to learn the latent space of meshlets that can be observed in natural shapes. We call these natural meshlets. Learning these local features offers two key advantages. First, it allows us to reconstruct objects from classes never seen in training: at some scale, a couch exhibits similar local features to those of a bunny.
Second, it disentangles the global pose of the object and the parametrization learned by the network, which allows our algorithm to be robust to dramatic changes of the object’s pose, see Figure 1.
To fit the meshlets to a point cloud we minimize their distance to the points, while enforcing that they belong to the latent space of natural meshlets. Therefore, the resulting surface will locally satisfy the priors we learned. However, because the meshlets are optimized independently of each other, the mesh extracted from their union will not be watertight.
Therefore, we define an auxiliary, watertight mesh and propose to use it in an alternating optimization that ensures that the meshlets are consistent with each other and with the observed point cloud.
We show with extensive comparisons that this iterative method outperforms the state of the art in challenging scenarios, such as noisy point clouds in arbitrary poses, as shown in Figure 1. In summary, our contributions are:

- We present meshlets, a new way of representing local shape priors in the latent space of a variational autoencoder that is trained on local patches from a dataset of real-world objects.
- We propose an alternating optimization which fits meshlets to the measured point samples (enforcing local constraints) while maintaining global consistency for the mesh.
- We demonstrate, for the first time to our knowledge, successful reconstructions of 3D meshes from very sparse, noisy point measurements with a category-agnostic, learning-based method.
2 Related Work
Extracting a mesh from a point cloud is an important problem that has been the focus of much research since the early days of graphics. Traditional methods such as marching cubes [24] or Ball-Pivoting [4] work well for cases where the noise is small compared to the density of the point cloud.
In general, however, noise does pose issues. One traditional solution, then, is to use the points and the relative normals to compute a signed-distance function whose zero crossing is the desired surface [8, 13, 2, 17, 18]. An alternative is to use handcrafted priors, such as smoothness of the vertices and normals of the estimated mesh [27]. However, these priors introduce a tradeoff between suppressing the noise and preserving sharp features that becomes increasingly brittle for sparser and noisier point clouds, see Figure 1.
Priors can be more effectively learned from data with neural networks. Deep learning methods, for instance, have shown great success in estimating depth maps from images, whether from multiple views [14, 36], stereo [19], or even a single image [9, 10, 39, 21]. Even meshes can be directly extracted from a single image, provided that the class of the object is known [16, 34, 15].
Rather than requiring manual tuning of the traditional noise/sharpness tradeoff, methods that learn priors to extract meshes from point clouds introduce a new one: generally speaking, the lower the quality of the observations (e.g., strong noise or sparsity of the point cloud), the stronger the priors need to be, which affects the algorithm's ability to generalize to different and unseen classes. For instance, methods that learn local priors are class-agnostic but tend to need dense point clouds with low levels of noise [12, 38, 37, 35].
The recent works of Park et al. [29], Groueix et al. [11], and Mescheder et al. [26] produce impressive results even with sparser and potentially noisier data, but fail to generalize to completely new classes of objects. Often they even struggle when the point cloud is in a pose that differs significantly from the training pose, as shown in Figure 1. This issue is due, in part, to the fact that these methods lack a mechanism to enforce geometric constraints at inference time. Our method is class-agnostic thanks to its ability to learn and enforce local priors while minimizing the error with respect to the point cloud at inference time.
The idea of learning priors from data and enforcing geometric constraints at inference time was recently explored for depth map [5], point cloud [40], and surface estimation [23, 22, 29]. These approaches use low-dimensional representations that allow inference-time optimization.
However, the approaches that learn priors at the object level [40, 23, 22, 29] tend to be category-specific. Our meshlet priors directly encode the (local) shape of the surface instead of a viewer-centric depth [5]. Meshlets are class-agnostic and can be used to learn and enforce priors at different scales.
Key to solving the mesh estimation problem is how to represent it. Different representations for meshes exist that are amenable to use with neural networks, but they also tend to be class-specific [31, 3]. One key ingredient of our method is the use of small mesh patches, called meshlets, which simplify the processing of the mesh, among other things. A related approach is the work of Groueix et al., who also represent the mesh as a collection of large parts, which they call charts [11]. However, their method does not offer a mechanism to enforce global consistency and does not leverage local shape priors.
3 Method
Our goal is to estimate a mesh from a set of unordered, non-oriented points. The task is easy when the point cloud is dense and the noise is low. However, when the quality of the observations degrades, e.g., with sparser or noisier points, the choice of priors and heuristics becomes central. Handcrafted priors, such as smoothness, introduce a tradeoff between overly smooth and noisy reconstructions, as shown in Figure 1 (low/high). On the other hand, neural networks can learn priors directly from data, but they introduce other challenges. First, capturing the distribution of generic objects requires training on a large number of examples, possibly more than existing datasets can supply. Moreover, generalization can be an issue: the performance of existing learning-based methods quickly degrades when the test objects differ from the ones used in training, as shown in the corresponding rows of Figure 1.
Finally, it is not straightforward to disentangle object-level priors and the pose of the object. Figure 1 shows that OccNet [26] and AtlasNet [11], both recent state-of-the-art works, fail for classes never seen in training, or even when the pose of the object is significantly different from the poses seen in training.
To overcome these issues, we propose to learn priors locally: even if the Stanford Bunny in Figure 1 was never seen in training, its local features are similar to those found in more common objects from the training set. We introduce meshlets, which can be regarded as small patches of a mesh, see Figure 17. Loosely speaking, meshlets act as a dictionary of basic shape features. Meshlets are local and of limited size, and thus offer a simple mechanism to disentangle the (local) priors from the object’s pose. If meshlets are adapted to the point cloud independently of each other, however, they may not result in a watertight surface. Therefore, we explicitly enforce their consistency globally. In the following we describe these two stages, and the overall process to extract a mesh.
3.1 Local Shape Priors with Meshlets
In this section we introduce meshlets and describe how we leverage them to enforce local shape priors. Intuitively, a meshlet is a small patch of mesh deformed to adhere to a region of another, larger mesh, see Figures 17 and 90(a).
To extract a meshlet centered at a vertex of a mesh, we first compute a local geodesic parametrization [25] that maps the 3D coordinates of the vertices in a neighborhood of that vertex to coordinates on the plane tangent to the mesh at the vertex. We then resample the geodesic distance function at integer coordinates. This gives us the correspondence between each vertex of the meshlet and a vertex of the mesh in the neighborhood of the center.
Because they only require a local parametrization computed with respect to the center vertex, meshlets work well even for objects with large, varying curvature. Because they are local, they can learn shape priors that are independent of the pose and class of the object.
Learning local shape priors with meshlets. We want to learn the distribution of “natural meshlets,” i.e., those meshlets that capture the local features of real-world objects. Inspired by recent methods [40, 5], we use a variational autoencoder (VAE). By training the VAE to reconstruct a large number of meshlets, we force its bottleneck to learn the latent space of natural meshlets. Differently put, vectors sampled on this manifold and fed into the decoder result in natural meshlets. We extract meshlets from objects of the ShapeNet dataset [6] and feed their 3D coordinates for training. However, we first roto-translate the meshlets to bring them into a canonical pose. This transformation is necessary to make sure that similar meshlets sampled from different 3D locations and orientations map to similar regions in the VAE’s latent space. More specifically, given a meshlet m, we first translate and rotate it so that its center is at the origin and the normal at the center is aligned with the z-axis; we then rotate it around the z-axis so that the local coordinates of the meshlet are aligned with the x and y axes. We call this the canonical pose. A meshlet, then, is completely defined by T, the transformation from global to canonical pose, and z, the latent vector corresponding to the meshlet in canonical pose:
$m = T^{-1}\big(\mathcal{D}(z)\big)$,  (1)

where $\mathcal{D}$ denotes the decoder of the VAE.
Section 4.2 details the network’s architecture. Since we disentangle pose and shape, smoothly traversing the latent space smoothly varies the shape of the reconstructed meshlet, as shown in Figure 17, where we take the latent vectors corresponding to meshlets A and B and progressively interpolate between them. The meshlets reconstructed from the interpolated vectors smoothly morph between the shapes of meshlets A and B. (Please use a media-enabled PDF viewer, such as Adobe Reader, to view the animations in Figure 17.)
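As a concrete illustration, the canonical-pose transformation can be sketched in a few lines of NumPy. This is a minimal sketch, not the paper's implementation: the meshlet is assumed to be an (N, 3) array of vertex positions, and the center index, normal, and tangent direction are assumed to be provided by the geodesic parametrization.

```python
import numpy as np

def canonicalize(verts, center_idx, normal, tangent):
    """Roto-translate a meshlet into canonical pose: center at the origin,
    normal along +z, and a reference tangent direction along +x.
    verts: (N, 3) vertex positions; normal, tangent: unit 3-vectors."""
    v = verts - verts[center_idx]                # center at the origin
    z = np.array([0.0, 0.0, 1.0])
    axis = np.cross(normal, z)                   # rotation axis taking normal to z
    s, c = np.linalg.norm(axis), float(np.dot(normal, z))
    if s < 1e-8:                                 # normal already (anti)parallel to z
        R1 = np.eye(3) if c > 0 else np.diag([1.0, -1.0, -1.0])
    else:                                        # Rodrigues' rotation formula
        k = axis / s
        K = np.array([[0.0, -k[2], k[1]],
                      [k[2], 0.0, -k[0]],
                      [-k[1], k[0], 0.0]])
        R1 = np.eye(3) + s * K + (1.0 - c) * (K @ K)
    v = v @ R1.T
    t = R1 @ tangent                             # tangent after the first rotation
    theta = np.arctan2(t[1], t[0])               # residual rotation about z
    cz, sz = np.cos(-theta), np.sin(-theta)
    R2 = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    return v @ R2.T
```

Inverting this rigid transformation maps a decoded canonical meshlet back to its global pose.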
Fitting a meshlet to 3D points. Assume now that we are given a set of 3D points roughly corresponding to the size of a meshlet (we generalize this to a complete point cloud in Section 3.2). Deforming a natural meshlet to fit it is straightforward: we simply traverse the latent space learned by the VAE to minimize the distance between the meshlet and the points. Specifically, we take an initialization of the meshlet and run it through the encoder to find the corresponding latent vector. This is the starting point of our optimization. We then freeze the weights of the VAE, compute the error between the meshlet and the points, and take a gradient descent step through the decoder. This brings us to a new point in latent space and the corresponding meshlet, which is a natural meshlet that is closer to the given 3D points. We iterate until convergence, see Figure 18. We note that, although other approaches have also proposed to optimize the latent vector of a VAE to match some measured samples (e.g., [29, 5, 1, 23]), they do so at the object (or scene) level. Because our method learns local surface patches, and therefore reuses surface priors across different object categories, it can generalize better.
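The latent-space traversal described above can be sketched as follows, assuming a trained decoder that maps a latent vector to an (N, 3) array of meshlet vertices. The one-sided Chamfer term and the choice of Adam as the optimizer are illustrative assumptions, not details fixed by the text.

```python
import torch

def fit_latent(decoder, z_init, points, iters=200, lr=1e-2):
    """Traverse the VAE latent space with the decoder weights frozen, so the
    decoded meshlet moves toward the observed 3D points.
    decoder: maps a latent vector to (N, 3) vertex positions.
    points:  (M, 3) tensor of observed 3D points."""
    for p in decoder.parameters():
        p.requires_grad_(False)              # only the latent code is optimized
    z = z_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        verts = decoder(z)                   # decoded "natural" meshlet
        d = torch.cdist(points, verts)       # (M, N) pairwise distances
        loss = d.min(dim=1).values.mean()    # each point pulls its nearest vertex
        loss.backward()                      # gradient flows through the frozen decoder
        opt.step()
    return z.detach(), decoder(z).detach()
```

Because the decoder is frozen, every iterate stays on the learned manifold of natural meshlets; only the latent code moves.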
3.2 Overall Optimization
Having explained how our meshlets can be used to learn local priors, and how they can be fit to a set of 3D points, we can describe the overall algorithm, which is fairly straightforward at its core. We start with an initial, rough approximation of the complete mesh. This could be a sphere, or any other surface that satisfies our meshlet priors, i.e., one whose meshlets lie on the manifold learned by the VAE. From this mesh we extract overlapping meshlets and find the corresponding transformations and latent vectors (Figure 20). We select the meshlets so that each vertex of the mesh is covered by several of them. We also compute the distance between the mesh and the point cloud, whose gradient we can propagate to the meshlets, since we have the correspondences between mesh and meshlets by construction. This allows us to update the meshlets to adapt to the points (Section 3.1).
However, this optimization is performed on each meshlet independently, so it results in small gaps between the meshlets, see Figure 19(a). Therefore, we enforce and maintain global consistency by adding a step in which we deform the mesh to match the meshlets and update the meshlets to match the mesh. Deforming the mesh brings it closer to the point cloud; deforming the meshlets forces them to be globally consistent. Finally, we iterate:

1. Optimize meshlets to fit the point cloud (Section 3.2.1).
2. Optimize meshlets and mesh to match each other (Section 3.2.2).
At convergence, the auxiliary mesh, watertight by construction, is our estimate of the mesh. We now explain the two steps in detail.
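The two alternating steps can be illustrated with a deliberately simplified toy, in which meshlets are index sets into the mesh vertices and those vertices are moved directly. All names here are illustrative; the actual method updates each meshlet through its VAE latent code (Section 3.1) rather than moving vertices freely.

```python
import numpy as np

def alternating_fit(mesh, meshlet_idx, pc, outer=10, step=0.5):
    """Toy version of the alternating optimization.
    mesh: (V, 3) vertices of the auxiliary mesh;
    meshlet_idx: list of index arrays, one per (overlapping) meshlet;
    pc: (P, 3) point cloud."""
    mesh = mesh.astype(float).copy()
    for _ in range(outer):
        # Step 1: fit every meshlet to the point cloud independently.
        patches = []
        for idx in meshlet_idx:
            patch = mesh[idx]
            d = np.linalg.norm(patch[:, None] - pc[None], axis=-1)
            nearest = pc[np.argmin(d, axis=1)]
            patches.append(patch + step * (nearest - patch))
        # Step 2: global consistency -- each mesh vertex becomes the mean of
        # its copies across all meshlets that cover it.
        acc = np.zeros_like(mesh)
        cnt = np.zeros(len(mesh))
        for idx, patch in zip(meshlet_idx, patches):
            acc[idx] += patch
            cnt[idx] += 1
        covered = cnt > 0
        mesh[covered] = acc[covered] / cnt[covered, None]
    return mesh
```

Step 1 lets each patch chase the data on its own; step 2 reconciles the overlapping copies so the shared surface stays consistent, mirroring the role of the auxiliary mesh in the full method.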
Enforcing Local Shape Priors
To optimize the meshlets with respect to the point cloud we need to define an error. Unfortunately, the correspondences between the point cloud and the vertices of the meshlets are not readily available. A Chamfer distance, then, is not straightforward to use, because without correspondences all the points in the point cloud would contribute to the error of all the meshlets, even if they are on opposite sides of the object. However, we do have the correspondences between the vertices of the auxiliary mesh and the meshlets. Therefore, we compute the Chamfer distance between the point cloud and the mesh instead:
$E_{\mathrm{pc}} = \frac{1}{|PC|} \sum_{p \in PC} \min_{v \in M} \|p - v\|_2 \; + \; \frac{1}{|M|} \sum_{v \in M} \min_{p \in PC} \|v - p\|_2$,  (2)
where p is a 3D point in the input point cloud PC, and v is a vertex of the mesh M. Equation 2 gives us a per-vertex error on the mesh, which we can propagate to the corresponding meshlets. We then update the meshlets to minimize this error as explained in Section 3.1, and get a new set of natural meshlets.
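A minimal NumPy version of this symmetric Chamfer term follows. The uniform averaging over points and vertices is an assumption, since the exact weighting is not spelled out here.

```python
import numpy as np

def chamfer(pc, verts):
    """Symmetric Chamfer distance between a point cloud `pc` (P, 3) and mesh
    vertices `verts` (V, 3): every point is matched to its nearest vertex and
    vice versa. Also returns the per-vertex error that the optimization
    propagates to the corresponding meshlets."""
    d = np.linalg.norm(pc[:, None, :] - verts[None, :, :], axis=-1)  # (P, V)
    pc_to_mesh = d.min(axis=1)    # nearest-vertex distance for each point
    mesh_to_pc = d.min(axis=0)    # nearest-point distance for each vertex
    total = pc_to_mesh.mean() + mesh_to_pc.mean()
    return total, mesh_to_pc
```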
Enforcing Global Consistency
To enforce that the meshlets are globally consistent, i.e., that their union is a watertight mesh, we use, once again, the auxiliary mesh. Specifically, we compute the Chamfer distance between the vertices of the mesh and the vertices of all the meshlets:
$E_{\mathrm{cons}} = \frac{1}{|M|} \sum_{v \in M} \min_{u \in U} \|v - u\|_2 \; + \; \frac{1}{|U|} \sum_{u \in U} \min_{v \in M} \|u - v\|_2$,  (3)

where $U$ is the union of the vertices of all the meshlets.
First we keep the meshlets fixed and deform the mesh to minimize Equation 3. Then we fix the resulting mesh and adjust the meshlets with the algorithm described in Section 3.1, this time to minimize the same distance. We iterate until Equation 3 is minimized. At this point the meshlets will be consistent with the mesh and, in turn, with each other globally. This process corresponds to the block “global consistency” in Figure 19.
4 Implementation Details
4.1 Optimization
We start by describing a few details that improve the efficiency of the optimization procedure or the quality of the resulting meshes.
Mesh initialization. The auxiliary mesh can be any genus-zero mesh that satisfies the meshlets’ priors (see Section 3.2). The actual choice, however, does have a bearing on the number of iterations required to converge. We initialize our approach with an overly smoothed Laplacian reconstruction. Empirically, we have observed that the results of our algorithm initialized in this way are effectively indistinguishable from those obtained by using a sphere as initialization; convergence, however, takes a fraction of the time. For reference, we show a few examples of initial meshes in the Supplementary.
Meshlet resampling. As the optimization progresses, the shape and the size of the auxiliary mesh may change significantly. On the one hand, this is a desirable behavior: if the mesh can scale to arbitrary sizes, it can properly match the size of the underlying surface, even when the initialization is far from it. On the other hand, it results in a sparser meshlet coverage and, potentially, no coverage in some areas. Moreover, it can cause meshlets to be overly stretched. Therefore, every 20 iterations of enforcing local shape priors and global consistency (blue arrow in Figure 19), we resample the meshlets on the current mesh.
Remeshing. Large changes from the initialization may also cause issues for the mesh itself, which may stretch in some regions or become otherwise irregular. One way to prevent this is to use strong smoothness priors when enforcing global consistency, but that would hinder our ability to reconstruct sharp features. Instead, at the end of every iteration, we remesh using Screened Poisson Reconstruction [18] to encourage smoothness while respecting the priors enforced by our approach, i.e., preserving the sharpness of local features. We provide more details in the Supplementary.
4.2 Meshlet training
To train the meshlet network, we sample meshlets from the ShapeNet dataset [6]. We extract meshlets by randomly selecting objects across several classes. We then apply three different scales to each object and extract 256 meshlets for each scale, so that our meshlet dataset captures both fine and coarse details. Note that we disregard meshlets that are problematic: specifically, we use the geodesic distance algorithm by Melvær et al. [25] and reject those meshlets for which the geodesic distance calculation results in a large anisotropic stretch, or fails altogether. The network, then, is trained to reconstruct these meshlets with a reconstruction loss. In all of our experiments we use meshlets of a fixed size. To learn the latent space of natural meshlets, we use a fully-connected encoder-decoder network that takes as input a vectorized version of the meshlet. The encoder and the decoder are symmetric, with 6 layers each, and the latent code vector is one third of the input dimension.
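The architecture can be sketched in PyTorch as follows. The hidden layer widths and the default input size (a 9x9 grid of 3D vertices, 9 * 9 * 3 = 243) are assumptions; the text only fixes the layer count, the encoder/decoder symmetry, and the latent size of one third of the input dimension.

```python
import torch
import torch.nn as nn

class MeshletVAE(nn.Module):
    """Sketch of the meshlet VAE: symmetric fully-connected encoder/decoder
    with 6 layers each and a latent code one third of the input dimension.
    Hidden widths and the default input size are illustrative assumptions."""

    def __init__(self, in_dim=243):
        super().__init__()
        z_dim = in_dim // 3
        dims = [in_dim, 512, 512, 384, 256, 256]
        enc = []
        for a, b in zip(dims[:-1], dims[1:]):
            enc += [nn.Linear(a, b), nn.ReLU()]
        self.encoder = nn.Sequential(*enc)        # 5 layers ...
        self.fc_mu = nn.Linear(dims[-1], z_dim)   # ... plus the latent heads
        self.fc_logvar = nn.Linear(dims[-1], z_dim)
        rev = [z_dim] + dims[::-1]                # mirror the encoder widths
        dec = []
        for i, (a, b) in enumerate(zip(rev[:-1], rev[1:])):
            dec.append(nn.Linear(a, b))
            if i < len(rev) - 2:                  # no activation after the output
                dec.append(nn.ReLU())
        self.decoder = nn.Sequential(*dec)

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.decoder(z), mu, logvar
```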
5 Experiments
In this section we first evaluate our method against state-of-the-art approaches. Then, we compare our meshlets to other local shape priors to validate their importance.
Comparisons.
We compare our method with several state-of-the-art mesh reconstruction approaches. The first is Screened Poisson [18], a widely used, traditional technique that creates watertight surfaces from oriented point clouds. Because our input is a raw point cloud, we need to estimate normals, which we do with two methods. One is MeshLab’s normal estimation, which fits local planes and uses them to estimate normals [7]. The other is a recently published, learning-based method called PCPNet [12], which estimates geometric properties such as normals and curvature from local regions of the point cloud. The second mesh reconstruction approach is the method by Öztireli et al. [28], which also requires oriented points; they propose a variant of marching cubes that preserves sharp features using non-linear regression. In addition, we compare against Laplacian mesh optimization [27]. Leveraging the fact that the norm of the mesh’s Laplacian captures the local mean curvature, this algorithm optimizes the Laplacian at the vertices in a weighted least-squares sense. The algorithm has a free parameter that regulates the smoothness of the resulting surface. After a parameter sweep we found that no single value yields the best results over the whole dataset, so we settled for two values, each offering a different compromise between denoising and oversmoothing.
The data. We test all the methods on 20 objects. To validate that our method generalizes well, we include four objects that are commonly used by the graphics community (Suzanne, the Stanford Bunny, the Armadillo, and the Utah Teapot). We select the remaining meshes from the test set of the ShapeNet dataset [6]. We show all the objects in the Supplementary. Because the ShapeNet meshes are not always watertight, we preprocess them with a simple algorithm that we describe in the Supplementary. Given the watertight meshes, we randomly decimate the number of vertices by different factors, obtaining three different sparsity levels. For each sparsity level we also add an increasingly large amount of Gaussian noise.
We describe the parameters we use, and offer visualization of the different levels of noise in the Supplementary.
Numerical evaluation. For our numerical evaluation we use the symmetric Hausdorff distance, which reports the largest vertex reconstruction error for each mesh, and the Chamfer distance, which computes the distance between two meshes after assigning correspondences based on closest vertices. Table 1 shows that our method performs consistently better than all the competitors. The gap is most apparent when comparing with deeplearning methods that learn priors at the object level, further suggesting that our strategy to learn local priors is a promising direction. We list the numbers for each object across the different noise settings in the Supplementary.
| Method | Chamfer | Hausdorff |
|---|---|---|
| MeshLab [7] + Scr. Pois. [18] | 0.0285 / 0.0112 | 0.339 / 0.102 |
| MeshLab [7] + RIMLS [28] | 0.0177 / 0.0166 | 0.149 / 0.148 |
| PCPNet [12] + Scr. Pois. [18] | 0.0122 / 0.0109 | 0.147 / 0.140 |
| PCPNet [12] + RIMLS [28] | 0.0181 / 0.0176 | 0.151 / 0.153 |
| Laplacian [27], low | 0.0104 / 0.0103 | 0.100 / 0.065 |
| Laplacian [27], high | 0.0096 / 0.0094 | 0.103 / 0.069 |
| Deep Geometric Prior [35] | 0.0128 / 0.0130 | 0.147 / 0.148 |
| AtlasNet [11] | 0.0415 / 0.0377 | 0.293 / 0.263 |
| OccNet [26] | 0.0630 / 0.0627 | 0.304 / 0.285 |
| Ours | 0.0090 / 0.0092 | 0.054 / 0.047 |
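For reference, the symmetric Hausdorff distance used in the numerical evaluation can be computed as follows for two vertex sets. This is a brute-force sketch; a practical implementation on large meshes would use a spatial index.

```python
import numpy as np

def symmetric_hausdorff(A, B):
    """Symmetric Hausdorff distance between vertex sets A (N, 3) and B (M, 3):
    the largest nearest-neighbor error in either direction."""
    d = np.linalg.norm(A[:, None] - B[None], axis=-1)    # (N, M) pairwise distances
    return max(d.min(axis=1).max(), d.min(axis=0).max())
```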
Qualitative evaluation.


In Figure 90 we show both the meshlets at the end of our optimization (a) and the final mesh reconstructed by our algorithm (b). Note that, thanks to the remeshing steps during our optimization procedure (Section 4.1), our output is a high-quality, regular mesh. We also show a subset of the objects used in the numerical evaluation in Figures 1 and 87, for different levels of sparsity and noise. Additional results are in the Supplementary. Competing methods are significantly impacted by noise and produce overly smooth results to attenuate its effect. For example, the Laplacian reconstructions obtained with low regularization (Lap-low) are still noisy, while those obtained with high regularization (Lap-high) are oversmoothed. Even Screened Poisson reconstruction [18], the de facto standard among traditional methods, used in conjunction with PCPNet [12] to estimate the normals, produces visibly noisy results. Finally, as also shown in Figure 1, state-of-the-art deep learning methods only work on objects seen in training and for low levels of noise. Our results, on the other hand, offer the best tradeoff between detail and noise, recovering locally sharp features and small details despite the sparsity and noise of the point clouds.
On the importance of natural meshlets.
Our meshlet priors are at the core of our reconstruction. Here we compare our natural meshlets with other shape priors to isolate their contribution to the overall quality of the result. The first is a Laplacian regularizer, a standard smoothness prior [27]. The second is the recent work by Williams et al., which suggests that a neural network is, in itself, a prior for local geometry [35]. We use these priors and our natural meshlet prior to fit small patches of mesh to small point clouds extracted from real objects.
Figure 99 shows two representative examples. Despite the complexity of the local shape and the level of noise, the optimization that uses our strategy (Section 3.1) correctly estimates the underlying meshlet. On the contrary, Deep Geometric Prior overfits to the noise, and the Laplacian regularizer oversmooths the surface. On these meshlets, our method achieves a lower average symmetric Hausdorff distance than both DGP and the Laplacian regularizer.
6 Discussion and Limitations
Our approach optimizes a mesh and a number of meshlets based on the gradients available at the mesh vertices, while enforcing the meshlet priors. In this paper, the gradients for the mesh were obtained by computing the distance of the mesh to the point cloud. However, our method can take gradients from any source, including a differentiable renderer [16]. This adds to the flexibility of our approach.
In our work we learn and enforce priors for mesh estimation using meshlets, which have an intrinsic scale and resolution. Our current approach uses a single, fixed meshlet scale for all the object reconstructions, although we effectively learn meshlets at multiple scales (see Section 4.2). This limits the level of detail we can reconstruct: details cannot be smaller than the resolution of the meshlet. Using a meshlet at a single scale throughout the mesh deformation process may also lead to local minima. A natural extension, then, would be to use a coarse-to-fine approach.
Our current approach is computationally expensive and not optimized for speed: running the full optimization can take from hours to dozens of hours, depending on the initialization. Existing acceleration techniques for finding correspondences between the mesh and the meshlets, as well as a more efficient meshlet extraction, would help.
7 Conclusions
We have presented a novel geometric representation, meshlets, which allows us to robustly reconstruct 3D meshes from sparse, noisy point samples. By training a variational autoencoder to learn the low-dimensional manifold of natural local surface patches, meshlets provide us with a strong prior that can be used to properly reconstruct geometry from sparse samples. However, because meshlets are localized representations, optimizing them independently would result in an inconsistent surface. Therefore, we propose an alternating optimization which first optimizes the meshlets to match the point samples and then enforces consistency across all of them to match the global shape of the reconstructed mesh. The resulting algorithm reconstructs surfaces from very sparse and noisy samples more reliably than state-of-the-art approaches.
Acknowledgments
We thank Kihwan Kim, Alejandro Troccoli, and Ben Eckart for helpful discussions about evaluation. We also thank Arash Vahdat for valuable discussions on VAE training. Abhishek was supported by the NVIDIA Fellowship. This work was partially funded by National Science Foundation grant #IIS-1619376.
1 Additional Experimentation Details
In this section we give additional details about the experiments shown in the main paper.
1.1 Generating Watertight Meshes
We used meshlets extracted from the ShapeNet [6] dataset for training. The test dataset for mesh reconstruction was formed by selecting objects from the test set of the ShapeNet dataset, as well as a few objects from outside ShapeNet. However, most ShapeNet objects are not watertight. To generate watertight meshes for our objects, we used the process described by Stutz and Geiger [32]. First, the object is scaled to lie in a fixed volume. Next, depth maps are rendered from 200 views and used to perform TSDF fusion. Finally, a mesh simplification step is performed using MeshLab to give us a final mesh with 50k vertices. These vertices are roughly uniform over the surface of the mesh.
1.2 Noise, Sparsity and Outliers Parameters used in Experiments
To test different approaches we designed three different settings for noise and sparsity.
Given a GT object with 50k vertices, we first randomly sample a fraction of the vertices to get a sparse point cloud. We then add Gaussian noise to each point in the sparse point cloud.
The three different settings used for the experiments are as follows:

- Setting 1 (S1)
- Setting 2 (S2)
- Setting 3 (S3)
To appreciate the level of noise we provide a visualization of these noise settings in Figure 40.
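The degradation pipeline (random decimation followed by additive Gaussian noise) can be sketched as follows. The keep fraction and noise magnitude passed in are placeholders for the per-setting (S1-S3) values.

```python
import numpy as np

def degrade(verts, keep_frac, noise_sigma, rng):
    """Build a test point cloud from GT vertices: random decimation to a
    fraction of the vertices, then additive Gaussian noise. keep_frac and
    noise_sigma stand in for the per-setting values used in S1-S3."""
    n = max(1, int(len(verts) * keep_frac))
    idx = rng.choice(len(verts), size=n, replace=False)
    return verts[idx] + rng.normal(scale=noise_sigma, size=(n, 3))
```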
1.3 Test dataset and Initialization
In Figure 21 we show all the GT objects used for the mesh reconstruction evaluation. We also show the initial mesh. Note that we used the exact same initial mesh for both our method and the Laplacian mesh optimization.
2 Additional Results
Figure 162 shows additional qualitative comparisons between different approaches.
We provide the numbers for each object across the different noise settings in Table 1 (Symmetric Hausdorff) and Table 2 (Chamfer).
obj_name  Ours  DGP  Lap-low  Lap-high  [7]+[18]  [7]+[28]  [12]+[18]  [12]+[28]  OccNet  AtlasNet
clock_1_S3  3.653  17.457  11.052  11.005  122.523  15.532  11.359  15.511  29.283  24.631 
clock_1_S2  3.536  13.955  10.953  10.891  6.235  13.111  11.333  11.858  25.27  21.75 
clock_1_S1  2.853  13.349  11.419  11.356  3.971  10.19  11.778  15.908  25.274  21.229 
chair_1_S3  5.434  16.978  32.646  33.899  115.056  17.746  14.692  16.042  22.796  22.476 
chair_1_S2  3.154  12.676  30.231  32.58  112.8  13.675  14.424  13.751  12.777  19.855 
chair_1_S1  4.204  10.124  29.066  32.861  4.012  12.236  14.455  12.642  12.651  17.271 
sofa_4_S3  5.327  17.874  5.597  6.561  39.052  18.864  24.706  18.801  65.388  35.434 
sofa_4_S2  6.13  15.129  4.933  6.111  16.215  14.744  26.939  16.667  33.341  33.299 
sofa_4_S1  12.935  13.947  9.957  11.439  11.61  15.108  31.597  19.499  27.108  32.914 
sofa_2_S3  5.552  18.691  8.772  12.991  166.4  17.276  20.451  15.399  26.107  54.051 
sofa_2_S2  6.69  14.925  5.316  7.299  31.524  14.784  20.95  14.418  22.708  49.475 
sofa_2_S1  3.857  10.758  5.074  6.611  13.18  13.112  24.549  14.801  16.465  40.677 
sofa_3_S3  4.272  17.02  6.688  6.387  12.701  18.736  19.288  15.456  23.524  21.608 
sofa_3_S2  4.567  16.181  6.169  5.986  6.57  15.084  20.977  14.355  23.436  22.014 
sofa_3_S1  3.413  11.27  4.766  4.185  4.412  13.232  23.584  14.913  16.549  14.023 
sofa_1_S3  4.513  16.788  4.485  4.01  10.122  17.564  27.773  16.158  22.573  23.399 
sofa_1_S2  5.796  13.385  3.687  4.92  4.658  13.339  27.74  14.893  22.384  22.026 
sofa_1_S1  5.474  10.798  5.449  5.23  5.2  15.536  30.814  18.724  7.518  20.873 
bench_1_S3  5.142  16.187  5.887  4.674  153.809  16.129  17.476  15.43  30.692  23.743 
bench_1_S2  4.551  14.132  3.899  3.906  112.483  13.554  18.607  12.978  26.192  17.842 
bench_1_S1  3.543  10.362  4.36  4.379  4.493  10.929  20.693  12.559  20.622  15.442 
guitar_1_S3  7.381  14.657  10.117  10.177  163.681  16.889  7.67  15.33  20.392  25.868 
guitar_1_S2  6.18  11.316  7.614  7.65  137.846  12.287  6.419  10.392  19.465  25.338 
guitar_1_S1  3.011  11.2  7.332  7.415  9.681  9.027  5.831  8.532  18.356  26.169 
suzanne_S3  5.719  16.841  6.748  7.252  25.533  16.781  6.49  15.598  34.363  34.366 
suzanne_S2  4.141  14.976  6.327  6.516  5.124  15.299  7.623  14.374  32.841  27.383 
suzanne_S1  4.797  10.417  5.473  5.442  4.721  9.305  7.051  14.159  30.643  23.145 
teapot_S3  6.152  17.25  11.014  11.161  118.128  18.79  11.768  15.414  26.124  25.139 
teapot_S2  7.314  15.836  10.832  11.473  17.74  15.421  11.692  13.771  23.316  22.191 
teapot_S1  7.58  11.849  10.854  11.121  6.713  10.7  11.139  13.41  24.015  18.796 
armadillo_S3  5.812  19.349  23.069  23.48  117.21  18.891  10.438  17.449  41.609  37.975 
armadillo_S2  4.247  16.592  23.446  23.703  8.058  14.834  12.623  12.515  40.507  36.749 
armadillo_S1  4.342  12.021  22.962  24.023  6.943  16  14.953  14.678  39.251  35.446 
bunny_S3  3.917  25.159  19.27  19.099  25.02  20.017  20.489  17.757  75.799  50.199 
bunny_S2  6.974  18.543  19.11  19.198  8.336  14.737  21.744  18.383  66.464  47.761 
bunny_S1  3.967  13.767  19.421  19.429  5.853  13.703  23.577  14.289  60.246  52.51 
mobile_1_S3  4.189  16.22  4.63  3.216  82.184  18.032  5.012  15.388  29.668  18.336 
mobile_1_S2  3.579  13.233  3.94  3.212  4.531  14.088  3.509  12.554  28.743  16.21 
mobile_1_S1  2.869  9.153  2.979  2.484  3.201  10.468  5.165  10.361  28.298  16.223 
speaker_1_S3  5.033  18.527  8.624  8.327  17.667  19.866  14.627  16.993  34.595  48.009 
speaker_1_S2  5.141  14.938  7.281  7.328  21.282  15.363  14.223  14.691  17.407  47.535 
speaker_1_S1  4.737  14.208  5.145  4.72  5.276  13.521  12.756  12.722  9.65  47.674 
mailbox_1_S3  10.031  16.761  6.169  5.413  38.719  20.275  5.141  15.046  35.726  27.246 
mailbox_1_S2  10.368  12.95  4.312  4.874  9.469  12.663  4.967  12.598  30.253  26.995 
mailbox_1_S1  9.868  11.426  4.425  6.325  4.245  8.704  6.586  8.777  26.279  27.183 
camera_1_S3  25.875  19.303  5.048  5.05  29.712  16.55  5.022  16.955  33.868  39.07 
camera_1_S2  5.622  12.472  5.091  5.331  6.308  14.748  9.199  15.082  35.641  35.475 
camera_1_S1  4.787  7.959  4.943  4.081  4.349  13.25  14.537  17.113  26.015  30.47 
table_1_S3  2.944  21.648  4.165  2.996  30.838  18.05  14.61  16.602  46.057  26.448 
table_1_S2  3.182  16.677  3.886  3.124  6.728  15.498  13.449  16.89  42.707  17.832 
table_1_S1  3.185  8.448  3.563  3.369  3.934  12.077  13.865  16.576  40.015  16.728 
table_2_S3  4.791  18.325  21.119  21.256  47.586  20.083  24.451  20.052  39.425  40.386 
table_2_S2  4.568  15.504  21.173  21.174  28.283  13.954  21.651  15.999  38.058  43.241 
table_2_S1  5.523  9.842  20.715  21.051  12.037  13.089  24.821  18.586  37.659  38.992 
monitor_1_S3  4.985  19.937  4.791  5.151  27.943  18.517  8.319  16.546  32.03  26.32 
monitor_1_S2  5.325  16.997  5.748  5.66  5.546  16.31  7.464  15.472  14.612  21.924 
monitor_1_S1  5.036  8.414  5.448  5.518  5.156  13.91  5.522  15.862  9.04  21.262 
lamp_1_S3  3.934  18.594  11.321  11.347  10.307  18.322  9.497  15.259  43.916  29.247 
lamp_1_S2  3.112  13.952  10.25  10.115  4.465  13.1  11.141  14.075  42.604  27.942 
lamp_1_S1  4.107  12.046  9.68  9.792  4.88  11.655  15.259  17.152  41.499  27.995 
mean  5.482  14.655  9.974  10.255  33.871  14.921  14.741  15.069  30.497  29.363 
median  4.789  14.791  6.507  6.931  10.2145  14.809  14.044  15.359  28.520  26.384 
Table 2: Chamfer distance for each object across the noise settings.
obj_name  Ours  DGP  Lap-low  Lap-high  [7]+[18]  [7]+[28]  [12]+[18]  [12]+[28]  OccNet  AtlasNet
clock_1_S3  0.792  1.658  1.199  1.071  9.294  2.562  1.265  2.378  8.748  4.26 
clock_1_S2  0.759  1.266  1.019  0.962  1.008  1.781  1.252  1.6  5.506  3.823 
clock_1_S1  0.735  0.908  0.931  0.943  0.827  0.995  1.301  1.374  3.951  3.576 
chair_1_S3  0.873  1.589  1.431  1.448  7.374  2.635  1.432  2.644  7.326  3.052 
chair_1_S2  0.761  1.204  1.21  1.392  9.841  1.871  1.247  1.821  5.169  2.358 
chair_1_S1  0.747  0.904  0.989  1.157  0.846  1.104  1.271  1.524  3.008  1.897 
sofa_4_S3  1.14  1.669  1.188  1.166  2.118  2.633  1.54  2.253  9.094  7.69 
sofa_4_S2  1.197  1.32  1.13  1.163  1.226  1.567  1.757  1.883  6.557  7.39 
sofa_4_S1  1.361  1.02  1.127  1.171  1.112  1.148  2.591  2.007  3.453  7.178 
sofa_2_S3  0.985  1.631  1.24  1.26  6.973  2.524  1.359  2.398  7.678  7.688 
sofa_2_S2  0.967  1.228  1.044  1.082  2.205  1.701  1.379  1.699  5.169  6.886 
sofa_2_S1  0.952  0.956  0.997  1.043  1.257  1.213  1.785  1.727  3.256  6.304 
sofa_3_S3  0.881  1.714  1.079  0.823  1.491  2.567  1.381  2.409  6.947  3.619 
sofa_3_S2  0.832  1.272  0.922  0.792  0.939  1.586  1.333  1.693  5.558  2.655 
sofa_3_S1  0.786  0.936  0.864  0.759  0.776  0.957  1.583  1.706  2.555  2.029 
sofa_1_S3  0.951  1.642  1.08  0.95  1.369  2.551  1.672  2.356  7.21  3.583 
sofa_1_S2  0.976  1.301  0.988  0.901  0.976  1.513  1.714  1.812  5.508  2.775 
sofa_1_S1  0.937  0.991  0.925  0.869  0.891  1.031  2.07  1.938  2.602  2.384 
bench_1_S3  0.938  1.547  1.199  0.953  11.713  2.485  1.476  2.502  7.794  3.876 
bench_1_S2  0.793  1.202  0.916  0.809  9.116  1.827  1.514  1.788  5.702  2.697 
bench_1_S1  0.748  0.947  0.834  0.794  0.814  1.087  1.867  1.496  3.524  1.961 
guitar_1_S3  0.973  1.487  1.77  1.455  18.503  2.51  1.229  3.014  9.138  4.396 
guitar_1_S2  0.657  1.161  1.046  0.837  12.261  2.004  0.8  1.945  6.986  3.166 
guitar_1_S1  0.475  0.859  0.712  0.597  0.869  1.26  0.642  1.011  4.413  2.38 
suzanne_S3  1.013  1.684  1.123  0.93  2.467  2.551  0.955  2.217  7.967  4.529 
suzanne_S2  0.906  1.247  0.95  0.894  0.937  1.597  0.873  1.399  6.573  4.162 
suzanne_S1  0.954  0.933  0.899  0.873  0.756  0.916  0.916  1.21  5.825  3.991 
teapot_S3  0.827  1.72  1.184  0.841  9.856  2.694  1.008  2.318  6.316  3.777 
teapot_S2  0.739  1.238  0.882  0.744  1.118  1.832  0.869  1.465  5.265  3.336 
teapot_S1  0.736  0.912  0.818  0.73  0.711  0.977  0.876  1.206  4.66  2.902 
armadillo_S3  1.033  1.653  1.301  1.247  7.371  2.778  1.062  2.456  9.835  6.23 
armadillo_S2  0.908  1.208  1.071  1.132  0.999  1.735  0.972  1.604  8.848  5.931 
armadillo_S1  0.962  0.914  1.007  1.1  0.809  1.046  1.082  1.44  7.576  5.639 
bunny_S3  1.037  1.716  1.133  1.087  1.852  2.569  1.312  2.261  13.601  7.731 
bunny_S2  1.02  1.277  1.033  1.05  0.962  1.453  1.283  1.529  12.126  7.482 
bunny_S1  1.044  0.976  0.989  1.048  0.839  0.965  1.434  1.392  11.111  7.348 
mobile_1_S3  0.778  1.72  1.123  0.795  6.66  2.835  0.94  2.332  7.237  4.035 
mobile_1_S2  0.686  1.264  0.91  0.71  0.938  1.731  0.772  1.313  5.552  2.621 
mobile_1_S1  0.672  0.897  0.781  0.656  0.738  0.925  0.717  0.903  4.136  1.832 
speaker_1_S3  0.966  1.659  1.126  0.945  1.686  2.706  1.08  2.256  8.276  3.621 
speaker_1_S2  0.92  1.244  0.987  0.887  1.381  1.692  1.024  1.515  4.767  3.328 
speaker_1_S1  0.925  0.932  0.919  0.868  0.873  1.077  0.986  1.259  3.059  3.476 
mailbox_1_S3  0.976  1.654  1.437  1.003  3.53  2.505  1.029  2.615  7.939  4.331 
mailbox_1_S2  0.732  1.201  0.944  0.733  1.122  1.908  0.78  1.665  6.871  3.403 
mailbox_1_S1  0.68  0.805  0.732  0.648  0.702  0.978  0.695  0.935  5.187  2.928 
camera_1_S3  1.021  1.646  1.072  0.955  2.528  2.495  0.984  2.057  6.927  4.052 
camera_1_S2  1.075  1.252  1.027  0.953  1.008  1.62  0.979  1.544  6.225  3.554 
camera_1_S1  0.999  0.954  0.954  0.922  0.93  1.058  1.29  1.786  4.895  3.425 
table_1_S3  0.825  1.697  1.032  0.792  1.758  2.53  1.092  2.191  9.56  3.899 
table_1_S2  0.788  1.266  0.908  0.755  0.912  1.506  1.009  1.448  6.487  3.719 
table_1_S1  0.803  0.949  0.865  0.745  0.767  0.893  1.018  1.286  3.635  3.757 
table_2_S3  0.934  1.66  1.202  1.124  3.284  2.735  1.539  2.482  8.435  5.142 
table_2_S2  0.925  1.279  1.1  1.08  1.665  1.622  1.474  1.91  6.948  4.359 
table_2_S1  0.926  0.952  1.023  1.036  1.058  1.05  1.832  1.959  4.537  3.88 
monitor_1_S3  1.094  1.66  1.109  1.072  2.048  2.605  1.03  2.033  6.693  4.368 
monitor_1_S2  1.179  1.273  1.061  1.067  1.018  1.537  0.977  1.386  4.236  4.261 
monitor_1_S1  1.13  0.96  1.005  1.046  0.957  1.017  1.033  1.338  2.354  4.017 
lamp_1_S3  0.8  1.742  1.06  0.86  1.361  2.544  1.019  2.089  8.63  3.667 
lamp_1_S2  0.766  1.283  0.933  0.823  0.879  1.376  0.934  1.415  5.624  3.359 
lamp_1_S1  0.772  0.969  0.88  0.8  0.728  0.891  1.004  1.441  3.028  2.998 
mean  0.896  1.280  1.040  0.956  2.850  1.768  1.222  1.811  6.297  4.145 
median  0.922  1.258  1.025  0.944  1.115  1.657  1.087  1.757  6.270  3.767 
3 Additional Algorithm Details
The pseudocode for our optimization procedure, which estimates a watertight mesh while enforcing meshlet priors, is given in Algorithm 1.
In our optimization procedure we update both the meshlets and the auxiliary mesh. While the meshlet priors keep the meshlet updates stable, an additional prior is needed when updating the mesh to ensure that it remains watertight and that its vertices stay uniformly distributed. Using smoothness or other handcrafted priors on the mesh would hinder our ability to reconstruct sharp features. Hence, at the end of every iteration we apply Screened Poisson Reconstruction [18], using the vertices and normals of the globally consistent meshlets to update the mesh.
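Structurally, the procedure alternates between the two updates. The sketch below only shows that control flow: `fit_meshlets` and `screened_poisson` are hypothetical placeholder names standing in for the learned meshlet projection and for Screened Poisson Reconstruction [18], neither of which is reimplemented here.

```python
import numpy as np

def fit_meshlets(V, N):
    # Placeholder: project local patches onto the learned meshlet prior.
    return V, N

def screened_poisson(V, N):
    # Placeholder: watertight remeshing from oriented points [18].
    return V, N

def optimize_mesh(V, N, num_iters=3):
    """Alternate prior-constrained meshlet updates with a watertight
    remeshing step that keeps vertices uniformly distributed."""
    for _ in range(num_iters):
        V, N = fit_meshlets(V, N)      # stable, prior-constrained update
        V, N = screened_poisson(V, N)  # keep the mesh watertight and uniform
    return V, N
```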
Footnotes
 This work was done while Abhishek Badki was interning at NVIDIA.
References
 (2018) Modeling facial geometry using compositional VAEs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
 (1995) Automatic reconstruction of surfaces and scalar fields from 3D scans. In Proceedings of SIGGRAPH.
 (2018) Multi-chart generative surface modeling. ACM Transactions on Graphics.
 (1999) The ball-pivoting algorithm for surface reconstruction. IEEE Transactions on Visualization and Computer Graphics (TVCG).
 (2018) CodeSLAM: learning a compact, optimisable representation for dense visual SLAM. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
 (2015) ShapeNet: an information-rich 3D model repository. Technical Report arXiv:1512.03012.
 (2008) MeshLab: an open-source mesh processing tool. In Eurographics Italian Chapter Conference.
 (1996) A volumetric method for building complex models from range images. In ACM Transactions on Graphics (SIGGRAPH).
 (2014) Depth map prediction from a single image using a multi-scale deep network. In Advances in Neural Information Processing Systems (NIPS).
 (2017) Unsupervised monocular depth estimation with left-right consistency. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
 (2018) A papier-mâché approach to learning 3D surface generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
 (2018) PCPNet: learning local shape properties from raw point clouds. In Computer Graphics Forum.
 (1992) Surface reconstruction from unorganized points. In Proceedings of SIGGRAPH.
 (2018) DeepMVS: learning multi-view stereopsis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
 (2018) Learning category-specific mesh reconstruction from image collections. In Proceedings of the European Conference on Computer Vision (ECCV).
 (2018) Neural 3D mesh renderer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
 (2006) Poisson surface reconstruction. In Eurographics Symposium on Geometry Processing.
 (2013) Screened Poisson surface reconstruction. ACM Transactions on Graphics.
 (2017) End-to-end learning of geometry and context for deep stereo regression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
 (2013) Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
 (2019) Towards robust monocular depth estimation: mixing datasets for zero-shot cross-dataset transfer. arXiv preprint arXiv:1907.01341.
 Cited by: §2, §2.
 (2018) Deformable shape completion with graph convolutional autoencoders. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
 (1987) Marching cubes: a high resolution 3D surface construction algorithm. In Proceedings of SIGGRAPH.
 (2012) Geodesic polar coordinates on polygonal meshes. In Computer Graphics Forum.
 (2019) Occupancy networks: learning 3D reconstruction in function space. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
 (2006) Laplacian mesh optimization. In ACM International Conference on Computer Graphics and Interactive Techniques (GRAPHITE).
 (2009) Feature preserving point set surfaces based on non-linear kernel regression. In Computer Graphics Forum.
 (2019) DeepSDF: learning continuous signed distance functions for shape representation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
 (2003) High-accuracy stereo depth maps using structured light. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
 (2017) SurfNet: generating 3D shape surfaces using deep residual networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
 (2018) Learning 3D shape completion under weak supervision. CoRR.
 (2019) What do single-view 3D reconstruction networks learn? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
 (2018) Pixel2Mesh: generating 3D mesh models from single RGB images. In Proceedings of the European Conference on Computer Vision (ECCV).
 (2019) Deep geometric prior for surface reconstruction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
 (2018) MVSNet: depth inference for unstructured multi-view stereo. In Proceedings of the European Conference on Computer Vision (ECCV).
 (2018) EC-Net: an edge-aware point set consolidation network. In Proceedings of the European Conference on Computer Vision (ECCV).
 (2018) PU-Net: point cloud upsampling network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
 (2017) Unsupervised learning of depth and ego-motion from video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
 (2018) Object-centric photometric bundle adjustment with deep shape prior. In IEEE Winter Conference on Applications of Computer Vision (WACV).