Meshlet Priors for 3D Mesh Reconstruction

Abstract

Estimating a mesh from an unordered set of sparse, noisy 3D points is a challenging problem that requires carefully selected priors. Existing hand-crafted priors, such as smoothness regularizers, impose an undesirable trade-off between attenuating noise and preserving local detail. Recent deep-learning approaches produce impressive results by learning priors directly from the data. However, the priors are learned at the object level, which makes these algorithms class-specific, and even sensitive to the pose of the object. We introduce meshlets, small patches of mesh that we use to learn local shape priors. Meshlets act as a dictionary of local features and thus allow us to use learned priors to reconstruct object meshes in any pose and from unseen classes, even when the noise is large and the samples sparse. 1


1 Introduction

The ability to capture, represent, and digitally manipulate objects is crucial for a wide range of important applications, from content creation to animation, robotics, and virtual reality. Among the different representations for 3D objects (which also include depth maps, occupancy grids, and point clouds), meshes are particularly appealing.

Figure 1: Mesh reconstructions for objects in pose (P) and out of pose (P̄), under low (N) and moderate (N̄) noise, and for objects in (T) and out (T̄) of training. GT+PC: ground-truth mesh and the point cloud used by the various methods to estimate the mesh. Traditional methods introduce a noise vs. smoothness trade-off (Laplacian low/high [27]). State-of-the-art, deep-learning methods (AtlasNet [11] and OccNet [26]) learn object-level priors, which causes them to fail on objects not seen in training (T̄), or even on objects that are just rotated w.r.t. the training set (P̄). Our method learns local priors and forces global consistency with the point cloud.

Estimating meshes of real-world objects, however, is not straightforward since common capture strategies, such as structured light [30] or multi-view stereo [14, 36], produce point clouds or depth maps instead. These intermediate representations are noisy and sparse and, when used to estimate a continuous surface, introduce a trade-off between over-fitting to the noise and over-smoothing.

Traditional methods require hand-crafted priors (\eg, local smoothness) to balance noise and details, as is the case for Laplacian reconstruction [27]. Figure 1 shows that this balance is difficult to strike when the point cloud is noisy (rows marked as N̄). Recent learning-based methods learn priors directly from a large number of examples [16, 34, 15, 26, 29]. Because of their ability to learn directly from the data, these approaches can produce impressive results from both point clouds and a single image. However, they learn priors at the object level, which limits their ability to reconstruct objects from classes not seen during training (rows marked as T̄). They also struggle to disentangle the shape priors from the object pose: state-of-the-art learning methods can fail completely on a rotated point cloud (rows marked as P̄) even though they can reconstruct the same point cloud when its pose resembles that of the training set (rows marked as P). Tatarchenko \etal go further and suggest that many of these methods may actually learn a form of classification and nearest-neighbor retrieval from the dataset, rather than a proper 3D reconstruction [33]. In fact, even for rows P and T in Figure 1, a closer look seems to indicate that AtlasNet [11] and OccNet [26] are reconstructing a different couch and chair from the training set.

We present a learning-based method to extract a 3D mesh from a set of sparse, noisy, unordered points that bridges traditional and learning-based approaches. Our key intuition is to learn geometric priors locally while enforcing their consistency globally. To represent shape priors, we introduce meshlets, small patches of mesh that, loosely speaking, serve as a learned dictionary of local features.

Specifically, we use a variational auto-encoder (VAE) [20] to learn the latent space of meshlets that can be observed in natural shapes. We call these natural meshlets. Learning these local features offers two key advantages. First, it allows us to reconstruct objects from classes never seen in training: at some scale, a couch exhibits similar local features to those of a bunny.

Second, it disentangles the global pose of the object from the parametrization learned by the network, which makes our algorithm robust to dramatic changes in the object’s pose, see Figure 1.

To fit the meshlets to a point cloud we minimize their distance to the points, while enforcing that they belong to the latent space of natural meshlets. Therefore, the resulting surface will locally satisfy the priors we learned. However, because the meshlets are optimized independently of each other, the mesh extracted from their union will not be watertight.

Therefore, we define an auxiliary, watertight mesh and propose to use it in an alternating optimization that ensures that the meshlets are consistent with each other and with the observed point cloud.

We show with extensive comparisons that this iterative method produces results that outperform the state of the art on challenging scenarios such as noisy point clouds in arbitrary poses, as shown in Figure 1. In summary, our contributions are:

  • We present meshlets, a new way of representing local shape priors in the latent space of a variational autoencoder that is trained on local patches from a dataset of real-world objects.

  • We propose an alternating optimization which fits meshlets to the measured point samples (enforcing local constraints) while maintaining global consistency for the mesh.

  • We demonstrate for the first time, to our knowledge, successful reconstructions of 3D meshes from very sparse, noisy point measurements with a category-agnostic, learning-based method.

2 Related Work

Extracting a mesh from a point cloud is an important problem that has been the focus of much research since the early days of graphics. Traditional methods such as marching cubes [24] or Ball-Pivoting [4] work well for cases where the noise is small as compared to the density of the point cloud.

In general, however, noise does pose issues. One traditional solution, then, is to use the points and the corresponding normals to compute a signed-distance function whose zero crossing is the desired surface [8, 13, 2, 17, 18]. An alternative is to use hand-crafted priors, such as smoothness of the vertices and normals of the estimated mesh [27]. However, these priors introduce a trade-off between suppressing the noise and preserving sharp features that becomes increasingly brittle for sparser and noisier point clouds, see Figure 1.

Priors can be more effectively learned from data with neural networks. Deep learning methods, for instance, have shown great success in estimating depth maps from images, whether from multiple views [14, 36], from stereo [19], or even from a single image [9, 10, 39, 21]. Even meshes can be directly extracted from a single image, provided that the class of the object is known [16, 34, 15].

Rather than requiring manual tuning of the traditional noise/sharpness trade-off, methods that learn priors to extract meshes from point clouds introduce a new one: generally speaking, the lower the quality of the observations (\eg, strong noise or sparsity of the point cloud), the stronger the priors need to be, thus affecting the algorithm’s ability to generalize to different and unseen classes. For instance, methods that learn local priors are class-agnostic but tend to need dense point clouds with low levels of noise [12, 38, 37, 35].

The recent works of Park \etal [29], Groueix \etal [11], and Mescheder \etal [26] produce impressive results even with sparser and potentially noisier data, but fail to generalize to completely new classes of objects. Often they even struggle when the point cloud is in a pose that differs significantly from the training pose, as shown in Figure 1, rows P̄. This issue is due, in part, to the fact that these methods lack a mechanism to enforce geometric constraints at inference time. Our method is class-agnostic thanks to its ability to learn and enforce local priors while minimizing the error with respect to the point cloud at inference time.

The idea of learning priors from data and enforcing geometric constraints at inference time was recently explored for depth map [5], point cloud [40], and surface estimation [23, 22, 29]. These approaches use low-dimensional representations that allow inference-time optimization.

However, these approaches learn priors at the object level [40, 23, 22, 29] and, thus, tend to be category-specific. Our meshlet priors directly encode the (local) shape of the surface instead of a viewer-centric depth [5]. Meshlets are class-agnostic and can be used to learn and enforce priors at different scales.

Key to solving the mesh estimation problem is how to represent it. Different representations for meshes exist that are amenable to use with neural networks, but they also tend to be class specific [31, 3]. One key ingredient of our method is the use of small mesh patches, called meshlets, which simplify the processing of the mesh, among other things. A related approach is the work of Groueix \etal, who also represent the mesh as a collection of large parts, which they call charts [11]. However, their method does not offer a mechanism to enforce global consistency and does not leverage local shape priors.

3 Method

Our goal is to estimate a mesh from a set of unordered, non-oriented points. The task is easy when the point cloud is dense and the noise is low. However, when the quality of the observations degrades, \eg, sparser or noisier points, the choice of priors and heuristics becomes central. Hand-crafted priors, such as smoothness, introduce a trade-off between overly smooth and noisy reconstructions, as shown in Figure 1 (Lap-low/high). On the other hand, neural networks can learn priors directly from data, but they introduce other challenges. First, capturing the distribution of generic objects requires training on a large number of examples, possibly larger than what existing datasets can supply. Moreover, generalization can be an issue: the performance of existing learning-based methods quickly degrades when the test objects differ from the ones used in training, as shown in the rows marked as T̄ in Figure 1.

Finally, it is not straightforward to disentangle object-level priors from the pose of the object. Figure 1 shows that OccNet [26] and AtlasNet [11], both recent state-of-the-art works, fail for classes never seen in training (T̄), or even when the pose of the object is significantly different from the poses seen in training (P̄).

To overcome these issues, we propose to learn priors locally: even if the Stanford Bunny in Figure 1 was never seen in training, its local features are similar to those found in more common objects from the training set. We introduce meshlets, which can be regarded as small patches of a mesh, see Figure 17. Loosely speaking, meshlets act as a dictionary of basic shape features. Meshlets are local and of limited size, and thus offer a simple mechanism to disentangle the (local) priors from the object’s pose. If meshlets are adapted to the point cloud independently of each other, however, they may not result in a watertight surface. Therefore, we explicitly enforce their consistency globally. In the following we describe these two stages, and the overall process to extract a mesh.

Figure 17: Smoothness of the latent space. We can progressively deform one meshlet onto another by interpolating between the corresponding points in latent space (see Section 3.1). Panels (left to right): Meshlet A (animated), interpolation meshlets, Meshlet B. Click on a meshlet in the leftmost column to see an animation of the process.

3.1 Local Shape Priors with Meshlets

In this section we introduce meshlets, and describe how we leverage them to enforce local shape priors. Intuitively, a meshlet is a small patch of mesh deformed to adhere to a region of another, larger mesh, see Figures 17 and 90(a).

To extract a meshlet at a vertex of a mesh, we first compute a local geodesic parametrization [25] that maps the 3D coordinates of the vertices in a neighborhood of that vertex to coordinates on the plane tangent to the mesh at the vertex. We then re-sample the geodesic distance function at integer coordinates. This gives us the correspondence between a vertex on the meshlet and a vertex on the mesh in the neighborhood of the center vertex.
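To make this construction concrete, the following is a simplified sketch of the re-sampling step. It approximates the geodesic polar parametrization of [25] with a plain tangent-plane projection, and the function name, inputs, grid size, and spacing are illustrative assumptions, not the exact procedure used in the paper.

    # Simplified meshlet extraction: project the local neighborhood onto the
    # tangent plane and re-sample a regular grid (approximation of the geodesic
    # parametrization of [25]). `center`, `normal`, `neighbors` are assumed inputs.
    import numpy as np
    from scipy.interpolate import griddata

    def extract_meshlet(center, normal, neighbors, size=9, spacing=0.01):
        """Re-sample a (size x size) height-field meshlet around `center`."""
        normal = normal / np.linalg.norm(normal)
        # Build an orthonormal tangent basis (u, v) for the tangent plane.
        helper = np.array([1.0, 0.0, 0.0])
        if abs(np.dot(helper, normal)) > 0.9:
            helper = np.array([0.0, 1.0, 0.0])
        u = np.cross(normal, helper)
        u = u / np.linalg.norm(u)
        v = np.cross(normal, u)
        # Express neighborhood vertices in local (u, v, normal) coordinates.
        local = neighbors - center
        uv = np.stack([local @ u, local @ v], axis=1)   # tangent coordinates
        height = local @ normal                         # offset along the normal
        # Re-sample the height field at regular grid coordinates.
        r = (size - 1) / 2.0
        grid_u, grid_v = np.meshgrid(np.linspace(-r, r, size) * spacing,
                                     np.linspace(-r, r, size) * spacing)
        grid_h = griddata(uv, height, (grid_u, grid_v), method='linear', fill_value=0.0)
        # Map the grid back to 3D: these are the meshlet vertices.
        return (center + grid_u[..., None] * u + grid_v[..., None] * v
                       + grid_h[..., None] * normal)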

Because they only require a local parametrization computed with respect to the center vertex, meshlets work well even for objects with large, varying curvature. Because they are local, they can learn shape priors that are independent of the pose and class of the object.

Learning local shape priors with meshlets. We want to learn the distribution of “natural meshlets,” \ie, those meshlets that capture the local features of real-world objects. Inspired by recent methods [40, 5], we use a variational auto-encoder (VAE). By training the VAE to reconstruct a large number of meshlets, we force its bottleneck to learn the latent space of natural meshlets. Differently put, vectors sampled on this manifold and fed into the decoder result in natural meshlets. We extract meshlets from objects from the ShapeNet dataset [6] and we feed their 3D coordinates for training. However, we first roto-translate the meshlets to bring them into a canonical pose. This transformation is necessary to make sure that similar meshlets sampled from different 3D locations and orientations map to similar regions in the VAE’s latent space. More specifically, given a meshlet, we first translate and rotate it so that its center is at the origin and the normal at the center is aligned with the z-axis; we then rotate it around the z-axis so that the local coordinates of the meshlet are aligned with the x and y axes. We call this the canonical pose. A meshlet, then, is completely defined by the transformation from the global to the canonical pose, and by the latent vector corresponding to the meshlet in canonical pose:

(1)
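As an illustration, the canonicalization described above could be implemented along the following lines. The input names, and the assumption that the meshlet is stored as a regular vertex grid with a known local x-direction, are ours, not details taken from the paper.

    # Sketch: bring a meshlet into its canonical pose. Assumed (hypothetical)
    # inputs: `vertices` of shape (H, W, 3), the center normal `n`, and the
    # local x-direction `x_dir` of the meshlet parametrization.
    import numpy as np

    def to_canonical(vertices, n, x_dir):
        H, W, _ = vertices.shape
        center = vertices[H // 2, W // 2]
        n = n / np.linalg.norm(n)
        x_dir = x_dir - np.dot(x_dir, n) * n       # make x_dir orthogonal to n
        x_dir = x_dir / np.linalg.norm(x_dir)
        y_dir = np.cross(n, x_dir)
        # Rotation that aligns the center normal with the z-axis and the local
        # x-direction with the x-axis: stack the local frame as rows.
        R = np.stack([x_dir, y_dir, n])            # world -> canonical rotation
        canonical = (vertices - center) @ R.T
        return canonical, (R, center)              # pose = (rotation, translation)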

Section 4.2 details the network’s architecture. Since we disentangle pose and shape, smoothly traversing the latent space smoothly varies the shape of the reconstructed meshlet, as shown in Figure 17, where we take the latent vectors corresponding to meshlets A and B and progressively interpolate between them. The meshlets reconstructed from the interpolated vectors smoothly interpolate between the shapes of meshlets A and B. (Please use a media-enabled PDF viewer, such as Adobe Reader, to view the animations in Figure 17.)
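A minimal sketch of this interpolation, assuming a trained VAE that exposes encode and decode functions (a hypothetical interface, not the exact one used in the paper), is:

    # Interpolate between two canonical-pose meshlets in latent space (Figure 17).
    import torch

    def interpolate_meshlets(vae, meshlet_a, meshlet_b, steps=10):
        with torch.no_grad():
            z_a = vae.encode(meshlet_a.flatten().unsqueeze(0))
            z_b = vae.encode(meshlet_b.flatten().unsqueeze(0))
            meshlets = []
            for t in torch.linspace(0.0, 1.0, steps):
                z_t = (1.0 - t) * z_a + t * z_b      # linear path in latent space
                meshlets.append(vae.decode(z_t).view_as(meshlet_a))
        return meshlets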

Fitting a meshlet to 3D points. Assume now that we are given a set of 3D points roughly corresponding to the size of a meshlet (we will generalize this to a complete point cloud in Section 3.2). Deforming a natural meshlet to fit it is now straightforward: we simply traverse the latent space learned by the VAE to minimize the distance between the meshlet and the points. Specifically, we take an initialization of the meshlet and run it through the encoder to find the corresponding latent vector. This is the starting point of our optimization. We then freeze the weights of the VAE, compute the error between the meshlet and the points, and take a gradient descent step through the decoder. This brings us to a new point in latent space and the corresponding meshlet: a natural meshlet that is closer to the given 3D points. We iterate until convergence, see Figure 18. We note that, although other approaches have also proposed to optimize the latent vector of a VAE to match measured samples (\eg, [29, 5, 1, 23]), they do so at the object (or scene) level. Because our method learns local surface patches, and therefore reuses surface priors across different object categories, it can generalize better.
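The fitting loop can be sketched as follows, assuming the same hypothetical VAE interface and a Chamfer-style distance function; the optimizer, learning rate, and number of steps are illustrative choices rather than values from the paper.

    # Latent-space fitting of a meshlet to measured 3D points (Figure 18).
    # The VAE weights are frozen; only the latent vector is updated.
    import torch

    def fit_meshlet(vae, init_meshlet, points, chamfer, steps=200, lr=1e-2):
        for p in vae.parameters():                 # freeze encoder and decoder
            p.requires_grad_(False)
        with torch.no_grad():
            z = vae.encode(init_meshlet.flatten().unsqueeze(0))
        z = z.clone().requires_grad_(True)         # optimize the latent code only
        optimizer = torch.optim.Adam([z], lr=lr)
        for _ in range(steps):
            meshlet = vae.decode(z).view(-1, 3)    # a "natural" meshlet
            loss = chamfer(meshlet, points)        # distance to the measured points
            optimizer.zero_grad()
            loss.backward()                        # gradient flows through the decoder
            optimizer.step()
        return vae.decode(z).view(-1, 3).detach()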

Figure 18: Optimization of our meshlets using the learned latent-space as a prior. By backpropagating the error with respect to the measured points and using it to update the meshlet’s latent vector, we are effectively moving along the low-dimensional manifold of real meshlets while fitting the points.

3.2 Overall Optimization

Having explained how our meshlets can be used to learn local priors and can be fit to a set of 3D points, we can describe the overall algorithm, which is fairly straightforward at its core. We start with an initial, rough approximation of the complete mesh. This could be a sphere, or any other surface that satisfies our meshlet priors, \ie, whose meshlets lie on the manifold learned by the VAE. From this auxiliary mesh we extract overlapping meshlets and find the corresponding latent vectors and poses, see Figure 20. We select the meshlets so that each vertex of the mesh is covered by at least a minimum number of meshlets. We also compute the distance between the mesh and the point cloud, whose gradient we can propagate to the meshlets since we have the correspondences between mesh and meshlets by construction. This allows us to update the meshlets to adapt to the points (Section 3.1).

However, this optimization is performed on each meshlet independently, so it results in small gaps between the meshlets, see Figure 19(a). Therefore, we enforce and maintain global consistency by adding a step in which we deform the mesh to match the meshlets and update the meshlets to match the mesh. Deforming the mesh brings it closer to the point cloud; deforming the meshlets forces them to be globally consistent. Finally we iterate:

  1. Optimize meshlets to fit the point cloud (3.2.1).

  2. Optimize meshlets and mesh to match each other (3.2.2).

At convergence, the auxiliary mesh, watertight by construction, is our estimate of the mesh. We now explain the two steps in detail.

Figure 19: Alternating optimization used in our algorithm. The first stage updates the meshlets based on the errors between the underlying mesh and the measured point cloud. However, because meshlets are localized representations, optimizing them individually causes inconsistencies across the object. Hence, in the second stage we enforce global consistency across all meshlets to reconstruct an updated version of the mesh which is used in the next iteration of the algorithm.

Enforcing Local Shape Priors

To optimize the meshlets with respect to the point cloud we need to define an error. Unfortunately, the correspondences between the point cloud and the vertices of the meshlets are not readily available. A Chamfer distance, then, is not straightforward to use because, without correspondences, all the points in the point cloud would contribute to the error of all the meshlets, even if they are on opposite sides of the object. However, we do have the correspondences between the vertices of the mesh and the meshlets. Therefore, we compute the Chamfer distance between the point cloud and the mesh instead:

(2)

The error in Equation 2 is computed with respect to the 3D points of the input point cloud, PC, and gives us a per-vertex error on the mesh, which we can propagate to the corresponding meshlets. We then update the meshlets to minimize this error as explained in Section 3.1, and obtain a new set of natural meshlets.
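A minimal sketch of such a Chamfer distance between the mesh vertices and the point cloud (not necessarily the exact form of Equation 2) is:

    # Symmetric Chamfer distance between (N, 3) mesh vertices and an (M, 3)
    # point cloud; the per-vertex terms are what gets propagated to the meshlets.
    import torch

    def chamfer(mesh_vertices, point_cloud):
        d = torch.cdist(mesh_vertices, point_cloud)   # pairwise distances (N, M)
        to_points = d.min(dim=1).values                # per-vertex error on the mesh
        to_mesh = d.min(dim=0).values                  # per-point error on the cloud
        return to_points.mean() + to_mesh.mean()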

Figure 20: Encoding of meshlets on a given mesh. Each meshlet is represented by a latent vector in the low-dimensional manifold as well as by its global pose (composed of the rotation and translation between the canonical space and global coordinates).

Enforcing Global Consistency

To enforce that the meshlets are globally consistent, \ie, that their union is a watertight mesh, we use, once again, the auxiliary mesh. Specifically, we compute the Chamfer distance between the vertices of the mesh and the vertices of all the meshlets as:

(3)

First we keep the meshlets fixed and deform the mesh to minimize this distance. Then we fix the resulting mesh and adjust the meshlets with the algorithm described in Section 3.1, but this time to minimize Equation 3. We iterate until Equation 3 is minimized. At this point the meshlets will be consistent with the mesh and, in turn, with each other globally. This process corresponds to the block “global consistency” in Figure 19.

4 Implementation Details

4.1 Optimization

We start by describing a few details that improve the efficiency of the optimization procedure or the quality of the resulting meshes.

Mesh initialization. The auxiliary mesh can be any genus-zero mesh that satisfies the meshlets’ priors (see Section 3.2). The actual choice, however, does have a bearing on the number of iterations required to converge. We initialize our approach with an overly-smoothed Laplacian reconstruction. Empirically, we have observed that the results of our algorithm initialized in this way are effectively indistinguishable from the results obtained by using a sphere as an initialization; convergence, however, takes only a fraction of the time. For reference, we show a few examples of the initial mesh in the Supplementary.

Meshlets re-sampling. As the optimization progresses, the shape and the size of the auxiliary mesh may change significantly. On the one hand, this is a desirable behavior: if the mesh can scale to arbitrary sizes, it can properly match the size of the underlying mesh, even when the initialization is far from it. On the other hand, it results in a sparser meshlet coverage and, potentially, no coverage in some areas. Moreover, it could cause meshlets to be overly-stretched. Therefore, every 20 iterations of enforcing local shape priors and global consistency (blue arrow in Figure 19), we re-sample the meshlets on the current mesh.

Re-meshing. Large changes from the initialization may also cause issues for the mesh itself, which may stretch in some regions or become otherwise irregular. One way to prevent this is to use strong smoothness priors when enforcing global consistency, but that would hinder our ability to reconstruct sharp features. Instead, at the end of every iteration, we re-mesh using Screened Poisson Reconstruction [18] to encourage smoothness while respecting the priors enforced by our approach, \ie, preserving the sharpness of local features. We provide more details in the Supplementary.

4.2 Meshlet training

To train the meshlet network, we sample meshlets from the ShapeNet dataset [6]. We extract meshlets by randomly selecting objects across several classes. We then apply three different scales to each object and extract 256 meshlets for each scale, so that our meshlet dataset captures both fine and coarse details. Note that we disregard meshlets that are problematic. Specifically, we use the geodesic distance algorithm by Melvær \etal [25] and reject those meshlets for which the geodesic distance calculation results in a large anisotropic stretch, or fails altogether. The network, then, is trained to reconstruct these meshlets with a reconstruction loss. In all of our experiments we use meshlets of a fixed size. To learn the latent space of natural meshlets, we use a fully-connected encoder-decoder network that takes as input a vectorized version of the meshlet. The encoder and the decoder are symmetric, with 6 layers each, and the latent code vector is one third of the input dimension.
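A sketch of such a network is given below. Only the symmetric 6-layer encoder/decoder and the latent size of one third of the input dimension follow the text; the hidden width, activation, and variational details are assumptions.

    # Sketch of the fully-connected meshlet VAE; hidden sizes and activation
    # are assumed, not taken from the paper.
    import torch
    import torch.nn as nn

    class MeshletVAE(nn.Module):
        def __init__(self, input_dim, hidden_dim=512):
            super().__init__()
            latent_dim = input_dim // 3

            def mlp(dims):
                layers = []
                for i in range(len(dims) - 1):
                    layers.append(nn.Linear(dims[i], dims[i + 1]))
                    if i < len(dims) - 2:
                        layers.append(nn.ReLU())
                return nn.Sequential(*layers)

            enc_dims = [input_dim] + [hidden_dim] * 5 + [2 * latent_dim]  # mean and log-variance
            dec_dims = [latent_dim] + [hidden_dim] * 5 + [input_dim]
            self.encoder, self.decoder = mlp(enc_dims), mlp(dec_dims)

        def encode(self, x):
            mu, _ = self.encoder(x).chunk(2, dim=-1)
            return mu                              # use the mean at inference time

        def decode(self, z):
            return self.decoder(z)

        def forward(self, x):
            mu, logvar = self.encoder(x).chunk(2, dim=-1)
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparametrization
            return self.decode(z), mu, logvar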

5 Experiments

Figure 87: Qualitative comparison of several reconstruction methods and our approach. Columns (left to right): GT+PC, [7]+[28], [7]+[18], [12]+[28], [12]+[18], Lap-low, Lap-high, DGP, AtlasNet, OccNet, Ours. On the left is the input, the ground-truth (GT) mesh overlaid with the sparse, noisy point cloud (PC). We show results with normals estimated with Meshlab [7] and PCPNet [12]. We reconstruct the resulting point clouds with both RILMS [28] and Screened Poisson [18]. Laplacian regularizer [27] is shown for two levels of smoothing. We also show three recent deep learning approaches: Deep Geometric Prior [35], AtlasNet [11] and OccNet [26]. All of these methods struggle to cope with noise, classes not seen in training, or both.

In this section we evaluate our method against state-of-the-art approaches. Then, we compare our meshlets to other local shape priors to validate their importance.

Comparisons. We compare our method with several state-of-the-art mesh reconstruction approaches. The first is Screened Poisson [18], a widely used, traditional technique that creates watertight surfaces from oriented point clouds. Because our input is a raw point cloud, we first need to estimate normals. We use two methods to estimate normals. One is MeshLab’s normal estimation, which fits local planes and uses them to estimate normals [7]. The other is a recently published, learning-based method called PCPNet [12]. PCPNet estimates geometric properties such as normals and curvature from local regions on the point clouds. The second mesh reconstruction approach is the method by Öztireli \etal [28], which also requires oriented points. They propose a variant of marching cubes that preserves sharp features using non-linear regression. In addition, we compare against Laplacian mesh optimization [27]. Leveraging the fact that the norm of the mesh’s Laplacian captures the local mean curvature, this mesh optimization algorithm optimizes the Laplacian at the vertices in a weighted least-square sense. The algorithm has a free parameter that regulates the smoothness of the resulting surface. After a parameter sweep we found that no single parameter would yield the best results over the whole dataset. Therefore we settled on two values, each offering a different compromise between denoising and over-smoothing.

We also compare with AtlasNet [11], OccNet [26], and Deep Geometric Prior [35], all of which are deep learning methods. The first two approaches learn priors at the object level, while the third learns general local shape priors.

The data. We test all the methods on 20 objects. To validate that our method generalizes well, we also include four objects that are commonly used by the graphics community (Suzanne, the Stanford Bunny, Armadillo, and the Utah Teapot). We select the rest of the meshes from the test set of the ShapeNet dataset [6]. We show all the objects in the Supplementary. However, because the ShapeNet meshes are not always watertight, we pre-process them with a simple algorithm that we describe in the Supplementary. Given the watertight meshes, we randomly decimate the number of vertices by different factors, obtaining three different sparsity levels. For each sparsity level we also add an increasingly large amount of Gaussian noise.

We describe the parameters we use, and offer visualization of the different levels of noise in the Supplementary.

Numerical evaluation. For our numerical evaluation we use the symmetric Hausdorff distance, which reports the largest vertex reconstruction error for each mesh, and the Chamfer- distance, which computes the distance between two meshes after assigning correspondences based on closest vertices. Table 1 shows that our method performs consistently better than all the competitors. The gap is most apparent when comparing with deep-learning methods that learn priors at the object level, further suggesting that our strategy to learn local priors is a promising direction. We list the numbers for each object across the different noise settings in the Supplementary.
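For reference, both metrics can be computed on sampled vertex sets with standard tools; the following is a sketch, not the exact evaluation protocol used for Table 1.

    # Evaluation metrics on two (N, 3) / (M, 3) numpy vertex sets.
    import numpy as np
    from scipy.spatial import cKDTree
    from scipy.spatial.distance import directed_hausdorff

    def symmetric_hausdorff(a, b):
        # Largest nearest-neighbor error in either direction.
        return max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])

    def chamfer_metric(a, b):
        # Nearest-neighbor correspondences in both directions, then averaged.
        d_ab = cKDTree(b).query(a)[0]
        d_ba = cKDTree(a).query(b)[0]
        return d_ab.mean() + d_ba.mean()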

                                      Chamfer-           Hausdorff
MeshLab [7] + Scr. Poisson [18]       0.0285 / 0.0112    0.339 / 0.102
MeshLab [7] + RILMS [28]              0.0177 / 0.0166    0.149 / 0.148
PCPNet [12] + Scr. Poisson [18]       0.0122 / 0.0109    0.147 / 0.140
PCPNet [12] + RILMS [28]              0.0181 / 0.0176    0.151 / 0.153
Laplacian [27], low                   0.0104 / 0.0103    0.100 / 0.065
Laplacian [27], high                  0.0096 / 0.0094    0.103 / 0.069
Deep Geometric Prior [35]             0.0128 / 0.0130    0.147 / 0.148
AtlasNet [11]                         0.0415 / 0.0377    0.293 / 0.263
OccNet [26]                           0.0630 / 0.0627    0.304 / 0.285
Ours                                  0.0090 / 0.0092    0.054 / 0.047
Table 1: We compare our method with state-of-the-art approaches, both traditional and learning-based, using two metrics. For each metric we report mean/median values over all of the objects reconstructed, and across multiple levels of noise. Green and red indicate the best and the second best method, respectively.
Figure 90: Our final meshlets are globally consistent and capture the local shape of the mesh (a). The resulting mesh is regular over the whole reconstructed object (b).

Qualitative evaluation. In Figure 90 we show both the meshlets at the end of our optimization (a) and the quality of the final mesh reconstructed by our algorithm (b). Note that, thanks to the re-meshing steps during our optimization procedure (Section 4.1), our output is a high-quality, regular mesh. We also show a subset of the objects used in the numerical evaluation in Figures 1 and 87 for different levels of sparsity and noise. Additional results are in the Supplementary. Competing methods are significantly impacted by noise and produce overly smooth results to attenuate its effect. For example, the Laplacian reconstructions obtained with low regularization (Lap-low) are still noisy, while those for which we used high regularization (Lap-high) are over-smoothed. Even Screened Poisson reconstruction [18], the de facto standard among traditional methods, used in conjunction with PCPNet [12] to estimate the normals, produces visibly noisy results. Finally, as also shown in Figure 1, state-of-the-art deep learning methods only work on objects seen in training and for low levels of noise. Our results, on the other hand, offer the best trade-off between detail and noise by recovering locally sharp features and small details, despite the sparsity and noise of the point clouds.

Figure 99: Meshlets reconstruct local features more accurately than other priors. Columns: (a) GT+PC, (b) Laplacian, (c) DGP, (d) Meshlets. Input point clouds shown on the GT mesh in (a).

On the importance of natural meshlets. Our meshlet priors are the core of our reconstruction. Here we compare our natural meshlets with other shape priors to isolate their contribution to the overall quality of the result. The first is a Laplacian regularizer, which is a standard smoothness prior [27]. The second is the recent work by Williams \etal, which suggests that a neural network is, in itself, a prior for local geometry [35]. We use these priors and our natural meshlet prior to fit small patches of mesh to small point clouds extracted from real objects.

Figure 99 shows two representative examples. Despite the complexity of the local shape, and the level of noise, the optimization that uses our strategy (Section 3.1) is able to correctly estimate the underlying meshlet. On the contrary, Deep Geometric Prior over-fits to the noise, and the Laplacian regularizer over-smooths the surface. On meshlets, the average symmetric Hausdorff distance is for DGP, for Laplacian, for our method.

6 Discussion and Limitations

Our approach optimizes a mesh and a number of meshlets based on the gradients available at the mesh vertices, while enforcing the meshlet priors. In this paper, the gradients for the mesh were obtained by computing the distance of the mesh to the point cloud. However, our method can take gradients from any source, including a differentiable renderer [16]. This adds to the flexibility of our approach.

In our work we learn and enforce priors for mesh estimation using meshlets, which have an intrinsic scale and resolution. Our current approach uses a single fixed scale of the meshlet for all the object reconstructions, although we effectively learn meshlets at multiple scales (see Section 4.2). This poses limitations on the level of detail we can reconstruct: details cannot be smaller than the resolution of the meshlet. Using a meshlet at a single scale throughout the mesh deformation process may also lead to local minima. A natural extension, then, would be to use a coarse-to-fine approach.

Our current approach is computationally expensive and not optimized for speed. Hence, it can take from hours to dozens of hours, depending on the initialization, to run the full optimization. Several acceleration techniques exist for finding correspondences between the mesh and the meshlets, and improving the efficiency of the meshlet extraction would also help.

7 Conclusions

We have presented a novel geometrical representation, meshlets, which allows us to robustly reconstruct 3D meshes from sparse, noisy point samples. By training a variational autoencoder to learn the low-dimensional manifold of natural local surface patches, meshlets provide us with a strong prior that can be used to properly reconstruct geometry from sparse samples. However, because meshlets are localized representations, optimizing them independently would result in an inconsistent surface. Therefore, we propose an alternating optimization which first optimizes the meshlets to match the point samples and then enforces consistency across all of them to match the global shape of the reconstructed mesh. The resulting algorithm is able to reconstruct surfaces from very sparse and noisy samples more reliably than state-of-the-art approaches.

Acknowledgments

We thank Kihwan Kim, Alejandro Troccoli, and Ben Eckart for helpful discussions about evaluation. We also thank Arash Vahdat for valuable discussions on VAE training. Abhishek was supported by the NVIDIA Fellowship. This work was partially funded by National Science Foundation grant #IIS-1619376.

1 Additional Experimentation Details

In this section we give additional details about the experiments shown in the main paper.

1.1 Generating Water-tight Meshes

We used meshlets extracted from the ShapeNet [6] dataset for training. The test dataset for mesh reconstruction was formed by selecting objects from the test set of the ShapeNet dataset as well as a few objects from outside ShapeNet. However, most ShapeNet objects are not watertight. To generate watertight meshes for our objects we used the process described by Stutz and Geiger [32]. First, the object is scaled to a canonical range. Next, depth maps are rendered from 200 views at a fixed resolution. These depth maps are used to perform TSDF fusion on a fixed-resolution volume. Finally, a mesh simplification step is performed using MeshLab to give us a final mesh with 50k vertices. These vertices are roughly uniform over the surface of the mesh.

1.2 Noise, Sparsity and Outliers Parameters used in Experiments

To test different approaches we designed three different settings for noise and sparsity.

Given a GT object with 50k vertices, we first randomly sample a fraction of the vertices to get a sparse point cloud. Following this, we add Gaussian noise to each point in the sparse point cloud.
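A sketch of this degradation step, with placeholder arguments standing in for the per-setting sampling ratio and noise magnitude, is:

    # Degrade a GT vertex set: random subsampling followed by Gaussian noise.
    import numpy as np

    def degrade(vertices, keep_ratio, sigma, seed=0):
        rng = np.random.default_rng(seed)
        n_keep = int(keep_ratio * len(vertices))
        idx = rng.choice(len(vertices), size=n_keep, replace=False)
        sparse = vertices[idx]
        return sparse + rng.normal(scale=sigma, size=sparse.shape)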

The three different settings used for the experiments are as follows:

  • Setting 1 (S1): and

  • Setting 2 (S2): and

  • Setting 3 (S3): and

To appreciate the level of noise we provide a visualization of these noise settings in Figure 40.

1.3 Test dataset and Initialization

In Figure 21 we show all the GT objects used for the mesh reconstruction evaluation. We also show the initial mesh. Note that we used the exact same initialization for both our method and the Laplacian mesh optimization.

Figure 21: In this figure we show the initialization mesh and GT pair for all of the test objects: bench_1, camera_1, chair_1, clock_1, monitor_1, guitar_1, lamp_1, speaker_1, mailbox_1, sofa_1, sofa_2, sofa_3, sofa_4, table_1, table_2, mobile_1, armadillo, bunny, suzanne, and teapot. We use the initialization mesh for our method, as well as for the Laplacian mesh optimization.
Figure 40: In this figure we show the three different noise and sparsity settings (S1, S2, S3) used in our experiments. Setting S1 has less noise but a sparser point cloud. Setting S3 has a denser but noisier point cloud. These settings are designed to test the robustness of our approach to sparse and noisy observations.

2 Additional Results

Figure 162 shows additional qualitative comparisons between different approaches.

Figure 162: Qualitative comparison of several reconstruction methods and our approach. Columns (left to right): GT+PC, [7]+[28], [7]+[18], [12]+[28], [12]+[18], Lap-low, Lap-high, DGP, AtlasNet, OccNet, Ours. On the left is the input, the ground-truth (GT) mesh overlaid with the sparse, noisy point cloud (PC). We show results with normals estimated with Meshlab [7] and PCPNet [12]. We reconstruct the resulting point clouds with both RILMS [28] and Screened Poisson [18]. Laplacian regularizer [27] is shown for two levels of smoothing. We also show three recent deep learning approaches: Deep Geometric Prior [35], AtlasNet [11] and OccNet [26].

We provide the numbers for each object across the different noise settings in Table 1 (Symmetric Hausdorff) and Table 2 (Chamfer-).

obj_name Ours DGP Lap-low Lap-high [7]+[18] [7]+[28] [12]+[18] [12]+[28] OccNet AtlasNet
clock_1_S3 3.653 17.457 11.052 11.005 122.523 15.532 11.359 15.511 29.283 24.631
clock_1_S2 3.536 13.955 10.953 10.891 6.235 13.111 11.333 11.858 25.27 21.75
clock_1_S1 2.853 13.349 11.419 11.356 3.971 10.19 11.778 15.908 25.274 21.229
chair_1_S3 5.434 16.978 32.646 33.899 115.056 17.746 14.692 16.042 22.796 22.476
chair_1_S2 3.154 12.676 30.231 32.58 112.8 13.675 14.424 13.751 12.777 19.855
chair_1_S1 4.204 10.124 29.066 32.861 4.012 12.236 14.455 12.642 12.651 17.271
sofa_4_S3 5.327 17.874 5.597 6.561 39.052 18.864 24.706 18.801 65.388 35.434
sofa_4_S2 6.13 15.129 4.933 6.111 16.215 14.744 26.939 16.667 33.341 33.299
sofa_4_S1 12.935 13.947 9.957 11.439 11.61 15.108 31.597 19.499 27.108 32.914
sofa_2_S3 5.552 18.691 8.772 12.991 166.4 17.276 20.451 15.399 26.107 54.051
sofa_2_S2 6.69 14.925 5.316 7.299 31.524 14.784 20.95 14.418 22.708 49.475
sofa_2_S1 3.857 10.758 5.074 6.611 13.18 13.112 24.549 14.801 16.465 40.677
sofa_3_S3 4.272 17.02 6.688 6.387 12.701 18.736 19.288 15.456 23.524 21.608
sofa_3_S2 4.567 16.181 6.169 5.986 6.57 15.084 20.977 14.355 23.436 22.014
sofa_3_S1 3.413 11.27 4.766 4.185 4.412 13.232 23.584 14.913 16.549 14.023
sofa_1_S3 4.513 16.788 4.485 4.01 10.122 17.564 27.773 16.158 22.573 23.399
sofa_1_S2 5.796 13.385 3.687 4.92 4.658 13.339 27.74 14.893 22.384 22.026
sofa_1_S1 5.474 10.798 5.449 5.23 5.2 15.536 30.814 18.724 7.518 20.873
bench_1_S3 5.142 16.187 5.887 4.674 153.809 16.129 17.476 15.43 30.692 23.743
bench_1_S2 4.551 14.132 3.899 3.906 112.483 13.554 18.607 12.978 26.192 17.842
bench_1_S1 3.543 10.362 4.36 4.379 4.493 10.929 20.693 12.559 20.622 15.442
guitar_1_S3 7.381 14.657 10.117 10.177 163.681 16.889 7.67 15.33 20.392 25.868
guitar_1_S2 6.18 11.316 7.614 7.65 137.846 12.287 6.419 10.392 19.465 25.338
guitar_1_S1 3.011 11.2 7.332 7.415 9.681 9.027 5.831 8.532 18.356 26.169
suzanne_S3 5.719 16.841 6.748 7.252 25.533 16.781 6.49 15.598 34.363 34.366
suzanne_S2 4.141 14.976 6.327 6.516 5.124 15.299 7.623 14.374 32.841 27.383
suzanne_S1 4.797 10.417 5.473 5.442 4.721 9.305 7.051 14.159 30.643 23.145
teapot_S3 6.152 17.25 11.014 11.161 118.128 18.79 11.768 15.414 26.124 25.139
teapot_S2 7.314 15.836 10.832 11.473 17.74 15.421 11.692 13.771 23.316 22.191
teapot_S1 7.58 11.849 10.854 11.121 6.713 10.7 11.139 13.41 24.015 18.796
armadillo_S3 5.812 19.349 23.069 23.48 117.21 18.891 10.438 17.449 41.609 37.975
armadillo_S2 4.247 16.592 23.446 23.703 8.058 14.834 12.623 12.515 40.507 36.749
armadillo_S1 4.342 12.021 22.962 24.023 6.943 16 14.953 14.678 39.251 35.446
bunny_S3 3.917 25.159 19.27 19.099 25.02 20.017 20.489 17.757 75.799 50.199
bunny_S2 6.974 18.543 19.11 19.198 8.336 14.737 21.744 18.383 66.464 47.761
bunny_S1 3.967 13.767 19.421 19.429 5.853 13.703 23.577 14.289 60.246 52.51
mobile_1_S3 4.189 16.22 4.63 3.216 82.184 18.032 5.012 15.388 29.668 18.336
mobile_1_S2 3.579 13.233 3.94 3.212 4.531 14.088 3.509 12.554 28.743 16.21
mobile_1_S1 2.869 9.153 2.979 2.484 3.201 10.468 5.165 10.361 28.298 16.223
speaker_1_S3 5.033 18.527 8.624 8.327 17.667 19.866 14.627 16.993 34.595 48.009
speaker_1_S2 5.141 14.938 7.281 7.328 21.282 15.363 14.223 14.691 17.407 47.535
speaker_1_S1 4.737 14.208 5.145 4.72 5.276 13.521 12.756 12.722 9.65 47.674
mailbox_1_S3 10.031 16.761 6.169 5.413 38.719 20.275 5.141 15.046 35.726 27.246
mailbox_1_S2 10.368 12.95 4.312 4.874 9.469 12.663 4.967 12.598 30.253 26.995
mailbox_1_S1 9.868 11.426 4.425 6.325 4.245 8.704 6.586 8.777 26.279 27.183
camera_1_S3 25.875 19.303 5.048 5.05 29.712 16.55 5.022 16.955 33.868 39.07
camera_1_S2 5.622 12.472 5.091 5.331 6.308 14.748 9.199 15.082 35.641 35.475
camera_1_S1 4.787 7.959 4.943 4.081 4.349 13.25 14.537 17.113 26.015 30.47
table_1_S3 2.944 21.648 4.165 2.996 30.838 18.05 14.61 16.602 46.057 26.448
table_1_S2 3.182 16.677 3.886 3.124 6.728 15.498 13.449 16.89 42.707 17.832
table_1_S1 3.185 8.448 3.563 3.369 3.934 12.077 13.865 16.576 40.015 16.728
table_2_S3 4.791 18.325 21.119 21.256 47.586 20.083 24.451 20.052 39.425 40.386
table_2_S2 4.568 15.504 21.173 21.174 28.283 13.954 21.651 15.999 38.058 43.241
table_2_S1 5.523 9.842 20.715 21.051 12.037 13.089 24.821 18.586 37.659 38.992
monitor_1_S3 4.985 19.937 4.791 5.151 27.943 18.517 8.319 16.546 32.03 26.32
monitor_1_S2 5.325 16.997 5.748 5.66 5.546 16.31 7.464 15.472 14.612 21.924
monitor_1_S1 5.036 8.414 5.448 5.518 5.156 13.91 5.522 15.862 9.04 21.262
lamp_1_S3 3.934 18.594 11.321 11.347 10.307 18.322 9.497 15.259 43.916 29.247
lamp_1_S2 3.112 13.952 10.25 10.115 4.465 13.1 11.141 14.075 42.604 27.942
lamp_1_S1 4.107 12.046 9.68 9.792 4.88 11.655 15.259 17.152 41.499 27.995
mean 5.482 14.655 9.974 10.255 33.871 14.921 14.741 15.069 30.497 29.363
median 4.789 14.791 6.507 6.931 10.2145 14.809 14.044 15.359 28.520 26.384
Table 1: Hausdorff distances between GT and estimated mesh. The distances are multiplied by a factor of 100 for better readability. We highlight in bold the best performing approach as well as other approaches that are within 0.2 error of the best approach. We show results with normals estimated with Meshlab [7] and PCPNet [12]. We reconstruct the resulting point clouds with both RILMS [28] and Screened Poisson [18]. Laplacian regularizer [27] is shown for two levels of smoothing. We also show three recent deep learning approaches: Deep Geometric Prior [35], AtlasNet [11] and OccNet [26].
obj_name Ours DGP Lap-low Lap-high [7]+[18] [7]+[28] [12]+[18] [12]+[28] OccNet AtlasNet
clock_1_S3 0.792 1.658 1.199 1.071 9.294 2.562 1.265 2.378 8.748 4.26
clock_1_S2 0.759 1.266 1.019 0.962 1.008 1.781 1.252 1.6 5.506 3.823
clock_1_S1 0.735 0.908 0.931 0.943 0.827 0.995 1.301 1.374 3.951 3.576
chair_1_S3 0.873 1.589 1.431 1.448 7.374 2.635 1.432 2.644 7.326 3.052
chair_1_S2 0.761 1.204 1.21 1.392 9.841 1.871 1.247 1.821 5.169 2.358
chair_1_S1 0.747 0.904 0.989 1.157 0.846 1.104 1.271 1.524 3.008 1.897
sofa_4_S3 1.14 1.669 1.188 1.166 2.118 2.633 1.54 2.253 9.094 7.69
sofa_4_S2 1.197 1.32 1.13 1.163 1.226 1.567 1.757 1.883 6.557 7.39
sofa_4_S1 1.361 1.02 1.127 1.171 1.112 1.148 2.591 2.007 3.453 7.178
sofa_2_S3 0.985 1.631 1.24 1.26 6.973 2.524 1.359 2.398 7.678 7.688
sofa_2_S2 0.967 1.228 1.044 1.082 2.205 1.701 1.379 1.699 5.169 6.886
sofa_2_S1 0.952 0.956 0.997 1.043 1.257 1.213 1.785 1.727 3.256 6.304
sofa_3_S3 0.881 1.714 1.079 0.823 1.491 2.567 1.381 2.409 6.947 3.619
sofa_3_S2 0.832 1.272 0.922 0.792 0.939 1.586 1.333 1.693 5.558 2.655
sofa_3_S1 0.786 0.936 0.864 0.759 0.776 0.957 1.583 1.706 2.555 2.029
sofa_1_S3 0.951 1.642 1.08 0.95 1.369 2.551 1.672 2.356 7.21 3.583
sofa_1_S2 0.976 1.301 0.988 0.901 0.976 1.513 1.714 1.812 5.508 2.775
sofa_1_S1 0.937 0.991 0.925 0.869 0.891 1.031 2.07 1.938 2.602 2.384
bench_1_S3 0.938 1.547 1.199 0.953 11.713 2.485 1.476 2.502 7.794 3.876
bench_1_S2 0.793 1.202 0.916 0.809 9.116 1.827 1.514 1.788 5.702 2.697
bench_1_S1 0.748 0.947 0.834 0.794 0.814 1.087 1.867 1.496 3.524 1.961
guitar_1_S3 0.973 1.487 1.77 1.455 18.503 2.51 1.229 3.014 9.138 4.396
guitar_1_S2 0.657 1.161 1.046 0.837 12.261 2.004 0.8 1.945 6.986 3.166
guitar_1_S1 0.475 0.859 0.712 0.597 0.869 1.26 0.642 1.011 4.413 2.38
suzanne_S3 1.013 1.684 1.123 0.93 2.467 2.551 0.955 2.217 7.967 4.529
suzanne_S2 0.906 1.247 0.95 0.894 0.937 1.597 0.873 1.399 6.573 4.162
suzanne_S1 0.954 0.933 0.899 0.873 0.756 0.916 0.916 1.21 5.825 3.991
teapot_S3 0.827 1.72 1.184 0.841 9.856 2.694 1.008 2.318 6.316 3.777
teapot_S2 0.739 1.238 0.882 0.744 1.118 1.832 0.869 1.465 5.265 3.336
teapot_S1 0.736 0.912 0.818 0.73 0.711 0.977 0.876 1.206 4.66 2.902
armadillo_S3 1.033 1.653 1.301 1.247 7.371 2.778 1.062 2.456 9.835 6.23
armadillo_S2 0.908 1.208 1.071 1.132 0.999 1.735 0.972 1.604 8.848 5.931
armadillo_S1 0.962 0.914 1.007 1.1 0.809 1.046 1.082 1.44 7.576 5.639
bunny_S3 1.037 1.716 1.133 1.087 1.852 2.569 1.312 2.261 13.601 7.731
bunny_S2 1.02 1.277 1.033 1.05 0.962 1.453 1.283 1.529 12.126 7.482
bunny_S1 1.044 0.976 0.989 1.048 0.839 0.965 1.434 1.392 11.111 7.348
mobile_1_S3 0.778 1.72 1.123 0.795 6.66 2.835 0.94 2.332 7.237 4.035
mobile_1_S2 0.686 1.264 0.91 0.71 0.938 1.731 0.772 1.313 5.552 2.621
mobile_1_S1 0.672 0.897 0.781 0.656 0.738 0.925 0.717 0.903 4.136 1.832
speaker_1_S3 0.966 1.659 1.126 0.945 1.686 2.706 1.08 2.256 8.276 3.621
speaker_1_S2 0.92 1.244 0.987 0.887 1.381 1.692 1.024 1.515 4.767 3.328
speaker_1_S1 0.925 0.932 0.919 0.868 0.873 1.077 0.986 1.259 3.059 3.476
mailbox_1_S3 0.976 1.654 1.437 1.003 3.53 2.505 1.029 2.615 7.939 4.331
mailbox_1_S2 0.732 1.201 0.944 0.733 1.122 1.908 0.78 1.665 6.871 3.403
mailbox_1_S1 0.68 0.805 0.732 0.648 0.702 0.978 0.695 0.935 5.187 2.928
camera_1_S3 1.021 1.646 1.072 0.955 2.528 2.495 0.984 2.057 6.927 4.052
camera_1_S2 1.075 1.252 1.027 0.953 1.008 1.62 0.979 1.544 6.225 3.554
camera_1_S1 0.999 0.954 0.954 0.922 0.93 1.058 1.29 1.786 4.895 3.425
table_1_S3 0.825 1.697 1.032 0.792 1.758 2.53 1.092 2.191 9.56 3.899
table_1_S2 0.788 1.266 0.908 0.755 0.912 1.506 1.009 1.448 6.487 3.719
table_1_S1 0.803 0.949 0.865 0.745 0.767 0.893 1.018 1.286 3.635 3.757
table_2_S3 0.934 1.66 1.202 1.124 3.284 2.735 1.539 2.482 8.435 5.142
table_2_S2 0.925 1.279 1.1 1.08 1.665 1.622 1.474 1.91 6.948 4.359
table_2_S1 0.926 0.952 1.023 1.036 1.058 1.05 1.832 1.959 4.537 3.88
monitor_1_S3 1.094 1.66 1.109 1.072 2.048 2.605 1.03 2.033 6.693 4.368
monitor_1_S2 1.179 1.273 1.061 1.067 1.018 1.537 0.977 1.386 4.236 4.261
monitor_1_S1 1.13 0.96 1.005 1.046 0.957 1.017 1.033 1.338 2.354 4.017
lamp_1_S3 0.8 1.742 1.06 0.86 1.361 2.544 1.019 2.089 8.63 3.667
lamp_1_S2 0.766 1.283 0.933 0.823 0.879 1.376 0.934 1.415 5.624 3.359
lamp_1_S1 0.772 0.969 0.88 0.8 0.728 0.891 1.004 1.441 3.028 2.998
mean 0.896 1.280 1.040 0.956 2.850 1.768 1.222 1.811 6.297 4.145
median 0.922 1.258 1.025 0.944 1.115 1.657 1.087 1.757 6.270 3.767
Table 2: Chamfer- distances between GT and estimated mesh. The distances are multiplied by a factor of 100 for better readability. We highlight in bold the best performing approach as well as other approaches that are within 0.02 error of the best approach. We show results with normals estimated with Meshlab [7] and PCPNet [12]. We reconstruct the resulting point clouds with both RILMS [28] and Screened Poisson [18]. Laplacian regularizer [27] is shown for two levels of smoothing. We also show three recent deep learning approaches: Deep Geometric Prior [35], AtlasNet [11] and OccNet [26].

3 Additional Algorithm Details

The pseudo code for our optimization procedure to estimate a watertight mesh while enforcing meshlet priors is outlined below.

In our optimization procedure we update both the meshlets and the auxiliary mesh. While the meshlet priors make the meshlet updates stable, a prior is also needed when updating the mesh, to ensure that the mesh is watertight and that its vertices are uniformly distributed. Using smoothness or other hand-crafted priors for the mesh would hinder our ability to reconstruct sharp features. Hence, we use Screened Poisson Reconstruction [18] at the end of every iteration and use the vertices and normals of the globally consistent meshlets to update the mesh.

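At a high level, the procedure can be sketched as follows; the helper callables are hypothetical stand-ins for the steps described in Sections 3.1, 3.2, and 4.1 of the main paper, not the exact implementation.

    def reconstruct(point_cloud, vae, init_mesh, sample_meshlets, fit_meshlets,
                    deform_mesh, remesh, n_iters=100, resample_every=20):
        # Alternating optimization. Hypothetical callables:
        #   sample_meshlets(mesh)                   -> overlapping meshlets covering every vertex
        #   fit_meshlets(meshlets, mesh, pts, vae)  -> meshlets updated in the VAE latent space
        #   deform_mesh(mesh, meshlets)             -> mesh deformed towards the meshlets (Eq. 3)
        #   remesh(mesh)                            -> Screened Poisson re-meshing (Sec. 4.1)
        mesh = init_mesh                            # e.g. an over-smoothed Laplacian reconstruction
        meshlets = sample_meshlets(mesh)
        for it in range(n_iters):
            # Step 1: enforce local shape priors (fit meshlets to the point cloud, Eq. 2).
            meshlets = fit_meshlets(meshlets, mesh, point_cloud, vae)
            # Step 2: enforce global consistency (mesh and meshlets match each other, Eq. 3).
            mesh = deform_mesh(mesh, meshlets)
            meshlets = fit_meshlets(meshlets, mesh, mesh.vertices, vae)
            mesh = remesh(mesh)
            # Periodically re-sample the meshlets on the current mesh (Sec. 4.1).
            if (it + 1) % resample_every == 0:
                meshlets = sample_meshlets(mesh)
        return mesh                                 # watertight by construction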

Footnotes

  1. This work was done while Abhishek Badki was interning at NVIDIA.

References

  1. T. Bagautdinov, C. Wu, J. Saragih, P. Fua and Y. Sheikh (2018) Modeling facial geometry using compositional VAEs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  2. C. L. Bajaj, F. Bernardini and G. Xu (1995) Automatic reconstruction of surfaces and scalar fields from 3D scans. In Proceedings of SIGGRAPH.
  3. H. Ben-Hamu, H. Maron, I. Kezurer, G. Avineri and Y. Lipman (2018) Multi-chart generative surface modeling. ACM Transactions on Graphics.
  4. F. Bernardini, J. Mittleman, H. Rushmeier, C. Silva and G. Taubin (1999) The ball-pivoting algorithm for surface reconstruction. IEEE Transactions on Visualization and Computer Graphics (TVCG).
  5. M. Bloesch, J. Czarnowski, R. Clark, S. Leutenegger and A. J. Davison (2018) CodeSLAM — learning a compact, optimisable representation for dense visual SLAM. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  6. A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, J. Xiao, L. Yi and F. Yu (2015) ShapeNet: An information-rich 3D model repository. Technical Report arXiv:1512.03012.
  7. P. Cignoni, M. Callieri, M. Corsini, M. Dellepiane, F. Ganovelli and G. Ranzuglia (2008) MeshLab: An open-source mesh processing tool. In Eurographics Italian Chapter Conference.
  8. B. Curless and M. Levoy (1996) A volumetric method for building complex models from range images. In ACM Transactions on Graphics (SIGGRAPH).
  9. D. Eigen, C. Puhrsch and R. Fergus (2014) Depth map prediction from a single image using a multi-scale deep network. In Advances in Neural Information Processing Systems (NIPS).
  10. C. Godard, O. Mac Aodha and G. J. Brostow (2017) Unsupervised monocular depth estimation with left-right consistency. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  11. T. Groueix, M. Fisher, V. G. Kim, B. C. Russell and M. Aubry (2018) A papier-mâché approach to learning 3D surface generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  12. P. Guerrero, Y. Kleiman, M. Ovsjanikov and N. J. Mitra (2018) PCPNet: learning local shape properties from raw point clouds. In Computer Graphics Forum.
  13. H. Hoppe, T. DeRose, T. Duchamp, J. McDonald and W. Stuetzle (1992) Surface reconstruction from unorganized points. In Proceedings of SIGGRAPH.
  14. P. Huang, K. Matzen, J. Kopf, N. Ahuja and J. Huang (2018) DeepMVS: Learning multi-view stereopsis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  15. A. Kanazawa, S. Tulsiani, A. A. Efros and J. Malik (2018) Learning category-specific mesh reconstruction from image collections. In Proceedings of the European Conference on Computer Vision (ECCV).
  16. H. Kato, Y. Ushiku and T. Harada (2018) Neural 3D mesh renderer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  17. M. Kazhdan, M. Bolitho and H. Hoppe (2006) Poisson surface reconstruction. In Eurographics Symposium on Geometry Processing.
  18. M. Kazhdan and H. Hoppe (2013) Screened Poisson surface reconstruction. ACM Transactions on Graphics.
  19. A. Kendall, H. Martirosyan, S. Dasgupta, P. Henry, R. Kennedy, A. Bachrach and A. Bry (2017) End-to-end learning of geometry and context for deep stereo regression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  20. D. P. Kingma and M. Welling (2013) Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114.
  21. K. Lasinger, R. Ranftl, K. Schindler and V. Koltun (2019) Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer. arXiv preprint arXiv:1907.01341.
  22.
  23. O. Litany, A. Bronstein, M. Bronstein and A. Makadia (2018) Deformable shape completion with graph convolutional autoencoders. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  24. W. E. Lorensen and H. E. Cline (1987) Marching cubes: A high resolution 3D surface construction algorithm. In Proceedings of SIGGRAPH.
  25. E. L. Melvær and M. Reimers (2012) Geodesic polar coordinates on polygonal meshes. In Computer Graphics Forum.
  26. L. Mescheder, M. Oechsle, M. Niemeyer, S. Nowozin and A. Geiger (2019) Occupancy networks: Learning 3D reconstruction in function space. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  27. A. Nealen, T. Igarashi, O. Sorkine and M. Alexa (2006) Laplacian mesh optimization. In ACM International Conference on Computer Graphics and Interactive Techniques (GRAPHITE).
  28. A. C. Öztireli, G. Guennebaud and M. Gross (2009) Feature preserving point set surfaces based on non-linear kernel regression. In Computer Graphics Forum.
  29. J. J. Park, P. Florence, J. Straub, R. Newcombe and S. Lovegrove (2019) DeepSDF: Learning continuous signed distance functions for shape representation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  30. D. Scharstein and R. Szeliski (2003) High-accuracy stereo depth maps using structured light. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  31. A. Sinha, A. Unmesh, Q. Huang and K. Ramani (2017) SurfNet: Generating 3D shape surfaces using deep residual networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  32. D. Stutz and A. Geiger (2018) Learning 3D shape completion under weak supervision. CoRR.
  33. M. Tatarchenko, S. R. Richter, R. Ranftl, Z. Li, V. Koltun and T. Brox (2019) What do single-view 3D reconstruction networks learn? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  34. N. Wang, Y. Zhang, Z. Li, Y. Fu, W. Liu and Y. Jiang (2018) Pixel2Mesh: Generating 3D mesh models from single RGB images. In Proceedings of the European Conference on Computer Vision (ECCV).
  35. F. Williams, T. Schneider, C. Silva, D. Zorin, J. Bruna and D. Panozzo (2019) Deep geometric prior for surface reconstruction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  36. Y. Yao, Z. Luo, S. Li, T. Fang and L. Quan (2018) MVSNet: Depth inference for unstructured multi-view stereo. In Proceedings of the European Conference on Computer Vision (ECCV).
  37. L. Yu, X. Li, C. Fu, D. Cohen-Or and P. Heng (2018) EC-Net: An edge-aware point set consolidation network. In Proceedings of the European Conference on Computer Vision (ECCV).
  38. L. Yu, X. Li, C. Fu, D. Cohen-Or and P. Heng (2018) PU-Net: Point cloud upsampling network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  39. T. Zhou, M. Brown, N. Snavely and D. G. Lowe (2017) Unsupervised learning of depth and ego-motion from video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  40. R. Zhu, C. Wang, C. Lin, Z. Wang and S. Lucey (2018) Object-centric photometric bundle adjustment with deep shape prior. In IEEE Winter Conference on Applications of Computer Vision (WACV).