MeshWalker: Deep Mesh Understanding by Random Walks


Abstract.

Most attempts to represent 3D shapes for deep learning have focused on volumetric grids, multi-view images and point clouds. In this paper we look at the most popular representation of 3D shapes in computer graphics—a triangular mesh—and ask how it can be utilized within deep learning. The few attempts to answer this question propose to adapt convolutions and pooling to suit Convolutional Neural Networks (CNNs). This paper proposes a very different approach, termed MeshWalker, to learn the shape directly from a given mesh. The key idea is to represent the mesh by random walks along the surface, which "explore" the mesh's geometry and topology. Each walk is organized as a list of vertices, which in some manner imposes regularity on the mesh. The walk is fed into a Recurrent Neural Network (RNN) that "remembers" the history of the walk. We show that our approach achieves state-of-the-art results for two fundamental shape analysis tasks: shape classification and semantic segmentation. Furthermore, even a very small number of examples suffices for learning. This is highly important, since large datasets of meshes are difficult to acquire.

Shape Analysis, Deep Learning, Mesh segmentation, Mesh Classification, Random Walks

Figure 1. Classification by MeshWalker. This figure shows classification results as the walk (in green) proceeds along the surface of a camel from SHREC11 (Lian et al., 2011). The initial point was randomly chosen on the neck. After a short walk (left), covering only a small fraction of the vertices, the system is uncertain regarding the class, and the highest-probability predictions are the flamingo class and the hand class (out of the 30 classes). After continuing the random walk along the body and the front leg, the probability of being a horse is higher than before, but the camel already has quite a high probability. Finally, after walking also along the hump (right), the system correctly classifies the model as a camel.

1. Introduction

The most-commonly used representation of surfaces in computer graphics is a polygonal mesh, due to its numerous benefits, including efficiency and high quality. Nevertheless, in the era of deep learning, this representation is often bypassed because of its irregularity, which does not suit Convolutional Neural Networks (CNNs). Instead, 3D data is often represented as volumetric grids (Maturana and Scherer, 2015; Sedaghat et al., 2016b; Roynard et al., 2018; Ben-Shabat et al., 2018) or multiple 2D projections (Su et al., 2015; Boulch et al., 2017; Feng et al., 2018; Yavartanoo et al., 2018; Kanezaki et al., 2018). In some recent works, point clouds are utilized and new ways to convolve or pool are proposed (Atzmon et al., 2018; Xu et al., 2018; Li et al., 2018; Hua et al., 2018; Thomas et al., 2019).

Despite the benefits of these representations, they miss the notions of neighborhoods and connectivity and might not be as good for capturing local surface properties. Recently, several works have proposed to maintain the potential of the mesh representation, while still utilizing neural networks. FeaStNet (Verma et al., 2018) proposes a graph neural network in which the neighborhood of each vertex for the convolution operation is calculated dynamically based on its features. MeshCNN (Hanocka et al., 2019) defines pooling and convolution layers over the mesh edges. MeshNet (Feng et al., 2019) treats the faces of a mesh as the basic unit and extracts their spatial and structural features individually to offer the final semantic representation. LRF-Conv (Yang et al., 2020) learns descriptors directly from the raw mesh by defining new continuous convolution kernels that provide robustness to sampling. All these methods redefine the convolution operation, and by doing so, are able to fit the unordered structure of a mesh to a CNN framework.

We propose a novel and fundamentally different approach, named MeshWalker. As in previous approaches that learn directly from the mesh data, the basic question is how to impose regularity on the unordered data. Our key idea is to represent the mesh by random walks on its surface. These walks explore the local geometry of the surface, as well as its global structure. Every walk is fed into a Recurrent Neural Network (RNN), which "remembers" the walk's history.

In addition to simplicity, our approach has three important benefits. First, we will show that even a small dataset suffices for training. Intuitively, we can generate multiple random walks for a single model; these walks provide multiple explorations of the model. This may be considered as equivalent to using different projections of 3D objects in the case of image datasets. Second, as opposed to CNNs, RNNs are inherently robust to sequence length. This is vital in the case of meshes, as datasets include objects of various granularities. Third, the meshes need not be watertight or have a single connected component; our approach can handle any triangular mesh.

Our approach is general and can be utilized to address a variety of shape analysis tasks. We demonstrate its benefit in two basic applications: mesh classification and mesh semantic segmentation. Our results are superior to those of state-of-the-art approaches on common datasets and on highly non-uniform meshes. Furthermore, when the training set is limited in size, the accuracy improvement over the state-of-the-art methods is highly evident.

Hence, this paper makes three contributions:

  1. We propose a novel representation of meshes for neural networks: random walks on surfaces.

  2. We present an end-to-end learning framework that realizes this representation within RNNs. We show that this framework works well even when the dataset is very small. This is important in the case of 3D, where large datasets are seldom available and are difficult to generate.

  3. We demonstrate the benefits of our method in two key applications: 3D shape classification and semantic segmentation.

2. Related Work

Our work is at the crossroads of three fields, as discussed below.

2.1. Representing 3D objects for Deep Neural Networks

A variety of representations of 3D shapes have been proposed in the context of deep learning. The main challenge is how to re-organize the shape description such that it could be processed within deep learning frameworks. Hereafter we briefly review the main representations; see (Gezawa et al., 2020) for a recent excellent survey.

Multi-view 2D projections.

This representation is essentially a set of 2D images, each of which is a rendering of the object from a different viewpoint (Su et al., 2015; Kalogerakis et al., 2017; Qi et al., 2016; Sarkar et al., 2018; Gomez-Donoso et al., 2017; Johns et al., 2016; Zanuttigh and Minto, 2017; Bai et al., 2016; Wang et al., 2019b; Kanezaki et al., 2018; Feng et al., 2018; He et al., 2018; Han et al., 2019). The major benefit of this representation is that it can naturally utilize any image-based CNN. In addition, high-resolution inputs can be easily handled. However, it is not easy to determine the optimal number of views; if that number is large, the computation might be costly. Furthermore, self-occlusions might be a drawback.

Volumetric grids.

These grids are analogous to the 2D grids of images. Therefore, the main benefit of this representation is that operations that are applied on 2D grids can be extended to 3D in a straightforward manner (Wu et al., 2015; Brock et al., 2016; Tchapmi et al., 2017; Fanelli et al., 2011; Maturana and Scherer, 2015; Wang et al., 2019a; Sedaghat et al., 2016a; Zhi et al., 2018). The primary drawbacks of volumetric grids are their limited resolution and the heavy computation cost needed.

Point clouds.

This representation consists of a set of 3D points, sampled from the object's surface. The simplicity, close relationship to data acquisition, and the ease of conversion from other representations, make point clouds an attractive representation. Therefore, a variety of recent works proposed successful techniques for point cloud shape analysis using neural networks (Qi et al., 2017a, b; Wang et al., 2019d; Guerrero et al., 2018; Williams et al., 2019; Atzmon et al., 2018; Li et al., 2018; Liu et al., 2019; Xu et al., 2019; Zhu et al., 2019). These methods attempt to learn a representation for each point, using its (Euclidean) neighbors, either by multi-layer perceptrons or by convolutional layers. Some also define novel pooling layers. Point cloud representations might fall short in applications where the connectivity is highly meaningful (e.g. segmentation) or where the salient information is concentrated in small specific areas.

Triangular meshes.

This representation is the most widespread representation in computer graphics and the focus of our paper. The major challenge of using meshes within deep learning frameworks is the irregularity of the representation—each vertex has a different number of neighbors, at different distances.

The pioneering work of (Masci et al., 2015) introduces deep learning of local features and shows how to make the convolution operations intrinsic to the mesh. In (Poulenard and Ovsjanikov, 2018) a new convolutional layer is defined, which allows the propagation of geodesic information throughout the network layers. FeaStNet (Verma et al., 2018) proposes a graph neural network in which the neighborhood of each vertex for the convolution operation is calculated dynamically based on its features. Another line of works exploits the fact that local patches are approximately Euclidean. The 3D manifolds are then parameterized in 2D, where standard CNNs are utilized (Henaff et al., 2015; Sinha et al., 2016; Boscaini et al., 2016; Maron et al., 2017; Ezuz et al., 2017; Haim et al., 2019). A different approach is to apply a linear map to a spiral of neighbors (Gong et al., 2019; Lim et al., 2018), which works well for meshes with a similar graph structure.

(a) walks on the surface (b) Classification: Samples from the class the input belongs to (c) Semantic segmentation
Figure 2. Outline. To explore a mesh, walks are generated on its surface; these walks study the surface both locally and globally (a). The walks provide sufficient information to perform shape analysis tasks, such as classification and segmentation. Specifically, (b) shows samples from the class to which MeshWalker correctly classified the model from (a), and (c) shows the resulting segmentation. The models are from SHREC11 (Lian et al., 2011).

Two approaches were recently introduced: MeshNet (Feng et al., 2019) treats faces of a mesh as the basic unit and extracts their spatial and structural features individually, to offer the final semantic representation. MeshCNN (Hanocka et al., 2019) is based on the unique idea of using the edges of the mesh to perform pooling and convolution. The convolution operations exploit the regularity of edges, each edge being adjacent to the four other edges of its two incident triangles. An edge collapse operation is used for pooling, which maintains surface topology and generates new mesh connectivity for further convolutions.

2.2. Classification

Object classification refers to the task of classifying a given shape into one of pre-defined categories. Before deep learning methods became widespread, the main challenges were finding good descriptors and good distance functions between these descriptors. According to the thorough review of (Lian et al., 2013), the methods can be roughly classified into algorithms employing local features (Johnson and Hebert, 1999; Lowe, 2004; Liu et al., 2006; Sun et al., 2009; Ovsjanikov et al., 2009), topological structures (Hilaga et al., 2001; Sundar et al., 2003; Tam and Lau, 2007), isometry-invariant global geometric properties (Reuter et al., 2005; Jain and Zhang, 2007; Mahmoudi and Sapiro, 2009), direct shape matching, or canonical forms (Mémoli and Sapiro, 2005; Mémoli, 2007; Bronstein et al., 2006; Elad and Kimmel, 2003).

Many of the recent techniques already use deep learning for classification. They are described in Section 2.1, for instance (Hanocka et al., 2018; Qi et al., 2017a, b; Li et al., 2018; Ezuz et al., 2017; Bronstein et al., 2011; Feng et al., 2019; Thomas et al., 2019; Liu et al., 2019; Veličković et al., 2017; Wang et al., 2019c; Kipf and Welling, 2016; Perozzi et al., 2014).

2.3. Semantic segmentation

Mesh segmentation is a key ingredient in many computer graphics tasks, including modeling, animation and a variety of shape analysis tasks. The goal is to determine, for the basic elements of the mesh (vertex, edge or face), to which segment they belong. Many approaches were proposed, including region growing (Chazelle et al., 1997; Lavoué et al., 2005; Zhou and Huang, 2004; Koschan, 2003; Sun et al., 2002; Katz et al., 2005), clustering (Shlafman et al., 2002; Katz and Tal, 2003; Gelfand and Guibas, 2004; Attene et al., 2006b), spectral analysis (Alpert and Yao, 1995; Gotsman, 2003; Liu and Zhang, 2004; Zhang and Liu, 2005) and more. See (Attene et al., 2006a; Shamir, 2008; Rodrigues et al., 2018) for excellent surveys of segmentation methods.

Lately, deep learning has been utilized for this task as well. Each proposed approach handles a specific shape representation, as described in Section 2.1. These approaches include among others  (Hanocka et al., 2018; Qi et al., 2017a, b; Li et al., 2018; Yang et al., 2020; Haim et al., 2019; Maron et al., 2017; Qi et al., 2017b; Guo et al., 2015).

3. MeshWalker outline

Imagine an ant walking on a surface; it will "climb" ridges and go through valleys. Thus, it will explore the local geometry of the surface, as well as the global terrain. Random walks have been shown to incorporate both global and local information about a given object (Lai et al., 2008; Lovász, 1993; Grady, 2006; Noh and Rieger, 2004). This information may be invaluable for shape analysis tasks; nevertheless, random walks have not previously been used to represent meshes within a deep learning framework.

Given a polygonal mesh, we propose to randomly walk through the vertices of the mesh, along its edges, as shown in Fig. 2(a). In our ant analogy, the longer the walk, the more information is acquired by the ant. But how shall this information be accumulated? We propose to feed this representation into a Recurrent Neural Network (RNN) framework, which aggregates properties of the walk. This aggregated information will enable the ant to perceive the shape of the mesh. This is particularly beneficial for shape analysis tasks that require both the 3D global structure and some local information of the mesh, as demonstrated in Fig. 2(b-c).

Algorithm 1 describes the training procedure of our proposed MeshWalker approach. A defining property is that the same computation is applied to every vertex along the walk (i.e., each vertex the ant passes through). The algorithm iterates over the following steps: A mesh is first extracted from the dataset (it could be a mesh that was previously extracted). A vertex is chosen randomly as the head of the walk and then a random walk is generated. This walk is the input to an RNN model. Finally, the RNN model's parameters are updated by minimizing the Softmax cross-entropy loss, using the Adam optimizer (Kingma and Ba, 2014).

Input: Labeled mesh dataset D
Output: Θ — RNN model parameters
Θ ← random parameters;
D ← MeshPreprocessing(D);
repeat
      (M, y) ← random mesh and label(s) from D;
      v_0 ← random starting vertex of M;
      W ← GenerateRandomWalk(M, v_0);
      ŷ ← RNN(W; Θ);
      Θ ← Adam update of Θ, minimizing the Softmax cross-entropy loss L(ŷ, y);
until Convergence;
ALGORITHM 1 MeshWalker Training
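
For concreteness, the following Python/TensorFlow sketch mirrors Algorithm 1; it is an illustrative re-statement under our own assumptions, not the released implementation. Here random_walk_features is a hypothetical placeholder for the walk construction and feature extraction described in Section 4.1, and the model is assumed to return per-step class logits as in Fig. 3.

import random
import tensorflow as tf

def train_meshwalker(dataset, model, walk_len, num_iterations):
    # dataset: list of (mesh, label) pairs, already simplified by MeshPreprocessing
    optimizer = tf.keras.optimizers.Adam()
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    for _ in range(num_iterations):
        mesh, label = random.choice(dataset)               # random mesh and label
        v0 = random.randrange(mesh.num_vertices)           # random starting vertex
        walk = random_walk_features(mesh, v0, walk_len)    # placeholder: per-step 3D features
        with tf.GradientTape() as tape:
            logits = model(walk[None, ...])                # per-step logits, batch of one walk
            loss = loss_fn([label], logits[:, -1, :])      # cross-entropy on the last step
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))  # Adam step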

Section 4 elaborates on the architecture of our MeshWalker learning model, as well as on each of the ingredients of the iterative step. Section 6.2 explains the mesh pre-processing step, which essentially performs mesh simplification, and provides implementation details.

4. Learning to walk over a surface

This section explains how to realize Algorithm 1. It begins by elaborating on the construction of a random walk on a mesh. It then proceeds to describe the network that learns from walks in order to understand meshes.

Figure 3. Network architecture. The network consists of three components: The first component (FC layers) changes the feature space; the second component (RNN layers) aggregates the information along the walk; and the third component (an FC layer) predicts the outcome of the network. For classification, the prediction of the last vertex of the walk is considered and Softmax is applied to its resulting vector (the bottom-right orange circle, classified as a camel). For segmentation (not shown in this figure), the network is similar. However, Softmax is applied to each of the resulting vectors of the vertices (the orange circles in the right column); each vertex is classified into a segment.

4.1. What is a walk?

Walks provide a novel way to organize the mesh data. A walk is a sequence of vertices (not necessarily adjacent), each of which is associated with basic information.

Walk generation.

We adopt a very simple strategy to generate walks, out of many possible ones. Recall that we are given the first vertex of a walk. Then, to generate the walk, the remaining vertices are added iteratively, as follows. Given the current vertex of the walk, the next vertex is chosen randomly from its adjacent vertices (those that belong to its one-ring neighborhood).

If such a vertex does not exist (as all the neighbors already belong to the walk), the walk is backtracked until an un-visited neighbor is found; this neighbor is added to the walk. In this case, the walk is not a linear sequence of vertices connected via edges, but rather a tree. If the mesh consists of multiple connected components, it is possible that the walk reaches a dead-end. In this case, a new random un-visited vertex is chosen and the walk generation proceeds as before. We note that in all cases, the input to the RNN is a sequence of vertices, arranged by their discovery order. In practice, the length of the walk is set by default to a fixed fraction of the number of mesh vertices.
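
A minimal Python sketch of this walk-generation strategy is given below; the adjacency-list input format is an assumption, and the released code may differ in details (e.g., how backtracking is recorded).

import random

def generate_walk(adjacency, v0, walk_len):
    # adjacency[v] lists the 1-ring neighbors of vertex v
    walk = [v0]
    visited = {v0}
    while len(walk) < walk_len:
        nxt = None
        # try the current vertex first, then backtrack along the walk
        for v in reversed(walk):
            candidates = [u for u in adjacency[v] if u not in visited]
            if candidates:
                nxt = random.choice(candidates)
                break
        if nxt is None:
            # dead-end (e.g. a fully visited connected component):
            # jump to a new random un-visited vertex
            unvisited = [u for u in range(len(adjacency)) if u not in visited]
            if not unvisited:
                break
            nxt = random.choice(unvisited)
        walk.append(nxt)      # vertices are kept in discovery order
        visited.add(nxt)
    return walk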

Walk representation.

Once the walk is determined, the representation of this walk should be defined; this is the input to the RNN. Each vertex is represented by the 3D translation from the previous vertex in the walk. This is in line with the deep learning philosophy, which prefers end-to-end learning over hand-crafted features that are separated from a classifier. We note that we also tried other representations, including vertex coordinates, normals, and curvatures, but the results did not improve.
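
Assuming the vertex coordinates are stored in an (N, 3) array, the per-step features can be computed as in the following small sketch of this translation-based representation.

import numpy as np

def walk_features(vertices, walk):
    coords = vertices[np.asarray(walk)]                   # (L, 3) coordinates along the walk
    return np.diff(coords, axis=0).astype(np.float32)     # 3D translation per step, (L-1, 3)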

Walks at inference time.

At inference, several walks are used for each mesh. Each walk produces a vector of probabilities of belonging to the different classes (in the case of classification). These vectors are averaged to produce the final result. To understand the importance of averaging, let us consider the walks on the camel in Fig. 1. Since walks are generated randomly, we expect some of them to explore atypical parts of the model, such as the legs, which are similar to horse legs. Other walks, however, are likely to explore unique parts, such as the hump or the head. The averaged result will most likely be the camel, as shown in Section 5.
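
A sketch of this inference-time averaging, reusing the generate_walk and walk_features sketches above; the mesh attributes and the assumption that the model outputs per-step class probabilities (the network of Fig. 3 followed by Softmax) are ours.

import numpy as np

def classify_mesh(mesh, model, num_walks, walk_len):
    probs = []
    for _ in range(num_walks):
        v0 = np.random.randint(mesh.num_vertices)
        walk = generate_walk(mesh.adjacency, v0, walk_len)
        feats = walk_features(mesh.vertices, walk)
        per_step = np.asarray(model(feats[None, ...])[0])   # (steps, num_classes)
        probs.append(per_step[-1])                          # prediction of the last step
    return int(np.argmax(np.mean(probs, axis=0)))           # average, then pick the best class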

4.2. Learning from walks

Once walks are defined, the next challenge is to distill the information accumulated along a walk into a single descriptor vector. Hereafter we discuss the network architecture and the training.

Network architecture.

The model consists of three sub-networks, as illustrated in Fig. 3. The first sub-network is given the current vertex of the walk and learns a new feature space, i.e., it transforms the 3D input feature space into a 256D feature space. This is done by two fully connected (FC) layers, followed by an instance normalization (Ulyanov et al., 2016) layer and ReLU as the nonlinear activation; both empirically outperform other alternatives.

The second sub-network is the core of our approach. It utilizes a recurrent neural network (RNN) whose defining property is being able to "remember" and accumulate knowledge. Briefly, a recurrent neural network (Graves et al., 2008; Hochreiter and Schmidhuber, 1997; Cho et al., 2014) is a connectionist model that contains a self-connected hidden layer. The benefit of self-connection is that the 'memory' of previous inputs remains in the network's internal state, allowing it to make use of past context. In our setting, the RNN gets as input a feature vector (the result of the previous sub-network), learns the hidden states that describe the walk up to the current vertex, and outputs a state vector that contains the information gathered along the walk.

Another benefit of RNNs, which is crucial in our case, is not being confined to fixed-length inputs or outputs. Thus, we can run inference on a walk of a given length, which may differ from the walk lengths the model was trained on.

To implement the RNN part of our model, we use three Gated Recurrent Unit (GRU) layers (Cho et al., 2014). Briefly, the goal of a GRU layer is to accumulate only the important information from the input sequence and to forget the non-important information.

Formally, let x_t be the input at time t and h_t be the hidden state at time t; let the reset gate r_t and the update gate z_t be two vectors, which jointly decide which information should be passed from time t-1 to time t. To realize GRU's goal, the network performs the following calculation, which sets the hidden state at time t. Its final content is based on updating the hidden state of the previous time step (the update gate determines which information should be passed) and on the candidate memory content \tilde{h}_t:

(1)   h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t,

where \odot is an element-wise multiplication. Here, \tilde{h}_t is defined as:

(2)   \tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h).

That is, when the reset gate r_t is close to 0, the hidden state ignores the previous hidden state and resets with the current input only. This effectively allows the hidden state to drop any information that will later be found to be irrelevant.

Finally, the reset gate r_t and the update gate z_t are defined as:

(3)   r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r),
(4)   z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z),

where \sigma is a logistic Sigmoid function, W_* and U_* are trainable weight matrices, and b_* are trainable bias vectors. The initial hidden state h_0 is set to 0.
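
To make the gate interplay concrete, the following NumPy sketch performs a single GRU step according to Eqs. (1)-(4); the explicit weight arguments simply name the matrices and biases defined above.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W_z, U_z, b_z, W_r, U_r, b_r, W_h, U_h, b_h):
    z_t = sigmoid(W_z @ x_t + U_z @ h_prev + b_z)               # update gate, Eq. (4)
    r_t = sigmoid(W_r @ x_t + U_r @ h_prev + b_r)               # reset gate, Eq. (3)
    h_cand = np.tanh(W_h @ x_t + U_h @ (r_t * h_prev) + b_h)    # candidate content, Eq. (2)
    return (1.0 - z_t) * h_prev + z_t * h_cand                  # new hidden state, Eq. (1)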

GRU outperforms a vanilla RNN, due to its ability to both remember the important information along the sequence and to forget unimportant content. Furthermore, it is capable of processing long sequences, similarly to the Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997). Being able to accumulate information from long sequences is vital for grasping the shape of a 3D model, which usually consists of thousands of vertices. We chose GRU over LSTM due to its simplicity and its smaller computational requirements: in our case, an LSTM would require more trainable parameters than a GRU, and its inference time for a walk of the same length is longer.

The third sub-network in Fig. 3 predicts the object class in case of classification, or the vertex segment in case of semantic segmentation. It consists of a single fully connected (FC) layer on top of the state vector calculated in the previous sub-network. More details on the architectures & the implementation are given in Section 6.
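
The following Keras sketch assembles the three components. The 256D feature space follows the text, but the first FC width and the GRU widths are assumptions of this sketch, and LayerNormalization is used here only as a convenient stand-in for the instance normalization layer used in the paper.

import tensorflow as tf

def build_meshwalker_net(num_classes):
    return tf.keras.Sequential([
        # (1) per-step feature transform: 3D translation -> 256D feature space
        tf.keras.layers.Dense(128),
        tf.keras.layers.LayerNormalization(),
        tf.keras.layers.ReLU(),
        tf.keras.layers.Dense(256),
        tf.keras.layers.LayerNormalization(),
        tf.keras.layers.ReLU(),
        # (2) aggregation of information along the walk
        tf.keras.layers.GRU(1024, return_sequences=True),
        tf.keras.layers.GRU(1024, return_sequences=True),
        tf.keras.layers.GRU(512, return_sequences=True),
        # (3) per-step prediction (class logits or segment logits)
        tf.keras.layers.Dense(num_classes),
    ])

For classification, Softmax is applied to the logits of the last step only; for segmentation, it is applied to the logits of every step.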

Loss calculation.

The Softmax cross entropy loss is used on the output of the third part of the network. In the case of the classification task, only the last step of the walk is used as input to the loss function, since it accumulates all prior information from the walk. In Fig. 3, this is the bottom-right orange component.

In the case of the segmentation task, each vertex has its own predicted segment class. Each of the orange components in Fig. 3 classifies the segment that the respective vertex belongs to. Since at the beginning of the walk the results are not trustworthy (as the mesh is not yet well understood), for the loss calculation during training we consider the segment class predictions only for the vertices that belong to the second half of the walk.
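
A sketch of the segmentation loss, assuming per-step logits of shape (batch, walk_len, num_segments) and per-step ground-truth labels of shape (batch, walk_len); the tensor layout is an assumption.

import tensorflow as tf

def segmentation_loss(per_step_logits, per_step_labels):
    half = tf.shape(per_step_logits)[1] // 2
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    # only the second half of the walk contributes to the loss
    return loss_fn(per_step_labels[:, half:], per_step_logits[:, half:, :])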

5. Applications: Classification & Segmentation

MeshWalker is a general approach, which may be applied to a variety of applications. We demonstrate its performance for two fundamental tasks in shape analysis: mesh classification and mesh semantic segmentation. Our results are compared against the reported SOTA results for recently-used datasets, hence the methods we compare against vary according to the specific dataset.

5.1. Mesh classification

Given a mesh, the goal is to classify it into one of pre-defined classes. For the given mesh we generate multiple random walks. These walks are run through the trained network. For each walk, the network predicts the probability of this mesh belonging to each class. These prediction vectors are averaged into a single prediction vector. In practice we use several walks per mesh; Section 6 discusses the robustness of MeshWalker to the number of walks.

To test our algorithm, we applied our method to three recently-used datasets: SHREC11 (Lian et al., 2011), engraved cubes  (Hanocka et al., 2019) and ModelNet40 (Wu et al., 2015), which differ from each other in the number of classes, the number of objects per class, as well as the type of shapes they contain. As common, the accuracy is defined as the ratio of correctly predicted meshes.

SHREC11.

This dataset consists of 30 classes, with 20 examples per class. Typical classes are camels, cats, glasses, centaurs, hands, etc. Following the setup of (Ezuz et al., 2017), we split the objects in each class into either 16 or 10 training examples, with the remaining 4 or 10 examples used for testing.

Table 1 compares the performance, where each result is the average over several random splits. When the split is 10 objects for training and 10 for testing, the advantage of our method is apparent. When 16 objects are used for training and only 4 for testing, we achieve the same accuracy as the current state-of-the-art. In Section 6.1 we show that, indeed, the smaller the training dataset, the more advantageous our approach is.

Method Input Split-16 Split-10
MeshWalker (ours) Mesh 98.6% 97.1%
MeshCNN (Hanocka et al., 2019) Mesh 98.6% 91.0%
GWCNN (Ezuz et al., 2017) Mesh 96.6% 90.3%
SG (Bronstein et al., 2011) Mesh 70.8% 62.6%
Table 1. Classification on SHREC11 (Lian et al., 2011). Split-16 and Split-10 denote the number of training models per class (out of 20 models in the class). In both cases our method achieves state-of-the-art results, yet it is most advantageous for the smaller training dataset (Split-10). (We have not found point cloud-based networks that were tested on SHREC11.)

Cube engraving.

This dataset contains cube objects, split into training and testing sets. Each object is a cube "engraved" with a shape on a random face, at a random location, as demonstrated in Fig. 4. The engraved shapes are drawn from a multi-class dataset of 2D shapes (e.g., car, heart, apple). This dataset was created in order to demonstrate that using meshes, rather than point clouds, may be critical for 3D shape analysis.

Table 2 provides the results. It demonstrates the benefit of our method over state-of-the-art methods.

Figure 4. Engraved cubes dataset. This image is courtesy of (Hanocka et al., 2019).
Method Input accuracy
MeshWalker (ours) Mesh 98.6%
MeshCNN (Hanocka et al., 2019) Mesh 92.16%
PointNet++ (Qi et al., 2017b) Point cloud 64.26%
Table 2. Classification on Cube Engraving (Hanocka et al., 2019). Our results outperform those of state-of-the-art algorithms.

ModelNet40.

This commonly-used dataset contains 12,311 CAD models from 40 categories, out of which 9,843 models are used for training and 2,468 models are used for testing. Unlike the previous datasets, many of the objects contain multiple components and are not necessarily watertight, making this dataset prohibitive for some mesh-based methods. However, such models can be handled by MeshWalker since, as explained before, if the walk reaches a dead-end during backtracking, it jumps to a new random location.

Table 3 shows that our results outperform those of mesh-based state-of-the-art methods. We note that without the cross-labeled classes (desk/table & plant/flower-pot/vase), our method's accuracy increases further. The table shows that multi-view approaches are excellent for this dataset. This is due to relying on networks that are pre-trained on a large number of images. However, they might fail for other datasets, such as the engraved cubes, and they do not suit other shape analysis tasks, such as semantic segmentation.

Method Input Accuracy
MeshWalker (ours) mesh 92.3%
MeshNet (Feng et al., 2019) mesh 91.9%
SNGC (Haim et al., 2019) mesh 91.6%
KPConv (Thomas et al., 2019) point cloud 92.9%
PointNet (Qi et al., 2017a) point cloud 89.2%
RS-CNN (Liu et al., 2019) point cloud 93.6%
RotationNet (Kanezaki et al., 2018) multi-views 97.3%
GVCNN (Feng et al., 2018) multi-views 93.1%
3D2SeqViews (Han et al., 2019) multi-views 93.4%
Table 3. Classification on ModelNet40 (Wu et al., 2015). MeshWalker is competitive with other mesh-based methods. Multi-view methods are advantageous for this dataset, possibly due to relying on pre-trained networks for image classification and to naturally handling multiple components and non–watertight models, which characterize many meshes in this dataset.
(a) Ours (b) (Hanocka et al., 2019) (c) Ours (d) (Hanocka et al., 2019)
Figure 5. Qualitative results for human shape segmentation from (Maron et al., 2017). Our system avoids mis-classifications, not mixing lower legs with lower arms or hands with feet. We note that for most shapes in the dataset, both systems produce equally-good results.

5.2. Mesh semantic segmentation

Shape segmentation is an important building block for many applications in shape analysis and synthesis. The goal is to determine, for every vertex, the segment it belongs to. We tested MeshWalker on two datasets: COSEG (Wang et al., 2012) and human-body Segmentation (Maron et al., 2017).

Given a mesh, multiple random walks are generated (the number is related to the number of segment classes; see the discussion in Section 6). These walks are run through the trained network, which predicts the probabilities of belonging to the segments. Similarly to the training process, only vertices of the second half of each walk are considered trustworthy. For each vertex, the predictions of the walks that visit it are averaged. Then, as post-processing, we also consider the average prediction of the vertex's neighbors and add this average with some weight. Finally, the prediction for each vertex is obtained by taking the argmax.

Formally, let \mathcal{W} be the set of walks performed on a mesh. Let P_w(v) be the vector that is the Softmax output for vertex v from walk w (if walk w does not visit v, P_w(v) is set to the 0-vector). Let N(v) be the list of the vertices adjacent to v and |N(v)| be the size of this list. The predicted label l_v of vertex v is defined as (where argmax finds the maximum vector entry, and \lambda is the neighbor weight):

(5)   l_v = \arg\max \Big( \sum_{w \in \mathcal{W}} P_w(v) + \frac{\lambda}{|N(v)|} \sum_{u \in N(v)} \sum_{w \in \mathcal{W}} P_w(u) \Big)
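
A NumPy sketch of Eq. (5) is given below; per_walk_probs stacks the (zero-padded) Softmax outputs of all walks per vertex, and the default value of the neighbor weight is an assumption, since the exact weight is not restated here.

import numpy as np

def predict_vertex_labels(per_walk_probs, adjacency, neighbor_weight=0.5):
    # per_walk_probs: (num_walks, num_vertices, num_segments); zero rows for unvisited vertices
    summed = per_walk_probs.sum(axis=0)              # sum of walk predictions per vertex
    labels = np.empty(summed.shape[0], dtype=int)
    for v in range(summed.shape[0]):
        if adjacency[v]:
            neighbor_avg = summed[adjacency[v]].mean(axis=0)   # average over 1-ring neighbors
        else:
            neighbor_avg = 0.0
        labels[v] = int(np.argmax(summed[v] + neighbor_weight * neighbor_avg))
    return labels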

We follow the accuracy measure proposed in (Hanocka et al., 2019): Given the prediction for each edge, the accuracy is defined as the percentage of the correctly-labeled edges, weighted by their length. Since MeshWalker predicts the segment of the vertices, if the predictions of the endpoints of the edge agree, the edge gets the endpoints’ label; otherwise, the label with the higher prediction is chosen. The overall accuracy is the average over all meshes.
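
A sketch of this edge-accuracy measure (the array layouts are assumptions):

import numpy as np

def edge_accuracy(vertex_probs, vertices, edges, edge_gt):
    # vertex_probs: (num_vertices, num_segments) averaged predictions per vertex
    # edges: (num_edges, 2) endpoint indices; edge_gt: (num_edges,) ground-truth labels
    correct, total = 0.0, 0.0
    for (a, b), gt in zip(edges, edge_gt):
        la, lb = int(np.argmax(vertex_probs[a])), int(np.argmax(vertex_probs[b]))
        if la == lb:
            label = la
        else:   # endpoints disagree: take the label with the higher prediction
            label = la if vertex_probs[a][la] >= vertex_probs[b][lb] else lb
        length = np.linalg.norm(vertices[a] - vertices[b])     # weight by edge length
        total += length
        correct += length * (label == gt)
    return correct / total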

Human-body segmentation.

The dataset consists of 381 training models from SCAPE (Anguelov et al., 2005), FAUST (Bogo et al., 2014), MIT (Vlasic et al., 2008) and Adobe Fuse (Adobe, 2016). The test set consists of 18 humans from SHREC'07 (Giorgi et al., 2007). The meshes are manually segmented into eight labeled segments according to (Kalogerakis et al., 2010).

Method Edge Accuracy
MeshWalker 94.8%
MeshCNN 92.3%
Table 4. Human-body segmentation results on (Maron et al., 2017). The accuracy is calculated on edges of the simplified meshes.
Method Input Face Accuracy
MeshWalker (ours) Mesh 92.7%
MeshCNN (Hanocka et al., 2019) Mesh 89.0%
LRF-Conv (Yang et al., 2020) Mesh 89.9%
SNGC (Haim et al., 2019) Mesh 91.3%
Toric Cover (Maron et al., 2017) Mesh 88.0%
GCNN (Masci et al., 2015) Mesh 86.4%
MDGCNN (Poulenard and Ovsjanikov, 2018) Mesh 89.5%
PointNet++ (Qi et al., 2017b) Point cloud 90.8%
DynGraphCNN (Wang et al., 2019d) Point cloud 89.7%
Table 5. Human-body segmentation results on (Maron et al., 2017). The reported results are on the original meshes; for MeshCNN, the result shown was obtained by our own run of their system. Our results outperform those of state-of-the-art algorithms.

There are two common measures of segmentation results, according to the correct classification of faces (Haim et al., 2019) or of edges (Hanocka et al., 2019). Tables 4 and 5 compare our results to those of previous works, according to the reported measure and the type of objects (simplified or not). Since our method is trained on simplified meshes, to get results on the original meshes, we apply a simple projection to the original meshes jointly with boundary smoothing, as in (Katz and Tal, 2003). In both measures, MeshWalker outperforms other methods. Fig. 5 presents qualitative examples where the difference between the resulting segmentations is evident.

(a) Vases (b) Aliens (c) Chairs
Figure 6. Qualitative results of segmentation for meshes from COSEG (Wang et al., 2012).

COSEG segmentation.

This dataset contains three large classes: tele-aliens, vases and chairs, with 200, 300 and 400 shapes, respectively. Each category is split into training and testing sets. Fig. 6 presents some qualitative results, where it can be seen that our method performs very well. Table 6 shows the accuracy of our results, where the results of the competitors are those reported in (Hanocka et al., 2019). Our method achieves state-of-the-art results for all categories.

Method Vases Chairs Tele-aliens Mean
MeshWalker (ours) 98.7% 99.6% 99.1% 99.1%
MeshCNN 97.3% 99.6% 97.6% 98.2%
PointNet++ 94.7% 98.9% 79.1% 90.9%
PointCNN (Li et al., 2018) 96.4% 99.3% 97.4% 97.7%
Table 6. Segmentation results on COSEG (Wang et al., 2012). Our method achieves state-of-the-art results for all categories.

6. Experiments

6.1. Ablation study

Size of the training dataset.

How many training models are needed in order to achieve good performance? In the 3D case this question is especially important, since creating a dataset is costly. Table 7 shows the accuracy of our model for the COSEG dataset, when trained on different dataset sizes. As expected, the larger the dataset, the better the results. However, even when using only a handful of shapes for training, the results remain good. This outstanding behavior can be explained by the fact that we can produce many random walks for each mesh, hence the actual number of training examples is large. This result is consistent across all categories and datasets. Table 8 shows a similar result for the human-body segmentation dataset.

# training shapes Vases Chairs Tele-aliens Mean
Full
32
16
8
4
2
1
Table 7. Analysis of the training dataset size (COSEG segmentation). "Full" training uses all available training shapes of the tele-aliens, vases and chairs categories, respectively. As expected, the larger the dataset, the better the results. However, even if the training dataset is very small, our results are good.
# training shapes MeshWalker MeshCNN
(ours) (Hanocka et al., 2019)
381 (full) 94.8%
16 92.0%
4 84.3%
2 80.8%
Table 8. Analysis of the training dataset size (human-body segmentation). As before, the performance of our method degrades gracefully with the size of the training set. We note that the MeshCNN results are not taken from their paper, but were obtained by new runs of their system.

Walk length.

Fig. 1 has shown that the accuracy of our method depends on the walk length. What would be an ideal length for our system to "understand" a shape? Fig. 7 analyzes the influence of the length on the task of classification for SHREC11. As expected, the accuracy increases with length. However, it can be seen that when several walks per mesh are used, relatively short walks suffice to get excellent results. Furthermore, there is a trade-off between the number of walks we use and the length of these walks. Though the exact length depends both on the task at hand and on the dataset, this behavior is consistent across datasets and tasks.

Figure 7. Walk length analysis. The accuracy increases with walk length, for classification on SHREC11. Here, the x-axis is the number of vertices along the walk, normalized by the number of mesh vertices (V). This figure illustrates the trade-off between the number of walks we use and the length of these walks. At the beginning of a walk, using many walks is not beneficial, since the RNN has not accumulated enough information yet. However, after e.g. 0.3V, two walks are better than a single 0.6V-length walk. This is because they explore different mesh regions.

Number of walks.

How many walks are needed at inference time? Table 9 shows that the more walks, the better the accuracy. However, even very few walks result in very good accuracy. In particular, on SHREC11, even a single walk yields high accuracy. For the Engraved-Cubes dataset, more walks are needed, since the model is engraved on a single cube facet, which certain walks might not reach. Even in this difficult case, a modest number of walks already achieves high accuracy. We also note that, as expected, the more walks used, the more stable the results are and the smaller the standard deviation is.

# Walks SHREC11 Acc Eng.Cubes Acc
%
%
%
%
%
%
Table 9. Number of walks analysis. The accuracy improves with the number of walks per shape (demonstrated on two datasets).

Robustness.

We use various rotations for data augmentation, hence the robustness to orientation. In particular, to test the robustness to rotation, we rotated the models in the Human-body segmentation dataset and in the SHREC11 classification dataset several times about each axis, in fixed increments. For each of these rotated versions of the datasets we applied the same testing as before. For both datasets, there was no difference in the results. Furthermore, the meshes are normalized, hence the robustness to scaling.

Our approach is inherently robust to different triangulations, as random walks (representing the same mesh) may vary greatly anyhow. Specifically, we generated a modified version of the COSEG segmentation dataset by randomly perturbing a portion of the vertex positions, where each perturbation is realized as a shift towards a random vertex in the 1-ring. The resulting performance degradation is marginal.

(a) input (b) FC1 (c) FC2 (d) GRU1 (e) GRU2 (f) GRU3
Figure 8. t-SNE of the internal layers. This is a visualization of the output of the different layers for the human-body segmentation task. It can be seen how the semantic meaning of the layers’ output starts to evolve after the first GRU layer and gets better in the next two layers.

6.2. Implementation

Mesh pre-processing: simplification & data augmentation.

All the meshes used for training are first simplified into roughly the same number of faces (Garland and Heckbert, 1997; Hoppe, 1997) (the MeshPreprocessing procedure in Algorithm 1). Simplification is analogous to the initial resizing of images. It reduces the network capacity required for training. Moreover, we can use several simplifications of each mesh as a form of data augmentation for training and for testing; for instance, for ModelNet40 we use a few different target face counts. The meshes are normalized into a unit sphere, if necessary.

In addition, we augment the training data and add diversity by rotating the models. As part of batch preparation, each model is randomly rotated about each axis prior to each training iteration.
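
A sketch of the normalization and rotation augmentation is given below; the simplification itself relies on standard quadric-decimation tools and is omitted here.

import numpy as np

def normalize_to_unit_sphere(vertices):
    centered = vertices - vertices.mean(axis=0)               # move centroid to the origin
    return centered / np.linalg.norm(centered, axis=1).max()  # scale into the unit sphere

def random_rotation(vertices):
    rotated = vertices
    for axis in range(3):                                     # random angle about each axis
        a = np.random.uniform(0.0, 2.0 * np.pi)
        c, s = np.cos(a), np.sin(a)
        rot = np.eye(3)
        i, j = (axis + 1) % 3, (axis + 2) % 3
        rot[i, i], rot[i, j], rot[j, i], rot[j, j] = c, -s, s, c
        rotated = rotated @ rot.T
    return rotated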

t-SNE analysis.

Does the network produce meaningful features? Fig. 8 opens the network's "black box" and shows the t-SNE projection to 2D of the multi-dimensional features after each stage of our learning framework, applied to the human-body segmentation task. Each feature vector is colored by its correct label.

In the input layer all the classes are mixed together. The same behavior is noticed after the first two fully-connected layers, since no information is shared between the vertices up to this stage. In the next three GRU layers, semantic meaning evolves: The features are structured as we get deeper in the network. In the last RNN layer the features are meaningful, as the clusters are evident. This visualization demonstrates the importance of the RNN hierarchy.

Fig. 9 reveals another invaluable property of our walks. It shows the t-SNE visualization of walks used for classification of objects from several categories of SHREC11. Each feature vector is colored by its correct label; its shape (rectangle, triangle, etc.) represents the object the walk belongs to. Not only do clusters of shapes from the same category clearly emerge, but walks that belong to the same object are also grouped together. This is another indication of the quality of our proposed features.

Computation time.

Training takes from a few hours (for classification on SHREC11) to longer runs (for segmentation on human bodies), using a GTX 1080 TI graphics card. At inference, a single walk of the typical SHREC11 length takes a few milliseconds, and the running time grows proportionally with the number of walks per shape. Remeshing (simplification) takes a few seconds per mesh. We note that our method is easy to parallelize, as every walk can be processed on a different processor, which is yet another benefit of our approach.

Figure 9. t-SNE analysis for classification. This figure shows feature hierarchy: Meshes that belong to the same category (indicated by the color) are clustered together. Furthermore, walks that belong to the same mesh (indicated by the shape of the 2D point) are also clustered.

Training configurations.

We implemented our network using TensorFlow V2. The network architecture is given in Table 10. The source code is available at https://github.com/AlonLahav/MeshWalker.

Layer Output Dimension
Vertex description
Fully Connected
Instance Normalization
ReLU
Fully Connected
Instance Normalization
ReLU
GRU
GRU
GRU
Fully Connected # of classes
Table 10. Training configuration

Optimization: To update the network weights, we use the Adam optimizer (Kingma and Ba, 2014). The learning rate is set in a cyclic way, as suggested by (Smith, 2017), oscillating between a small initial learning rate and a maximum learning rate over a fixed cycle of iterations.
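
A sketch of such a triangular cyclic schedule; the actual rate bounds and cycle size used in the paper are not reproduced here, so all arguments are assumptions.

def cyclic_learning_rate(iteration, min_lr, max_lr, cycle_size):
    position = (iteration % cycle_size) / cycle_size    # position within the current cycle
    triangle = 1.0 - abs(2.0 * position - 1.0)          # ramps 0 -> 1 -> 0 over a cycle
    return min_lr + (max_lr - min_lr) * triangle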

Batch strategy: Walks are grouped into fixed-size batches. For mesh classification, the walks in a batch are generated from different meshes, whereas for semantic segmentation each batch contains multiple walks per mesh.

Training iterations: The number of training iterations differs between the SHREC11, COSEG, human-body segmentation, engraved-cubes and ModelNet40 datasets. The reason is that, for the loss to converge quickly, many of the walks should cover the salient parts of the shape, which distinguish it from other classes/segments. When this is not the case, more iterations are needed in order for the few meaningful walks to influence the loss. This happens, for instance, in the engraved-cubes dataset, where the salient information lies on a single facet.

6.3. Limitations

Fig. 10 shows a failure case of our algorithm, where parts of the hair were wrongly classified as torso. This happens because the training data does not contain enough models with hair to learn from. In general, learning-based algorithms rely on good training data, which is not always available.

(a) Ground truth (b) Ours (c) (Hanocka et al., 2019)
Figure 10. Limitation. Our algorithm fails to classify the hair due to not having sufficient similar shapes in the dataset.

Another limitation is handling large meshes. The latter require long walks, which in turn might lead to run-time and memory issues. In this paper, this is solved by simplifying the meshes and then projecting the segmentation results onto the original meshes. (For classification, this is not a concern, as simplified meshes may be used).

7. Conclusion

This paper has introduced a novel approach for representing meshes within deep learning schemes. The key idea is to represent the mesh by random walks on its surface, which intuitively explore the shape of the mesh. Since walks are described by the order of visiting mesh vertices, they suit deep learning.

Utilizing this representation, the paper has proposed an end-to-end learning framework, termed MeshWalker. The random walks are fed into a Recurrent Neural Network (RNN), which "remembers" the walk's history (i.e., the geometry of the mesh). Prior works indicated that RNNs are unsuitable for point clouds due to both the unordered nature of the data and the number of points used to represent a shape. Surprisingly, we have shown that RNNs work extremely well for meshes, through the concept of random walks.

Our approach is general, yet simple. It has several additional benefits. Most notably, it works well even for extremely small datasets; even a handful of meshes per class suffices to get good results. In addition, the meshes are not required to be watertight or to consist of a single component (as demonstrated on ModelNet40 (Wu et al., 2015)); some other mesh-based approaches impose these conditions and require the meshes to be manifolds.

Last but not least, the power of this approach has been demonstrated for two key tasks in shape analysis: mesh classification and mesh semantic segmentation. In both cases, we present state-of-the-art results.

An interesting question for future work is whether there are optimal walks for meshes, rather than random walks. For instance, are there good starting points of walks? Additionally, reinforcement learning could be utilized to learn good walks. Exploring other applications, such as shape correspondence, is another intriguing future direction. Another interesting practical future work would be to work on the mesh as is, without simplification as pre-processing.

Acknowledgements.
We gratefully acknowledge the support of the Israel Science Foundation (ISF) 1083/18 and PMRI – Peter Munk Research Institute – Technion.


References

  1. Adobe fuse 3d characters. Note: \urlhttps://www.mixamo.com Cited by: §5.2.
  2. Spectral partitioning: the more eigenvectors, the better. In Proceedings of the 32nd annual ACM/IEEE Design Automation Conference, pp. 195–200. Cited by: §2.3.
  3. SCAPE: shape completion and animation of people. In ACM SIGGRAPH 2005 Papers, pp. 408–416. Cited by: §5.2.
  4. Mesh segmentation - a comparative study. In IEEE International Conference on Shape Modeling and Applications 2006 (SMI’06), Vol. , pp. 7–7. Cited by: §2.3.
  5. Hierarchical mesh segmentation based on fitting primitives. The Visual Computer 22 (3), pp. 181–193. Cited by: §2.3.
  6. Point convolutional neural networks by extension operators. arXiv preprint arXiv:1803.10091. Cited by: §1, §2.1.
  7. Gift: a real-time and scalable 3d shape search engine. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5023–5032. Cited by: §2.1.
  8. 3dmfv: three-dimensional point cloud classification in real-time using convolutional neural networks. IEEE Robotics and Automation Letters 3 (4), pp. 3145–3152. Cited by: §1.
  9. FAUST: dataset and evaluation for 3d mesh registration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3794–3801. Cited by: §5.2.
  10. Learning shape correspondence with anisotropic convolutional neural networks. In Advances in neural information processing systems, pp. 3189–3197. Cited by: §2.1.
  11. Unstructured point cloud semantic labeling using deep segmentation networks.. 3DOR 2, pp. 7. Cited by: §1.
  12. Generative and discriminative voxel modeling with convolutional neural networks. arXiv preprint arXiv:1608.04236. Cited by: §2.1.
  13. Shape google: geometric words and expressions for invariant shape retrieval. ACM Transactions on Graphics (TOG) 30 (1), pp. 1–20. Cited by: §2.2, Table 1.
  14. Efficient computation of isometry-invariant distances between surfaces. SIAM Journal on Scientific Computing 28 (5), pp. 1812–1836. Cited by: §2.2.
  15. Strategies for polyhedral surface decomposition: an experimental study. Computational Geometry 7 (5-6), pp. 327–342. Cited by: §2.3.
  16. Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078. Cited by: §4.2, §4.2.
  17. On bending invariant signatures for surfaces. IEEE Transactions on pattern analysis and machine intelligence 25 (10), pp. 1285–1295. Cited by: §2.2.
  18. GWCNN: a metric alignment layer for deep shape analysis. In Computer Graphics Forum, Vol. 36, pp. 49–57. Cited by: §2.1, §2.2, §5.1, Table 1.
  19. Real time head pose estimation from consumer depth cameras. In Joint pattern recognition symposium, pp. 101–110. Cited by: §2.1.
  20. GVCNN: group-view convolutional neural networks for 3d shape recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §1.
  21. GVCNN: group-view convolutional neural networks for 3d shape recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 264–272. Cited by: §2.1, Table 3.
  22. MeshNet: mesh neural network for 3d shape representation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 8279–8286. Cited by: §1, §2.1, §2.2, Table 3.
  23. Surface simplification using quadric error metrics. In Proceedings of the 24th annual conference on Computer graphics and interactive techniques, pp. 209–216. Cited by: §6.2.
  24. Shape segmentation using local slippage analysis. In Proceedings of the 2004 Eurographics/ACM SIGGRAPH symposium on Geometry processing, pp. 214–223. Cited by: §2.3.
  25. A review on deep learning approaches for 3d data representations in retrieval and classifications. IEEE Access 8, pp. 57566–57593. Cited by: §2.1.
  26. Shape retrieval contest 2007: watertight models track. SHREC competition 8 (7). Cited by: §5.2.
  27. Lonchanet: a sliced-based cnn architecture for real-time 3d object recognition. In 2017 International Joint Conference on Neural Networks (IJCNN), pp. 412–418. Cited by: §2.1.
  28. Spiralnet++: a fast and highly efficient mesh convolution operator. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 0–0. Cited by: §2.1.
  29. On graph partitioning, spectral analysis, and digital mesh processing. In 2003 Shape Modeling International., pp. 165–171. Cited by: §2.3.
  30. Random walks for image segmentation. IEEE transactions on pattern analysis and machine intelligence 28 (11), pp. 1768–1783. Cited by: §3.
  31. A novel connectionist system for unconstrained handwriting recognition. IEEE transactions on pattern analysis and machine intelligence 31 (5), pp. 855–868. Cited by: §4.2.
  32. PCPNet learning local shape properties from raw point clouds. In Computer Graphics Forum, Vol. 37, pp. 75–85. Cited by: §2.1.
  33. 3d mesh labeling via deep convolutional neural networks. ACM Transactions on Graphics (TOG) 35 (1), pp. 1–12. Cited by: §2.3.
  34. Surface networks via general covers. In Proceedings of the IEEE International Conference on Computer Vision, pp. 632–641. Cited by: §2.1, §2.3, §5.2, Table 3, Table 5.
  35. 3d2seqviews: aggregating sequential views for 3d global feature learning by cnn with hierarchical attention aggregation. IEEE Transactions on Image Processing 28 (8), pp. 3986–3999. Cited by: §2.1, Table 3.
  36. Alignet: partial-shape agnostic alignment via unsupervised learning. ACM Transactions on Graphics (TOG) 38 (1), pp. 1–14. Cited by: §2.2, §2.3.
  37. MeshCNN: a network with an edge. ACM Transactions on Graphics (TOG) 38 (4), pp. 1–12. Cited by: §1, §2.1, Figure 4, Figure 5, §5.1, §5.2, §5.2, §5.2, Table 1, Table 2, Table 5, Figure 10, Table 8.
  38. Triplet-center loss for multi-view 3d object retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1945–1954. Cited by: §2.1.
  39. Deep convolutional networks on graph-structured data. arXiv preprint arXiv:1506.05163. Cited by: §2.1.
  40. Topology matching for fully automatic similarity estimation of 3d shapes. In Proceedings of the 28th annual conference on Computer graphics and interactive techniques, pp. 203–212. Cited by: §2.2.
  41. Long short-term memory. Neural computation 9 (8), pp. 1735–1780. Cited by: §4.2, §4.2.
  42. View-dependent refinement of progressive meshes. In Proceedings of the 24th annual conference on Computer graphics and interactive techniques, pp. 189–198. Cited by: §6.2.
  43. Pointwise convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 984–993. Cited by: §1.
  44. A spectral approach to shape-based retrieval of articulated 3d models. Computer-Aided Design 39 (5), pp. 398–407. Cited by: §2.2.
  45. Pairwise decomposition of image sequences for active multi-view recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3813–3822. Cited by: §2.1.
  46. Using spin images for efficient object recognition in cluttered 3d scenes. IEEE Transactions on pattern analysis and machine intelligence 21 (5), pp. 433–449. Cited by: §2.2.