GAPNet: Graph Attention based Point Neural Network for Exploiting Local Feature of Point Cloud
Abstract
Exploiting fine-grained semantic features on point clouds is still challenging due to their irregular and sparse structure in a non-Euclidean space. Among existing studies, PointNet provides an efficient and promising approach to learn shape features directly on unordered 3D point clouds and has achieved competitive performance. However, local features, which are helpful for better contextual learning, are not considered. Meanwhile, attention mechanisms have proven efficient at capturing node representations on graph-based data by attending over neighboring nodes. In this paper, we propose a novel neural network for point clouds, dubbed GAPNet, that learns local geometric representations by embedding a graph attention mechanism within stacked Multi-Layer-Perceptron (MLP) layers. Firstly, we introduce a GAPLayer to learn attention features for each point by assigning different attention weights to its neighborhood. Secondly, in order to exploit sufficient features, a multi-head mechanism is employed to allow the GAPLayer to aggregate different features from independent heads. Thirdly, we propose an attention pooling layer over neighbors to capture a local signature aimed at enhancing network robustness. Finally, GAPNet applies stacked MLP layers to the attention features and the local signature to fully extract local geometric structures. The proposed GAPNet architecture is tested on the ModelNet40 and ShapeNet part datasets, and achieves state-of-the-art performance in both shape classification and part segmentation tasks.
Can Chen, School of SATM, Cranfield University, UK, MK43 0AL, can.chen@cranfield.ac.uk
Luca Zanotti Fragonara, School of SATM, Cranfield University, UK, MK43 0AL, l.zanottifragonara@cranfield.ac.uk
Antonios Tsourdos, School of SATM, Cranfield University, UK, MK43 0AL, a.tsourdos@cranfield.ac.uk
Preprint. Under review.
1 Introduction
As point cloud data becomes increasingly popular in a wide range of applications, such as autonomous vehicles zhou2018voxelnet (); qi2018frustum (); ku2018joint (); liu2018real (), robotic mapping and navigation biswas2012depth (); zhu2017target (), and 3D shape representation and modelling golovinskiy2009shape (), many researchers are turning their attention to shape analysis and understanding, especially as convolutional neural networks (CNNs) achieve significant success in computer vision tasks. However, CNNs rely heavily on data with a standard grid structure, which leads to inefficient performance on irregular and unordered geometric data such as point clouds. As a result, fully exploiting contextual information from point clouds remains a challenging problem.
In order to leverage the advantages of CNNs, some approaches maturana2015voxnet (); wang2015voting (); riegler2017octnet () map the unstructured point cloud to a standard 3D grid before applying CNN architectures. However, these volumetric representations are not efficient in terms of memory and computation due to the typical sparsity of the point cloud structure. Instead of applying CNNs over a gridded point cloud, PointNet qi2017pointnet () pioneers the approach of applying deep learning directly to the irregular point cloud. In particular, PointNet makes the input point cloud invariant to permutations and exploits point-wise features by independently applying a Multi-Layer-Perceptron (MLP) network and a symmetric function to each point. However, it only captures a global feature without local information. PointNet++ qi2017pointnet++ () extends the PointNet model by constructing a hierarchical neural network that recursively applies PointNet with designed sampling and grouping layers to extract local features. DGCNN wang2018dynamic () operates an edge convolution on points and corresponding edges to further exploit local information. Adapted from point cloud registration methods, KCNet shen2018mining () builds a kernel correlation layer to measure geometric affinities between points.
Attention mechanisms have proved to be efficient in many areas, such as machine translation vaswani2017attention (); bahdanau2014neural (), vision-based tasks mnih2014recurrent (), and graph-based tasks velivckovic2017graph (). Inspired by graph attention networks velivckovic2017graph (), we primarily focus on fully exploiting fine-grained local features of point clouds in an attention manner for 3D shape classification and part segmentation tasks. The key contributions of our work are summarized as follows:
- We propose a multi-head GAPLayer to capture contextual attention features by indicating the different importance of the neighbors of each point. Independent heads attend to different features from the representation space in parallel and are further aggregated together to obtain sufficient power of feature extraction.
- We propose self-attention and neighboring-attention mechanisms that allow the GAPLayer to compute the attention coefficients by considering both the self-geometric information of each point and its local correlations to the corresponding neighbors.
- An attention pooling layer over neighbors is proposed to identify the most important features and obtain a local signature representation that enhances network robustness.
- Our GAPNet integrates the GAPLayer and the attention pooling layer into stacked Multi-Layer-Perceptron (MLP) layers or existing pipelines (e.g. PointNet) to better extract local contextual features from unordered point clouds.
2 Related work
Learning features from volumetric grid.
Voxelization is an intuitive way to convert a sparse and irregular point cloud into a standard grid structure, after which standard CNNs can be applied for feature extraction. VoxNet maturana2015voxnet () voxelizes the point cloud into a volumetric grid that indicates the spatial occupancy of each voxel, followed by a 3D CNN over the occupied voxels to predict object categories. However, a 3D dense and sparsely-occupied volumetric grid incurs large memory and computational costs at high spatial resolution. As a result, several improvements have been proposed to address the sparsity problem. Kd-Net klokov2017escape () uses a kd-tree bentley1975multidimensional () to build an efficient 3D space-partitioning structure and a deep architecture to learn representations of the point cloud. Similarly, OctNet riegler2017octnet () applies 3D convolutions on a hybrid grid-octree structure generated from a set of shallow octrees to achieve high resolution.
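As a minimal illustration of this voxelization step (a sketch of the general idea, not of VoxNet itself), the following NumPy snippet maps a point cloud to a binary occupancy grid; the 32^3 resolution and unit-cube normalization are illustrative choices:

```python
import numpy as np

def voxelize(points, resolution=32):
    """Map an (N, 3) point cloud to a binary occupancy grid,
    as used by volumetric methods such as VoxNet."""
    # normalize points into the unit cube [0, 1)
    mins = points.min(axis=0)
    extent = points.max(axis=0) - mins
    normed = (points - mins) / np.maximum(extent, 1e-9)
    # map coordinates to voxel indices and mark occupied cells
    idx = np.minimum((normed * resolution).astype(int), resolution - 1)
    grid = np.zeros((resolution,) * 3, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid

pts = np.random.rand(1024, 3)
grid = voxelize(pts, resolution=32)
```

The resulting grid is dense even though only a small fraction of voxels is occupied, which is exactly the memory inefficiency discussed above.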
Learning features from unstructured point cloud directly.
PointNet qi2017pointnet () is the pioneering work that proposed the direct application of deep learning to the raw point cloud. In more detail, a Multi-Layer-Perceptron (MLP) network and a symmetric function (e.g. max pooling) are applied to every individual point to extract a global feature. This approach provides an efficient way to understand unstructured point clouds; however, local features are not captured, as the architecture works only on independent points without measuring relationships between points in local regions. To address this problem, PointNet++ qi2017pointnet++ () constructs a hierarchical neural network that recursively applies PointNet with a sampling layer and a grouping layer to exploit local representations. DGCNN wang2018dynamic () extends PointNet by presenting an edge convolution operation (EdgeConv) applied to edge features, which aggregate each point and the corresponding edges connecting it to its neighbors. In order to leverage the advantages of standard CNN operations, PointCNN li2018pointcnn () attempts to learn a convolution operator that transforms a given unordered point set into a latent canonical order, after which a typical CNN architecture is used to extract local features.
Learning features from multiview models.
In order to apply standard CNN operations while avoiding the large computational cost of volumetric-based methods, some researchers have turned to multi-view based approaches. For instance, qi2016volumetric (); wang2017dominant () learn features of the point cloud in an indirect way by applying a typical 2D CNN architecture to multiple 2D image views generated by multi-view projections of the 3D point cloud. However, these multi-view approaches are not capable of performing semantic segmentation of point clouds, as 2D images lack depth information, which makes it non-trivial to classify each point from images.
Learning features from geometric deep learning.
Geometric deep learning bronstein2017geometric () is a modern term for a set of emerging techniques that attempt to address non-Euclidean structured data (e.g. 3D point clouds, social networks, or genetic networks) with deep neural networks. Graph CNNs bruna2013spectral (); defferrard2016convolutional (); zhang2018graph () show the advantages of graph representations in many tasks on non-Euclidean data, as they can naturally deal with these irregular structures. PointGCN ZhangR_18_gcnn_point_cloud () builds a graph CNN architecture to capture local structure and classify point clouds, which also demonstrates that geometric deep learning has huge potential for unordered point cloud analysis.
3 GAPNet architecture
In this section, we propose our GAPNet model to better learn local representations of unstructured point clouds in shape classification and part segmentation tasks. The model consists of three components: the GAPLayer (multi-head graph attention based point network layer) shown in Figure 2, the attention pooling layer, and the overall GAPNet architecture shown in Figure 3.
Let X = {x_1, x_2, ..., x_N} ⊂ R^F be a raw set of unordered points and the input of our model, where N is the number of points and each x_i is a feature vector of dimension F that may contain 3D space coordinates (x, y, z), color, intensity, surface normal, etc. For the sake of simplicity, in this study we set F = 3 and only use the 3D coordinates as input features.
3.1 GAPLayer
Local structure representation.
Considering that the number of samples in a point cloud can be very large in real applications (e.g. autonomous vehicles), allowing every point to attend to all other points would lead to a high computational cost and a gradient vanishing problem, since the attention weight allocated to each of the many other points becomes very small. As a result, we construct a directed k-nearest-neighbor graph G = (V, E) to represent the local structure of the point cloud, where V = {1, ..., N} is the set of nodes corresponding to the points, E ⊆ V × V is the set of edges connecting neighboring pairs of points, and N_i is the neighborhood set of point x_i. We define the edge features as y_ij = x_i − x_ij, where j ∈ N_i and x_ij denotes the j-th neighboring point of x_i.
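The neighborhood construction and edge features just defined can be sketched in NumPy as follows (a brute-force distance computation for illustration only; k = 20 matches the classification setting reported later):

```python
import numpy as np

def knn_edge_features(points, k=20):
    """Build a directed k-nearest-neighbor graph and the edge
    features y_ij = x_i - x_ij used by the GAPLayer.
    points: (N, F) array; returns (neighbor indices, edge features)."""
    # pairwise squared distances, shape (N, N)
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                 # exclude self-loops
    neighbors = np.argsort(d2, axis=1)[:, :k]    # (N, k) neighbor indices
    # edge features: center point minus each of its k neighbors
    edges = points[:, None, :] - points[neighbors]   # (N, k, F)
    return neighbors, edges

pts = np.random.rand(100, 3)
nbrs, edges = knn_edge_features(pts, k=20)
```

For large clouds, the O(N^2) distance matrix would be replaced by a spatial index, but the resulting graph and edge features are the same.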
Single-head GAPLayer.
For the benefit of the reader, we start by introducing a single-head GAPLayer that takes the point cloud data as input; a multi-head mechanism then concatenates all heads together over the feature channels in our network. The structure of the single-head GAPLayer is shown in Figure 2(b).
In order to pay different attention to different neighbors, we propose a self-attention mechanism and a neighboring-attention mechanism to capture the attention coefficients of each point with respect to its neighborhood, as illustrated in Figure 1. In more detail, the self-attention mechanism learns self-coefficients by considering the self-geometric information of each individual point, while the neighboring-attention mechanism focuses on local-coefficients by considering the neighborhood.
As an initial step, we encode the nodes and edges of the point cloud into higher-level features with output dimension F′, as defined by Equations 1 and 2:

x'_i = h(x_i, θ)    (1)

y'_ij = h(y_ij, θ)    (2)

where h() is a parametric non-linear function, chosen to be a single-layer neural network in our experiments, and θ is the set of learnable parameters of the filter.
We obtain the attention coefficients by fusing the self-coefficients and local-coefficients as defined by Equation 3, where g() and g'() are single-layer neural networks with 1-dimensional output, and LeakyReLU denotes the leaky ReLU non-linear activation function:

c_ij = LeakyReLU( g(x'_i) + g'(y'_ij) )    (3)

In order to make the attention coefficients comparable across the neighbors of different points, we use the softmax function to normalize the coefficients over all neighbors of every point, as given by Equation 4:

α_ij = exp(c_ij) / Σ_{k ∈ N_i} exp(c_ik)    (4)
The goal of each single-head GAPLayer is to compute a contextual attention feature for every point. To this end, we use the normalized coefficients to compute a linear combination of the encoded edge features, as shown in Equation 5. As shown in Figure 2(b), the outputs of the single-head GAPLayer are the attention features x̂_i and the graph features y'_ij encoded from the graph edges:

x̂_i = f( Σ_{j ∈ N_i} α_ij y'_ij )    (5)

where f() is a non-linear activation function, chosen to be ReLU in our experiments.
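The whole single-head computation (Equations 1–5) can be sketched in NumPy as follows; `W_node`, `W_edge`, `u`, and `v` are illustrative random stand-ins for the learned single-layer networks h(), g(), and g'(), and the ReLU/LeakyReLU choices follow the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)

def single_head_gaplayer(points, neighbors, W_node, W_edge, u, v):
    """Sketch of Eqs. (1)-(5): encode nodes/edges, fuse self- and
    local-coefficients, softmax over neighbors, attention-weighted sum."""
    edges = points[:, None, :] - points[neighbors]        # y_ij, (N, k, F)
    x_enc = np.maximum(points @ W_node, 0.0)              # Eq. (1), (N, F')
    y_enc = np.maximum(edges @ W_edge, 0.0)               # Eq. (2), (N, k, F')
    # Eq. (3): c_ij = LeakyReLU(g(x'_i) + g'(y'_ij))
    c = leaky_relu((x_enc @ u)[:, None, :] + y_enc @ v)   # (N, k, 1)
    # Eq. (4): softmax over the k neighbors (shift for stability)
    c = c - c.max(axis=1, keepdims=True)
    alpha = np.exp(c) / np.exp(c).sum(axis=1, keepdims=True)
    # Eq. (5): attention feature = ReLU of the attention-weighted sum
    attn_feat = np.maximum((alpha * y_enc).sum(axis=1), 0.0)  # (N, F')
    return attn_feat, y_enc               # attention features, graph features

pts = rng.random((50, 3))
d2 = ((pts[:, None] - pts[None]) ** 2).sum(-1)
np.fill_diagonal(d2, np.inf)
nbrs = np.argsort(d2, axis=1)[:, :10]
feat, graph_feat = single_head_gaplayer(
    pts, nbrs, rng.random((3, 16)), rng.random((3, 16)),
    rng.random((16, 1)), rng.random((16, 1)))
```

In the actual network these weights are trained end-to-end; the sketch only fixes the data flow of one head.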
Multi-head mechanism.
In order to obtain sufficient structural information and stabilize the network, we concatenate M independent single-head GAPLayers to generate multi-attention features with M · F′ channels, as defined by Equation 6. As shown in Figure 2(a), the outputs of the multi-head GAPLayer (GAPLayer for short) are the multi-attention features and the multi-graph features, which concatenate the attention features and graph features respectively of the corresponding heads:

x̂''_i = ⊕_{m=1}^{M} x̂_i^(m)    (6)

where x̂_i^(m) is the attention feature of the m-th head, M is the total number of heads, and ⊕ is the concatenation operation over feature channels.
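The multi-head aggregation of Equation 6 amounts to stacking the per-head attention features along the channel axis; a minimal sketch with random stand-ins for the outputs of M independent heads:

```python
import numpy as np

rng = np.random.default_rng(1)
N, F_out, M = 100, 16, 4   # points, per-head channels, number of heads

# stand-ins for the attention features produced by M independent heads
head_features = [rng.random((N, F_out)) for _ in range(M)]

# Eq. (6): multi-attention feature = channel-wise concatenation of all heads
multi_attention = np.concatenate(head_features, axis=-1)   # (N, M * F_out)
```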
3.2 Attention pooling layer
To enhance network robustness and improve performance, we define an attention pooling layer on the neighboring channel of the multi-graph features. We use max pooling as the attention pooling operation, which identifies the most important features across heads to capture the local signature representation defined by Equation 7. The local signature is connected to the intermediate layers for capturing the global feature:

ȳ_i = max_{j ∈ N_i} ŷ_ij    (7)

where ŷ_ij denotes the multi-graph feature of edge (i, j) and the maximum is taken element-wise over the neighbors.
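A minimal sketch of this pooling step, assuming (as a layout assumption) that the multi-graph features are stored as an (N, k, M·F′) array:

```python
import numpy as np

rng = np.random.default_rng(2)
N, k, M, F_out = 100, 20, 4, 16

# stand-in multi-graph features: one (M * F')-dim feature per neighbor
multi_graph = rng.random((N, k, M * F_out))

# Eq. (7): max pooling over the neighboring channel keeps, per feature
# dimension, the strongest response among the k neighbors of each point
local_signature = multi_graph.max(axis=1)    # (N, M * F_out)
```

Because max pooling is a symmetric function, the local signature is invariant to the ordering of the neighbors, which contributes to the robustness claimed above.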
3.3 GAPNet architecture
Our GAPNet model, shown in Figure 3, addresses both shape classification and semantic part segmentation for point clouds. The architecture is similar to PointNet qi2017pointnet (); however, there are three main differences. Firstly, we use an attention-aware spatial transform network to make the point cloud invariant to certain transformations. Secondly, instead of only processing individual points, we exploit local features with a GAPLayer before the stacked MLP layers. Thirdly, an attention pooling layer is used to obtain a local signature, which is connected to the intermediate layers for capturing a global descriptor.
4 Experiments
In this section, we evaluate our GAPNet model on the classification and part segmentation tasks for 3D point cloud analysis. We then compare its performance with recent state-of-the-art methods and perform an ablation study to investigate different design variations.
4.1 Classification
Dataset.
We demonstrate the effectiveness of our classification model on the ModelNet40 benchmark wu20153d () for shape classification. The ModelNet40 dataset contains 12,311 meshed CAD models classified into 40 man-made categories, of which 9,843 models are used for training and 2,468 for testing. We normalize the models into the unit sphere and uniformly sample 1,024 points over the model surface. We further augment the training data by randomly rotating and scaling the point cloud and by jittering the location of every point with Gaussian noise of zero mean and 0.01 standard deviation.
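The augmentation described above might be sketched as follows; the rotation about the up axis, the scaling range, and the jitter clipping value are assumptions for illustration, while the Gaussian standard deviation of 0.01 is taken from the text:

```python
import numpy as np

rng = np.random.default_rng(3)

def augment(points, scale_low=0.8, scale_high=1.25, sigma=0.01, clip=0.05):
    """Training-time augmentation sketch: random rotation about the
    up axis, random uniform scaling, and per-point Gaussian jitter
    with zero mean and 0.01 standard deviation."""
    theta = rng.uniform(0.0, 2.0 * np.pi)
    rot = np.array([[np.cos(theta), 0.0, np.sin(theta)],
                    [0.0, 1.0, 0.0],
                    [-np.sin(theta), 0.0, np.cos(theta)]])
    points = points @ rot.T                       # random rotation
    points = points * rng.uniform(scale_low, scale_high)  # random scaling
    jitter = np.clip(sigma * rng.standard_normal(points.shape), -clip, clip)
    return points + jitter                        # per-point jitter

pts = rng.random((1024, 3))
aug = augment(pts)
```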
Network structure.
The classification model is presented in Figure 3 (top branch). In order to make the input points invariant to certain geometric transformations, such as scale and rotation, we first apply an attention-aware spatial transformer network to align the point cloud to a canonical space. The network employs a single-head GAPLayer with 16 channels to capture attention features, followed by three shared MLP layers with 64, 128, and 1024 neurons respectively; a max pooling operation and two fully-connected layers (512, 256) are then used to generate the transformation matrix.
A multi-head GAPLayer is then applied to generate multi-attention features with M · F′ channels, where M is the number of heads and F′ the number of encoding channels per head. The multi-attention features are aggregated with the coordinate features of the point cloud to obtain contextual attention features with M · F′ + 3 channels, which are then used to extract fine-grained features with four shared MLP layers (64, 64, 64, 128). A skip-connection scheme is employed to connect the local signature to these intermediate layers, followed by a shared fully-connected layer (1024) and a max pooling operation over feature channels to obtain a global feature for the entire point cloud. We finally apply three shared MLP layers (512, 256, 40) and a dropout operation with a keep probability of 0.5 to transform the global feature into scores for the 40 categories. The ReLU activation function with batch normalization is used in each layer, and the number of neighbors k is set to 20.
Training details.
During training we use the Adam optimizer kingma2014adam () with momentum 0.9 and a batch size of 32. The learning rate starts at 0.005 and is divided by 2 every 20 epochs, down to a minimum of 0.00001. The decay rate for batch normalization is initially set to 0.7 and gradually increased to 0.99. Our model is trained on an NVIDIA GTX 1080 Ti GPU with TensorFlow v1.6.
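The stepwise learning-rate schedule described above can be written directly:

```python
def learning_rate(epoch, base_lr=0.005, decay_every=20, floor=1e-5):
    """Schedule from the text: start at 0.005, halve every 20 epochs,
    never drop below 0.00001."""
    return max(base_lr * (0.5 ** (epoch // decay_every)), floor)
```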
Results.
Table 1 compares our results and complexity with several recent state-of-the-art works. Our model achieves the best performance on the ModelNet40 benchmark, outperforming the previous state-of-the-art model DGCNN by 0.2% accuracy.
To compare complexity, we measured model complexity and computational complexity using model size and forward time respectively. We also evaluated, and list in Table 1, the same metrics for all the available models in the same experimental environment. Although PointNet achieves the best computational complexity, our model outperforms it by 3.1% accuracy; our model therefore achieves the best trade-off between accuracy and complexity.
Table 1: Classification results and complexity on ModelNet40.

Method | Mean class accuracy (%) | Overall accuracy (%) | Model size (MB) | Forward time (ms)
VoxNet maturana2015voxnet () | 83.0 | 85.9 | – | –
PointNet qi2017pointnet () | 86.0 | 89.2 | 41.8 | 14.7
PointNet++ qi2017pointnet++ () | – | 90.7 | 19.9 | 32.0
Kd-Net klokov2017escape () | – | 91.8 | – | –
KCNet shen2018mining () | – | 91.0 | – | –
DGCNN wang2018dynamic () | 90.2 | 92.2 | 22.1 | 52.0
Ours | 89.7 | 92.4 | 22.9 | 27.9
Ablation study.
We also test our classification model with different settings on the ModelNet40 benchmark wu20153d () . In particular, we analyze the effectiveness of the GAPLayer, attention pooling layer, and also different numbers of multiple heads and encoding channels.
Table 8 shows the advantages of our GAPLayer and attention pooling layer. The attention pooling layer leads to a 0.6% accuracy improvement. Constant-GAPLayer denotes a model with the same structure as our GAPLayer but with all attention coefficients set to equal constants; the comparison demonstrates the effectiveness of the graph attention mechanism, with our GAPLayer providing a 0.7% accuracy improvement.
Regarding the impact of the number of heads M and of encoding channels F′, Table 8 indicates that appropriate values are beneficial to local feature extraction; however, the performance degenerates when these numbers grow further.
Table 2: Part segmentation results (mIoU, %) on the ShapeNet part dataset.

Method | avg | air. | bag | cap | car | cha. | ear. | gui. | kni. | lam. | lap. | mot. | mug | pis. | roc. | ska. | tab.
# shapes | | 2690 | 76 | 55 | 898 | 3758 | 69 | 787 | 392 | 1547 | 451 | 202 | 184 | 283 | 66 | 152 | 5271
PointNet | 83.7 | 83.4 | 78.7 | 82.5 | 74.9 | 89.6 | 73.0 | 91.5 | 85.9 | 80.8 | 95.3 | 65.2 | 93.0 | 81.2 | 57.9 | 72.8 | 80.6
PointNet++ | 85.1 | 82.4 | 79.0 | 87.7 | 77.3 | 90.8 | 71.8 | 91.0 | 85.9 | 83.7 | 95.3 | 71.6 | 94.1 | 81.3 | 58.7 | 76.4 | 82.6
Kd-Net | 82.3 | 82.3 | 74.6 | 74.3 | 70.3 | 88.6 | 73.5 | 90.2 | 87.2 | 81.0 | 94.9 | 57.4 | 86.7 | 78.1 | 51.8 | 69.9 | 80.3
DGCNN | 85.1 | 84.2 | 83.7 | 84.4 | 77.1 | 90.9 | 78.5 | 91.5 | 87.3 | 82.9 | 96.0 | 67.8 | 93.3 | 82.6 | 59.7 | 75.5 | 82.0
Ours | 84.7 | 84.2 | 84.1 | 88.8 | 78.1 | 90.7 | 70.1 | 91.0 | 87.3 | 83.1 | 96.2 | 65.9 | 95.0 | 81.7 | 60.7 | 74.9 | 80.8
4.2 Semantic part segmentation
Dataset.
We evaluate our segmentation model on the ShapeNet part dataset yi2016scalable () for the semantic part segmentation task, which is to classify the part category of each point of a mesh model. The dataset consists of 16,881 CAD shapes from 16 categories, and each point is annotated with one of 50 part classes; each shape model is labeled with several (but fewer than 6) parts. We follow the same sampling strategy as in Section 4.1 to sample 2,048 points uniformly, and split the dataset into training and testing sets in our experiment.
Model structure.
Our segmentation model, shown in Figure 3 (bottom branch), predicts a part category label for each point in the point cloud. We first use the same spatial transformer network and GAPLayer as in Section 4.1, followed by shared MLP layers (64, 64, 128). A second GAPLayer with 4 heads and 128 encoding channels is then applied, followed by shared MLP layers (128, 128, 512) to obtain representations with 512 channels, which are concatenated with the local signature generated by the attention pooling layer of the corresponding GAPLayer. The aggregated feature is passed through a shared fully-connected layer (1024) and a max pooling operation to obtain a global feature, which is then duplicated 2,048 times; finally, four shared fully-connected layers (256, 256, 128, 50) with dropout probability 0.6 transform the global feature into scores for the 50 part categories.
Training details.
The training settings are similar to those of the classification task, except that the batch size is set to 8, the number of neighbors is set to 30, and we distribute the task across two NVIDIA TESLA V100 GPUs.
Results.
We use the mean Intersection over Union (mIoU) qi2017pointnet () as our evaluation metric to align with prior work. The IoU of a shape is computed by averaging the IoUs of all parts that belong to that shape's category; the mIoU is then the mean of the IoUs over all shapes in the testing set.
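A sketch of the per-shape IoU computation under this scheme, assuming the common convention that a part absent from both the prediction and the ground truth contributes an IoU of 1:

```python
import numpy as np

def shape_iou(pred, gt, part_labels):
    """Per-shape IoU for ShapeNet part evaluation: average the IoU over
    all part labels belonging to the shape's category; a part absent
    from both prediction and ground truth counts as IoU 1."""
    ious = []
    for part in part_labels:
        inter = np.sum((pred == part) & (gt == part))
        union = np.sum((pred == part) | (gt == part))
        ious.append(1.0 if union == 0 else inter / union)
    return float(np.mean(ious))

# toy example with two parts {0, 1}
gt = np.array([0, 0, 1, 1])
pred = np.array([0, 1, 1, 1])
# part 0: inter 1, union 2 -> 0.5; part 1: inter 2, union 3 -> 2/3
iou = shape_iou(pred, gt, [0, 1])
```

The mIoU reported in Table 2 is then the mean of `shape_iou` over all test shapes.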
Table 2 shows that our model achieves competitive results on the ShapeNet part dataset yi2016scalable (). Our model wins 8 categories in part segmentation, compared with 6 winning categories for DGCNN wang2018dynamic (), although DGCNN outperforms ours by 0.4% in average mIoU. Figure 4(a) presents some shapes from our results; we also visualize the difference between the ground truth and our predictions in Figure 4(b), where the left shapes show the ground truth and the right shapes show our predictions.
5 Conclusions
In this paper, we propose a graph attention based point neural network, named GAPNet, to learn shape representations of point clouds. Experiments show state-of-the-art performance in shape classification and semantic part segmentation tasks. The success of our model also verifies that graph attention networks are effective not only at similarity computation between graph nodes, but also at understanding geometric relationships.
In the future, we can explore several research avenues. For example, some applications, such as autonomous vehicles, typically need to process very large-scale point cloud data. As a result, efficiently and robustly dealing with large-scale data would be worthwhile work. Furthermore, it would be interesting to develop an efficient CNN-like operation for unstructured data analysis.
Acknowledgments
The HumanDrive project is a CCAV / Innovate UK funded R&D project (Project ref: 103283).
References
 (1) Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
 (2) Jon Louis Bentley. Multidimensional binary search trees used for associative searching. Communications of the ACM, 18(9):509–517, 1975.
 (3) Joydeep Biswas and Manuela Veloso. Depth camera based indoor mobile robot localization and navigation. In Robotics and Automation (ICRA), 2012 IEEE International Conference on, pages 1697–1702. IEEE, 2012.
 (4) Michael M Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, and Pierre Vandergheynst. Geometric deep learning: going beyond euclidean data. IEEE Signal Processing Magazine, 34(4):18–42, 2017.
 (5) Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203, 2013.
 (6) Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in neural information processing systems, pages 3844–3852, 2016.
 (7) Aleksey Golovinskiy, Vladimir G Kim, and Thomas Funkhouser. Shape-based recognition of 3d point clouds in urban environments. In Computer Vision, 2009 IEEE 12th International Conference on, pages 2154–2161. IEEE, 2009.
 (8) Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
 (9) Roman Klokov and Victor Lempitsky. Escape from cells: Deep kd-networks for the recognition of 3d point cloud models. In Proceedings of the IEEE International Conference on Computer Vision, pages 863–872, 2017.
 (10) Jason Ku, Melissa Mozifian, Jungwook Lee, Ali Harakeh, and Steven L Waslander. Joint 3d proposal generation and object detection from view aggregation. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1–8. IEEE, 2018.
 (11) Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Xinhan Di, and Baoquan Chen. Pointcnn: Convolution on x-transformed points. In Advances in Neural Information Processing Systems, pages 828–838, 2018.
 (12) Zhongze Liu, Huiyan Chen, Huijun Di, Yi Tao, Jianwei Gong, Guangming Xiong, and Jianyong Qi. Real-time 6d lidar slam in large scale natural terrains for ugv. In 2018 IEEE Intelligent Vehicles Symposium (IV), pages 662–667. IEEE, 2018.
 (13) Daniel Maturana and Sebastian Scherer. Voxnet: A 3d convolutional neural network for real-time object recognition. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 922–928. IEEE, 2015.
 (14) Volodymyr Mnih, Nicolas Heess, Alex Graves, et al. Recurrent models of visual attention. In Advances in neural information processing systems, pages 2204–2212, 2014.
 (15) Charles R Qi, Wei Liu, Chenxia Wu, Hao Su, and Leonidas J Guibas. Frustum pointnets for 3d object detection from rgb-d data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 918–927, 2018.
 (16) Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, 1(2):4, 2017.
 (17) Charles R Qi, Hao Su, Matthias Nießner, Angela Dai, Mengyuan Yan, and Leonidas J Guibas. Volumetric and multi-view cnns for object classification on 3d data. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5648–5656, 2016.
 (18) Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in Neural Information Processing Systems, pages 5099–5108, 2017.
 (19) Gernot Riegler, Ali Osman Ulusoy, and Andreas Geiger. Octnet: Learning deep 3d representations at high resolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3577–3586, 2017.
 (20) Yiru Shen, Chen Feng, Yaoqing Yang, and Dong Tian. Mining point cloud local structures by kernel correlation and graph pooling. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4548–4557, 2018.
 (21) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in neural information processing systems, pages 5998–6008, 2017.
 (22) Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph attention networks. arXiv preprint arXiv:1710.10903, 2017.
 (23) Chu Wang, Marcello Pelillo, and Kaleem Siddiqi. Dominant set clustering and pooling for multi-view 3d object recognition. In Proceedings of British Machine Vision Conference (BMVC), volume 12, 2017.
 (24) Dominic Zeng Wang and Ingmar Posner. Voting for voting in online point cloud object detection. In Robotics: Science and Systems, volume 1, pages 10–15607, 2015.
 (25) Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E Sarma, Michael M Bronstein, and Justin M Solomon. Dynamic graph cnn for learning on point clouds. arXiv preprint arXiv:1801.07829, 2018.
 (26) Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao. 3d shapenets: A deep representation for volumetric shapes. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1912–1920, 2015.
 (27) Li Yi, Vladimir G Kim, Duygu Ceylan, I Shen, Mengyan Yan, Hao Su, Cewu Lu, Qixing Huang, Alla Sheffer, Leonidas Guibas, et al. A scalable active framework for region annotation in 3d shape collections. ACM Transactions on Graphics (TOG), 35(6):210, 2016.
 (28) Yingxue Zhang and Michael Rabbat. A graph-cnn for 3d point cloud classification. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6279–6283. IEEE, 2018.
 (30) Yin Zhou and Oncel Tuzel. Voxelnet: End-to-end learning for point cloud based 3d object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4490–4499, 2018.
 (31) Yuke Zhu, Roozbeh Mottaghi, Eric Kolve, Joseph J Lim, Abhinav Gupta, Li Fei-Fei, and Ali Farhadi. Target-driven visual navigation in indoor scenes using deep reinforcement learning. In Robotics and Automation (ICRA), 2017 IEEE International Conference on, pages 3357–3364. IEEE, 2017.