FPConv: Learning Local Flattening for Point Convolution
We introduce FPConv, a novel surface-style convolution operator designed for 3D point cloud analysis. Unlike previous methods, FPConv does not require transforming the input to an intermediate representation such as a 3D grid or graph; it works directly on the surface geometry of the point cloud. More specifically, for each point, FPConv performs a local flattening by automatically learning a weight map that softly projects surrounding points onto a 2D grid. Regular 2D convolution can then be applied for efficient feature learning. FPConv can be easily integrated into various network architectures for tasks like 3D object classification and 3D scene segmentation, and achieves performance comparable with existing volumetric-type convolutions. More importantly, our experiments also show that FPConv is complementary to volumetric convolutions, and jointly training them can further boost overall performance to state-of-the-art results. Code is available at https://github.com/lyqun/FPConv
With the rapid development of 3D scanning devices, it is increasingly easy to generate and access 3D data in the form of point clouds. This also brings the challenge of robust and efficient 3D point cloud analysis, which serves as an important component in many real-world applications such as robot navigation, autonomous driving, and augmented reality [35, 52, 3, 38].
Despite decades of development in 3D analysis technologies, point cloud based semantic analysis remains challenging, largely due to the sparse and unordered structure of point clouds. Early methods [7, 8, 11, 28] used hand-crafted features with complex rules to tackle this problem. Such empirical, human-designed features suffer from limited performance in general scenes. Recently, with the explosive growth of machine learning and deep learning techniques, Deep Neural Network (DNN) based methods have been introduced into this task [36, 37] and show promising improvements. However, neither PointNet nor PointNet++ supports the convolution operation, which is a key contributing factor in Convolutional Neural Networks (CNNs) for efficient local processing and handling large-scale data.
A straightforward extension of 2D CNNs is to treat 3D space as a volumetric grid and use 3D convolution for analysis [49, 39]. Although these approaches have achieved success in tasks like object classification and indoor semantic segmentation [30, 9], they still have limitations such as cubic growth of memory requirements and high computational cost, leading to insufficient analysis and low prediction accuracy on large-scale scenes. Recently, methods such as [48, 44] have been proposed to approximate such volumetric convolutions with point-based convolution operations, which greatly improves efficiency while preserving output accuracy. However, these methods still struggle to capture fine surface details on relatively flat and thin structures.
In reality, data captured by 3D sensors and LiDAR are usually sparse: points fall near scene surfaces, and almost none lie in the interior. Hence, surfaces are a more natural and compact representation for 3D data. To this end, works like [10, 51] establish connections among points and apply graph convolutions in the corresponding spectral domain, or focus on the surface represented by the graph; such graphs are usually impractical to create and sensitive to local topological structures.
More recently, [43, 33, 18] were proposed to learn convolution on a specified 2D plane. Inspired by these pioneering works, we develop FPConv, a new convolution operation for point clouds. It works directly on the local surface geometry without any intermediate grid or graph representation. Similar to , it works in a projection-interpolation manner, but it is more general and implicit. Our key observation is that projection and interpolation can be simplified into a single weight map learning process. Instead of explicitly projecting onto the tangent plane for convolution, FPConv learns how to diffuse the convolution weights of each point along the local surface, which is more robust to various input data and greatly improves the performance of surface-style convolution.
As a local feature learning module, FPConv can be integrated with other operations in classical neural network architectures and applied to various analysis tasks. We demonstrate FPConv on 3D object classification as well as 3D scene semantic segmentation. Networks with FPConv outperform previous surface-style approaches and achieve results comparable with current state-of-the-art methods. Moreover, our experiments also show that FPConv performs better in regions that are relatively flat; it can therefore be complementary to volumetric-type works, and joint training helps boost the overall performance to state-of-the-art results.
To summarize, the main contributions of this work are as follows:
- FPConv, a novel surface-style convolution for efficient 3D point cloud analysis.
- Significant improvements over previous surface-style convolution based methods and performance comparable with state-of-the-art volumetric-style methods in classification and segmentation tasks.
- An in-depth analysis and comparison of surface-style and volumetric-style convolution, demonstrating that they are complementary to each other and that joint training boosts performance to the state of the art.
2 Related Work
Deep learning based 3D data analysis has been an active research topic in recent years. In this section, we mainly focus on point cloud analysis and briefly review previous works according to their underlying methodologies.
Volumetric-style point convolution. Since a point cloud is distributed irregularly in 3D space without any regular structure, pioneering works sample points into grids so that conventional 3D convolutions can be applied, but they are limited by high computational load and low representation efficiency [30, 49, 39, 41]. PointNet proposes a shared MLP applied to every point individually, followed by global max-pooling to extract a global feature of the input point cloud. PointNet++ extends it with nested partitioning of the point set to hierarchically learn more local features, and many works follow this design and approximate point convolutions with MLPs [24, 25, 16, 46]. However, such a representation cannot capture local features very well. Recent works define explicit convolution kernels for points, whose weights are directly learned as in image convolutions [17, 50, 12, 2, 44]. Among them, KPConv proposes a spatially deformable point convolution with any number of kernel points, which alleviates both varying densities and computational cost and outperforms all associated methods on point analysis tasks. However, these volumetric-style approaches may not capture uniform areas very well.
Graph-style point convolution. When the relationships among points have been established, a graph-style convolution can be applied to explore and study the point cloud more efficiently than volumetric-style convolution. Convolution on a graph can be defined as convolution in its spectral domain [6, 15, 10]. ChebNet adopts a Chebyshev polynomial basis to represent the spectral filters, alleviating the cost of explicitly computing the graph Fourier transform. Furthermore, a localized first-order approximation of spectral graph convolutions has been used for semi-supervised learning on graph-structured data, which greatly accelerates computation and improves classification results. However, these methods all depend on a specific graph structure. A spectral parameterization of dilated convolution kernels, together with a spectral transformer network, was then introduced to share information across related but different shape structures. In the meantime, [29, 5, 40, 32] focus on graph learning on a manifold surface representation to avoid the spectral domain operation, while [45, 47] learn filters on edge relationships instead of the relative positions of points. Although a graph convolution combines features on local surface patches and can be invariant to deformations in Euclidean space, reasonable relationships among distinct points are not easy to establish.
Surface-style point convolution. Since data captured by 3D sensors typically represent surfaces, another mainstream approach attempts to operate directly on surface geometry. Most works project a shape surface consisting of points onto an intermediate grid structure, e.g. multi-view RGB-D images, followed by conventional convolutions [13, 26, 31, 4, 23]. Such methods often suffer from the redundancy of multi-view representations and the ambiguity caused by different viewpoints. One approach projects the local neighborhood of each point onto its local tangent plane and processes it with 2D convolutions, which is efficient for analyzing dense point clouds of large-scale and outdoor environments. However, this method relies heavily on point tangent estimation, and the linear projection is not always optimal for complex areas. Another approach optimizes the calculation with parallel tangential frames, while a third utilizes a 4-rotational symmetric field to define a domain for convolution on the surface, which not only increases robustness but also makes the most of detailed information. However, existing surface-style learning algorithms do not perform very well on challenging datasets such as S3DIS and ScanNet, since they lose one dimension of information and cannot estimate the surface accurately.
Our method is inspired by surface-style point convolutions. The network learns a non-linear projection for each local patch, that is, it flattens the local neighborhood points onto a 2D grid plane. 2D convolutions can then be applied for feature extraction. Although learning on the surface loses one dimension of information, FPConv still achieves performance comparable with existing volumetric-style convolutions. In addition, FPConv can be combined with volumetric-style convolution to achieve state-of-the-art results.
In this section, we formally introduce FPConv. We first revisit the definition of convolution along a point cloud surface and then show that it can be simplified into a weight learning problem in the discrete setting. All derivations are provided in the form of point clouds.
3.1 Learn Local Flattening
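Let $p$ be a point from a point cloud $P$ and $F(p)$ be a scalar function defined over points. Here $F(p)$ can encode signals like color, geometry or features from intermediate network layers. We denote $N(p)$ as a local point cloud patch centered at $p$, where $N(p) = \{ q_i \in P \mid \lVert q_i - p \rVert_2 < \rho \}$ with $\rho$ being the chosen radius.
Convolution on local surface:
In order to convolve $F$ around the surface, we first extend it to a continuous function over a continuous surface. We introduce a virtual 2D plane $S$ with a continuous signal $S(u)$ together with a map $\pi(\cdot)$ which maps $N(p)$ onto $S$ and
$$S(\pi(q_i)) = F(q_i). \tag{1}$$
The convolution at $p$ is defined as:
$$X(p) = \int_S c(u)\, S(u)\, du, \tag{2}$$
where $c(u)$ is a convolution kernel. We now describe how to formulate the above convolution into a weight learning problem.
Local flattening by learning projection weights:
By Eq.1, the signal is known only at the projected points $\pi(q_i)$, so the full signal $S(u)$ is estimated by interpolation:
$$S(u) = \sum_i w(u, \pi(q_i))\, F(q_i), \tag{3}$$
where $q_i \in N(p)$. Furthermore, we can rewrite Eq.2 in an approximate discretized form as:
$$X(p) \approx \sum_j c_j\, S(v_j) = \sum_j c_j \sum_i w(v_j, \pi(q_i))\, F(q_i), \tag{4}$$
where $c_j$ is the discretized convolution kernel weight at grid location $v_j$ in $S$. Let $\mathbf{f} = (F(q_1), \dots, F(q_N))^\top$, $\mathbf{c} = (c_1, \dots, c_M)^\top$, $W_{ij} = w(v_j, \pi(q_i))$, and $W = (W_{ij}) \in \mathbb{R}^{N \times M}$. Then
$$X(p) \approx \mathbf{c}^\top W^\top \mathbf{f}. \tag{5}$$
Now we can see that projection and interpolation can be combined into a single weight matrix $W$, which depends only on the point locations w.r.t. the center point.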
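The combined projection and interpolation step can be made concrete with a small sketch. It assumes a weight matrix `W` with one row per neighbor point and one column per grid cell, scalar point features, and a flattened kernel; the function name and the toy values are illustrative and are not taken from the FPConv code.

```python
def flatten_and_convolve(weights, features, kernel):
    """Softly project point features onto a grid, then apply a discrete kernel.

    weights:  N x M nested list (rows = neighbor points, columns = grid cells)
    features: length-N list of scalar point features
    kernel:   length-M list of discretized kernel weights
    """
    n, m = len(features), len(kernel)
    # Soft projection: each grid cell accumulates weighted point features.
    grid = [sum(weights[i][j] * features[i] for i in range(n)) for j in range(m)]
    # Discrete convolution response at the center point.
    response = sum(kernel[j] * grid[j] for j in range(m))
    return grid, response


# Toy example: 3 neighbor points flattened onto a 2x2 grid (4 cells).
W = [[1.0, 0.0, 0.0, 0.0],
     [0.5, 0.5, 0.0, 0.0],
     [0.0, 0.0, 0.25, 0.75]]
F = [2.0, 4.0, 8.0]
c = [1.0, 1.0, 1.0, 1.0]
grid, x = flatten_and_convolve(W, F, c)
```

In this toy case the first point lands entirely in one cell while the other two are split across neighboring cells, which is the "soft" part of the projection.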
According to Eq.5, we can design a module that learns the projection weights directly instead of learning projection and interpolation separately, as shown in Fig.2. We also want this module to have two properties: first, it should be invariant to input permutation, since the local point cloud is unordered; second, it should be adaptive to input geometry, so the projection should combine local coordinates with global information about the local patch. Therefore, we first use PointNet to extract the global feature of the local region, namely the distribution feature, which is invariant to permutation. We then concatenate the distribution feature to each of the input points, as shown in Fig.3. After that, shared MLPs are employed to predict the final projection weights.
After projection, 2D convolution is applied on the obtained grid feature plane. To extract a local feature vector, global convolution or pooling can be applied on the last layer of the 2D convolution network.
However, the feature intensity of pixels in the grid plane may be unbalanced when the sum of feature intensities received from points in the local region varies, which can destabilize the network and make training hard to converge. To balance the feature intensity of the grid plane, we further introduce two normalization methods on the learned projection weights.
Dense Grid Plane: Let the projection weight matrix be $W \in \mathbb{R}^{N \times M}$, for $N$ neighbor points and $M$ grid pixels. One possible way to obtain a dense grid plane is to normalize $W$ along the first dimension by dividing by the column sums, so that the intensities received at each pixel sum to 1. This is similar to bilinear interpolation. In our implementation, we use softmax to avoid division by zero, as shown in Eq.6:
$$W_{ij} \leftarrow \frac{e^{W_{ij}}}{\sum_{k=1}^{N} e^{W_{kj}}}. \tag{6}$$
Sparse Grid Plane: Due to the natural sparsity of point clouds, normalizing the projection weights to get a dense grid plane may not be optimal. In this case, we design a 2-step normalization that preserves the sparsity of the projection weight matrix, and hence of the grid plane. Moreover, we conduct ablation studies on the two proposed normalization techniques.
The first step is to normalize $W$ along the second dimension to balance the intensity given out by the local neighbor points. Here, we add a positive $\epsilon$ to avoid division by zero. As shown in Eq.7, $W_{i\cdot}$ indicates the $i$-th row of $W$.
The second step is to normalize $W$ along the first dimension to balance the intensity received at each pixel position. It can be implemented similarly to the first step, by dividing by the sum of each column. However, we choose another method, shown in Eq.8, to maintain a continuous sparsity, where $W_{\cdot j}$ indicates the $j$-th column of $W$. Examples of continuous sparsity and binary sparsity are shown in Fig.4.
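A minimal sketch of such a weight-predicting module is given below, in plain Python so that it runs without dependencies. It assumes a single shared embedding layer, a channel-wise max as the permutation-invariant distribution feature, and a single shared output layer; the layer sizes, the fixed weights, and all function names are hypothetical, whereas the module described above uses trained, deeper MLPs.

```python
def linear(x, weight, bias):
    """Apply one fully connected layer to a single vector."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def predict_projection_weights(local_xyz, w_embed, b_embed, w_out, b_out):
    """Return an N x M weight matrix for N neighbor points and M grid cells."""
    # Shared per-point embedding: the same layer is applied to every point.
    embedded = [relu(linear(p, w_embed, b_embed)) for p in local_xyz]
    # Distribution feature: channel-wise max over points, independent of point order.
    distribution = [max(channel) for channel in zip(*embedded)]
    # Concatenate the distribution feature to each point, then apply a shared output layer.
    return [linear(list(p) + distribution, w_out, b_out) for p in local_xyz]


# Hypothetical fixed weights: 3 -> 2 embedding, (3 + 2) -> 4 outputs (a 2x2 grid).
W_EMBED = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
B_EMBED = [0.0, 0.1]
W_OUT = [[0.2, 0.0, 0.0, 1.0, 0.0],
         [0.0, 0.2, 0.0, 0.0, 1.0],
         [0.0, 0.0, 0.2, 1.0, 1.0],
         [0.1, 0.1, 0.1, 0.0, 0.0]]
B_OUT = [0.0, 0.0, 0.0, 0.0]

patch = [(0.1, 0.0, 0.2), (-0.3, 0.4, 0.0), (0.2, -0.1, -0.2)]
weights = predict_projection_weights(patch, W_EMBED, B_EMBED, W_OUT, B_OUT)
shuffled = predict_projection_weights(patch[::-1], W_EMBED, B_EMBED, W_OUT, B_OUT)
```

Reordering the input points only reorders the rows of the predicted matrix, so the feature projected onto each grid cell does not depend on point order.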
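The dense normalization can be sketched as a column-wise softmax. The snippet assumes the weight matrix is stored as a nested list with one row per neighbor point and one column per grid pixel; the function name is illustrative, and subtracting the column maximum is a standard numerical-stability step rather than something stated in the text.

```python
import math


def dense_normalize(weights):
    """Softmax over the point dimension, so every grid pixel's weights sum to 1."""
    n, m = len(weights), len(weights[0])
    out = [[0.0] * m for _ in range(n)]
    for j in range(m):
        column = [weights[i][j] for i in range(n)]
        peak = max(column)  # subtracting the max does not change the softmax result
        exps = [math.exp(v - peak) for v in column]
        total = sum(exps)
        for i in range(n):
            out[i][j] = exps[i] / total
    return out


raw = [[0.0, 2.0],
       [0.0, 0.0],
       [0.0, -1.0]]
dense = dense_normalize(raw)
```

A column of identical raw weights becomes a uniform average over the points, so no pixel is left empty; this is why the resulting plane is dense.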
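The two-step idea can be sketched as follows, under loud assumptions: weights are non-negative, the first step divides each row by its sum plus a small `eps`, and the second step uses the simpler alternative mentioned in the text (dividing each column by its sum) rather than the continuous-sparsity rule of Eq.8, which is not reproduced here. The function name and the `eps` value are illustrative.

```python
def sparse_normalize(weights, eps=1e-8):
    """Row-then-column normalization that keeps zero entries at zero."""
    # Step 1: balance the intensity given out by each neighbor point (row).
    rows = [[v / (sum(row) + eps) for v in row] for row in weights]
    # Step 2 (simple variant, not Eq.8): balance the intensity received per pixel (column).
    col_sums = [sum(col) for col in zip(*rows)]
    return [[v / (s + eps) for v, s in zip(row, col_sums)] for row in rows]


raw = [[2.0, 0.0, 0.0],
       [1.0, 1.0, 0.0]]
sparse = sparse_normalize(raw)
```

Unlike a softmax, a pixel that receives no weight from any point (the last column here) stays empty, so the sparsity pattern of the weight matrix is preserved.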
4.1 Residual FPConv Block
To build a deep network for segmentation and classification, we develop a bottleneck-design residual FPConv block inspired by , as shown in Fig.6. This block takes a point cloud as input and applies a stack of shared MLP, FPConv, and shared MLP, where the shared MLPs are responsible for reducing and then increasing (or restoring) dimensions, similar to the $1 \times 1$ convolutions in a residual convolution block.
4.2 Multi-Scale Analysis
Farthest Point Sampling: We use iterative farthest point sampling (FPS) to downsample the point cloud. As mentioned in PointNet++, FPS covers the entire point set better than random sampling given the same number of centroids.
Pooling: We use max-pooling to group local features. Given an input point cloud $P$ and a downsampled point cloud $P'$ with their corresponding features $F$ and $F'$, we group neighbors for each point in $P'$ within a radius $r$ and apply the pooling operator to the features of the grouped points, as shown in Eq.9, where $N_r(p') = \{ q \in P \mid \lVert q - p' \rVert_2 < r \}$ for any $p' \in P'$:
$$F'(p') = \max_{q \in N_r(p')} F(q). \tag{9}$$
FPConv with FPS: Similar to the pooling operation, this block applies FPConv at each point of the downsampled point cloud and searches for neighbors over the full point cloud, as shown in Eq.10.
Upsampling: We use $k$-nearest-neighbor interpolation to upsample the point cloud by Euclidean distance. Given a point cloud $P$ with features $F$ and a target point cloud $P'$, we compute the feature of each point in $P'$ by interpolating its neighbor points searched over $P$.
In the upsampling phase, skip connections and shared MLPs are used to fuse features from the encoder and decoder. The $k$-nearest-neighbor upsampling and shared MLPs can be replaced by de-convolution, but it does not lead to a significant improvement as mentioned in , so we do not employ it in our experiments.
The architecture shown in Fig.5 is designed for large-scene segmentation and includes four layers of downsampling and upsampling for multi-scale analysis. For the classification task, we apply global pooling on the last downsampling layer to obtain a global feature representing the full point cloud, and then use a fully connected network for classification.
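Iterative farthest point sampling can be sketched in a few lines. This is a plain-Python illustration with quadratic cost, not the GPU implementation a real pipeline would use; the function name and the choice of the first point as the seed are assumptions.

```python
def farthest_point_sampling(points, k, start=0):
    """Return indices of k points chosen by iterative farthest point sampling."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    selected = [start]
    # Squared distance from every point to its nearest already-selected point.
    nearest = [sq_dist(p, points[start]) for p in points]
    while len(selected) < k:
        # Pick the point that is farthest from the current selection.
        nxt = max(range(len(points)), key=lambda i: nearest[i])
        selected.append(nxt)
        nearest = [min(d, sq_dist(p, points[nxt])) for d, p in zip(nearest, points)]
    return selected


cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
centroids = farthest_point_sampling(cloud, 3)
```

On this toy line of points the sampler first jumps to the far outlier and then fills the largest remaining gap, which illustrates why FPS spreads centroids more evenly than random sampling.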
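The pooling step can be sketched as a radius query followed by a channel-wise max. The function name is hypothetical, the strict `<` on the radius is an assumption, and the brute-force neighbor search stands in for the spatial data structure a practical implementation would use.

```python
def radius_max_pool(points, feats, centers, radius):
    """Max-pool features of all points within `radius` of each center."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    pooled = []
    for c in centers:
        group = [f for p, f in zip(points, feats) if sq_dist(p, c) < radius ** 2]
        if not group:
            raise ValueError("a center has no neighbors within the radius")
        # Channel-wise maximum over the grouped neighbor features.
        pooled.append([max(channel) for channel in zip(*group)])
    return pooled


points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
feats = [[1.0, 9.0], [3.0, 2.0], [7.0, 0.0]]
centers = [points[0], points[2]]  # e.g. indices returned by a downsampling step
pooled = radius_max_pool(points, feats, centers, radius=2.0)
```

When the centers are a subset of the input points, as with sampled centroids, every group contains at least the center itself, so the empty-group error is only a safeguard.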
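A sketch of nearest-neighbor feature interpolation is shown below. The inverse squared-distance weighting follows the common PointNet++-style scheme and is an assumption here, as are the default `k=3`, the `eps` term, and the function name.

```python
def knn_interpolate(src_points, src_feats, dst_points, k=3, eps=1e-8):
    """Interpolate features at dst_points from their k nearest src_points."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    dim = len(src_feats[0])
    out = []
    for q in dst_points:
        # k nearest source points as (squared distance, index) pairs.
        nearest = sorted((sq_dist(q, p), i) for i, p in enumerate(src_points))[:k]
        # Assumed weighting: inverse squared distance, normalized to sum to 1.
        w = [1.0 / (d + eps) for d, _ in nearest]
        total = sum(w)
        out.append([
            sum(wi * src_feats[i][ch] for wi, (_, i) in zip(w, nearest)) / total
            for ch in range(dim)
        ])
    return out


coarse = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
coarse_feats = [[0.0], [4.0], [100.0]]
dense = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
upsampled = knn_interpolate(coarse, coarse_feats, dense, k=2)
```

A target point halfway between two sources receives their average, while a target that coincides with a source essentially copies that source's feature.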
4.3 Fusing Two Convolutions
As one of our main contributions, we also try to answer the question "Can we combine two convolutions to further boost performance?" The answer is yes, but only when the two convolutions are of different types or complementary (see Section 6), say surface-style and volumetric-style.
In this section, we propose two convenient and quick fusion strategies that combine two convolution operators in a single framework. The first fuses different convolutional features, similar to the Inception network. As shown in Fig.7, we design a parallel residual block: given an input feature, multiple convolutions are applied in parallel and their outputs are concatenated as the fused feature. This strategy is suitable for compatible methods, such as the SA module of PointNet++ and PointConv, both of which take a point cloud as input and apply the same downsampling strategy used in our architecture.
For incompatible methods, such as TextureNet, which uses a mesh as an additional input, and KPConv, which applies grid downsampling, we use the second fusion strategy: concatenating their output features at the second-to-last layer of the networks, and then applying a tiny network for fusion.
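Both strategies come down to concatenating per-point features from two branches before further processing. The sketch below shows only that concatenation step; the two branch functions are toy stand-ins (they are not FPConv, PointConv, or KPConv), and all names are hypothetical.

```python
def fuse_by_concatenation(features, branches):
    """Run every branch on the same per-point features and concatenate the outputs."""
    outputs = [branch(features) for branch in branches]
    fused = []
    for per_point in zip(*outputs):
        row = []
        for branch_feature in per_point:
            row.extend(branch_feature)
        fused.append(row)
    return fused


def toy_surface_branch(features):
    # Stand-in for a surface-style operator: scales each channel.
    return [[2.0 * v for v in f] for f in features]


def toy_volumetric_branch(features):
    # Stand-in for a volumetric-style operator: sums the channels.
    return [[sum(f)] for f in features]


point_feats = [[1.0, 2.0], [3.0, 4.0]]
fused = fuse_by_concatenation(point_feats, [toy_surface_branch, toy_volumetric_branch])
```

The fused channel count is the sum of the branch output widths, so whatever layer follows (a shared MLP in the parallel block, or the small fusion network at the end) must be sized accordingly.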
|TextureNet ||S||QF ||56.6||-||-|
|FP PointConv||S + V||-||-||64.4||-|
|FP PointConv||S + V||FPS||-||64.8||-|
|FP KPConv||S + V||-||-||66.7||-|
To demonstrate the efficacy of our proposed convolution, we conduct experiments on point cloud semantic segmentation and classification tasks. ModelNet40 is used for shape classification. Two large-scale datasets, Stanford Large-Scale 3D Indoor Space (S3DIS) and ScanNet, are used for 3D point cloud segmentation. We implement FPConv with PyTorch. A momentum gradient descent optimizer is used to optimize a point-wise cross entropy loss, with a momentum of 0.98 and an initial learning rate of 0.01 scheduled by a cosine LR scheduler. Leaky ReLU and batch normalization are applied after each layer except the last fully connected layer. We train our models for 100 epochs on S3DIS and ModelNet40, and for 300 epochs on ScanNet.
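For reference, a cosine learning-rate schedule can be sketched as below. It assumes a single decay cycle without warm restarts, a per-epoch update, and a minimum learning rate of 0; these details, like the function name, are assumptions and are not stated in the text.

```python
import math


def cosine_lr(epoch, total_epochs, base_lr=0.01, min_lr=0.0):
    """Cosine-annealed learning rate for a given epoch (single cycle, no restarts)."""
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Learning rates over a 100-epoch run starting from 0.01.
schedule = [cosine_lr(e, 100) for e in range(101)]
```

The rate starts at the initial value, passes through half of it at the midpoint, and decays smoothly toward the minimum by the final epoch.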
5.1 3D Shape Classification
ModelNet40 contains 12311 3D meshed models from 40 categories, with 9843 for training and 2468 for testing. Normals are used as an additional input feature in our model. Moreover, random rotation around the -axis and jittering are also used for data augmentation. As shown in Table.2, our model achieves state-of-the-art performance among surface-style methods.
5.2 Large Scene Semantic Segmentation
Data. S3DIS contains 3D point clouds of 6 areas, 272 rooms in total. Each point in a scan is annotated with one of the semantic labels from 13 categories (chair, table, floor, wall etc. plus clutter). To prepare the training data, 14k points are randomly sampled from a randomly picked block of 2m by 2m. Both sampling steps are performed on-the-fly during training. For testing, all points are covered. Each point is represented by a 9-dim vector of XYZ, RGB, and normalized location w.r.t. the room (from 0 to 1). In particular, the sampling rate for each point is 0.5 in every training epoch.
ScanNet contains 1513 3D indoor scene scans, split into 1201 for training and 312 for testing. There are 21 classes in total: 20 classes are used for evaluation and 1 class denotes free space. Similar to S3DIS, we randomly sample the raw data in blocks and then sample points on-the-fly during training. Each block is of size 2m by 2m and contains 11k points, each represented by a 6-dim vector of XYZ and RGB.
Pipeline for fusion. As mentioned in Section 4.3, we propose two fusion strategies for fusing convolution kernels of different types. In our experiments, we select PointConv and rigid KPConv for comparison on S3DIS. We apply both fusion strategies to PointConv with FPConv, and the second strategy, fusion at the final feature level, to FPConv with KPConv and to PointConv with KPConv. Only rigid KPConv is used for fusion; its deformable version is omitted because no pre-trained model or hyper-parameter settings have been released. Thus, in what follows, we use KPConv to refer to rigid KPConv.
Results. Following , we report results under two settings for S3DIS: the first is evaluation on Area 5, and the other is 6-fold cross validation (calculating the metrics with results from different folds merged). We report the mean class-wise intersection over union (mIoU), overall point-wise accuracy (oA) and mean class-wise accuracy (mAcc). For ScanNet, we report the mIoU score tested on the ScanNet benchmark.
Results (mIoU) are shown in Table.1. Detailed results on S3DIS, including the mIoU of each class, are shown in Table.3. As we can see, FPConv outperforms all existing surface-style learning methods by large margins. Specifically, the mIoU of FPConv on the ScanNet benchmark reaches 63.9%, which outperforms the previous best surface-style method by 7.3%. In addition, FPConv fused with KPConv achieves state-of-the-art performance on S3DIS.
Even though the mIoU of FPConv on S3DIS is lower than that of KPConv, the IoUs of some classes, such as ceiling, floor, and board, exceed those of KPConv. Notably, all of these classes are flat objects, which should have small curvatures. Based on this observation, we further conduct several ablation studies to explore the relationship between the segmentation performance of FPConv and object curvature, as shown in the next section. Visualizations of the results are shown in Fig.8 for S3DIS and Fig.9 for ScanNet.
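The three reported metrics can all be computed from a confusion matrix, as in the sketch below. It assumes rows index ground-truth classes and columns index predictions, and it skips classes that never occur when averaging; both conventions and the function name are assumptions rather than details from the text.

```python
def segmentation_metrics(confusion):
    """Compute mIoU, oA and mAcc from a square confusion matrix.

    confusion[g][p] counts points with ground-truth class g predicted as class p.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    true_pos = [confusion[c][c] for c in range(k)]
    gt_count = [sum(confusion[c]) for c in range(k)]
    pred_count = [sum(confusion[g][c] for g in range(k)) for c in range(k)]

    ious, accs = [], []
    for c in range(k):
        union = gt_count[c] + pred_count[c] - true_pos[c]
        if union > 0:  # skip classes absent from both labels and predictions
            ious.append(true_pos[c] / union)
        if gt_count[c] > 0:  # class accuracy is undefined without ground-truth points
            accs.append(true_pos[c] / gt_count[c])

    return {
        "mIoU": sum(ious) / len(ious),
        "oA": sum(true_pos) / total,
        "mAcc": sum(accs) / len(accs),
    }


metrics = segmentation_metrics([[3, 1],
                                [1, 5]])
```

For the 6-fold setting described above, the confusion matrices of the individual folds would be summed before calling the function, which corresponds to merging the results from different folds.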
6 Ablation Study
Two ablation studies are conducted. The first explores the fusion of surface-style and volumetric-style convolutions. The second studies the effect of detailed configurations, namely the normalization method and plane size, on FPConv.
6.1 On Fusion of S.Conv and V.Conv
We first study the performance of different methods for combining the two convolutions. Before that, we present an experimental finding that they are complementary and good at analyzing different kinds of scenes.
Performance vs. Curvature
As discussed in Section 5.2, we claim that FPConv performs better in areas with small curvature. To support this claim, we analyze the relationship between overall accuracy and curvature, shown on the left of Fig.10. FPConv outperforms PointConv and KPConv when curvature is small, but does not perform well on structures with large curvature. Moreover, the histogram of point curvatures on the right of Fig.10 shows that almost all points have either large or small curvature. This explains the sharp performance degradation as curvature increases. Furthermore, in Fig.11 we highlight points with incorrect predictions (in red) and points with large curvature (in red). It is obvious that incorrect predictions are concentrated in areas with large curvature and that FPConv performs well in flat areas.
Ablation analysis on fusion method
As mentioned above, FPConv, a surface-style convolution, performs better in flat areas and worse in rough areas, while KPConv, a volumetric-style convolution, behaves in the opposite way. We believe that they can be complementary to each other and conduct 4 fusion experiments: FPConv with PointConv at the final feature level, FPConv with PointConv at the conv level, KPConv with PointConv at the final feature level, and FPConv with KPConv at the final feature level. We do not fuse FPConv and KPConv at the conv level because their downsampling strategies are incompatible. As shown in Table.3, fusing FPConv with PointConv or KPConv brings a large improvement, while fusing PointConv with KPConv brings little improvement. Therefore, we can claim that FPConv is complementary to volumetric-style convolutions, which may guide the design of point cloud convolutions in the future.
Visual results are shown in Fig.8. FPConv captures flat structures better than KPConv, such as the column class, which does not appear in the KPConv result, while KPConv captures complex structures better, such as the door. Moreover, the fusion of KPConv and FPConv achieves better results than either KPConv or FPConv alone.
|w sparse norm + 6x6||62.8||69.0||88.3|
|w dense norm + 6x6||61.6||68.5||87.6|
|w/o norm + 6x6||59.8||67.1||86.2|
|w sparse norm + 5x5||61.8||68.1||88.4|
6.2 On FPConv Architecture Design
We conduct 4 experiments, as shown in Table.4, to study the influence of the normalization method and the grid plane size on the performance of FPConv. The results show that sparse-norm, the 2-step normalization method described in Section 3.2, performs better than dense-norm. In addition, a higher grid plane resolution may achieve better performance, at the price of higher memory cost.
In this work, we propose FPConv, a novel surface-style convolution operator for 3D point clouds. FPConv takes a local region of a point cloud as input and flattens it onto a 2D grid plane by predicting projection weights, followed by regular 2D convolutions. Our experiments demonstrate that FPConv significantly improves the performance of surface-style convolution methods. Furthermore, we find that surface-style convolution can be complementary to volumetric-style convolution, and that joint training can boost performance to the state of the art. We believe that surface-style convolutions can play an important role in feature learning for 3D data and are a promising direction to explore.
This work was supported in part by grants No.2018B030338001, NSFC-61902334, NSFC-61629101, No.2018YFB1800800, No.ZDSYS201707251409055 and No.2017ZT07X152.
The supplementary material contains:
A. More results of the proposed fusion strategy
a. Fusing FPConv and PointConv on ScanNet
We conduct experiments on the fusion of FPConv with PointConv on ScanNet. The results are reported in Table.5, where all methods are evaluated under the same settings (architecture, hyper-parameters, etc.). Note that we reduce the sampled points to 8k in a block of 1.5m by 1.5m for all experiments.
b. Fusing FPConv and KPConv-deform on S3DIS
B. More Results on Segmentation Tasks
We provide more details of our experimental results. As shown in Table.7, we compare FPConv with other popular methods on S3DIS 6-fold cross validation, which shows that FPConv achieves higher scores on flat-shaped objects, such as ceiling, floor, table, and board, while KPConv, a volumetric-style method, performs better on complex structures. More visual results are shown in Fig.12 and Fig.13.
- (2016) 3D semantic parsing of large-scale indoor spaces. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition.
- (2018) Point convolutional neural networks by extension operators. arXiv preprint arXiv:1803.10091.
- (2017) Bounding boxes, segmentations and object coordinates: how important is recognition for 3d scene flow estimation in autonomous driving scenarios? In Proceedings of the IEEE International Conference on Computer Vision, pp. 2574–2583.
- (2017) Unstructured point cloud semantic labeling using deep segmentation networks. 3DOR 2, pp. 7.
- (2017) Geometric deep learning: going beyond euclidean data. IEEE Signal Processing Magazine 34 (4), pp. 18–42.
- (2013) Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203.
- (2004) Supervised parametric classification of aerial lidar data. In 2004 Conference on Computer Vision and Pattern Recognition Workshop, pp. 30–30.
- (2009) Contribution of airborne full-waveform lidar and image data for urban scene classification. In 2009 16th IEEE International Conference on Image Processing (ICIP), pp. 1669–1672.
- (2017) Scannet: richly-annotated 3d reconstructions of indoor scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5828–5839.
- (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in neural information processing systems, pp. 3844–3852.
- (2009) Shape-based recognition of 3d point clouds in urban environments. In 2009 IEEE 12th International Conference on Computer Vision, pp. 2154–2161.
- (2018) Flex-convolution. In Asian Conference on Computer Vision, pp. 105–122.
- (2015) Indoor scene understanding with rgb-d images: bottom-up segmentation, object detection and semantic segmentation. International Journal of Computer Vision 112 (2), pp. 133–149.
- (2015) Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385.
- (2015) Deep convolutional networks on graph-structured data. arXiv preprint arXiv:1506.05163.
- (2018) Monte carlo convolution for learning on non-uniformly sampled point clouds. In SIGGRAPH Asia 2018 Technical Papers, pp. 235.
- (2018) Pointwise convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 984–993.
- (2019) Texturenet: consistent local parametrizations for learning from high-resolution signals on meshes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4440–4449.
- (2018) Recurrent slice networks for 3d segmentation of point clouds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2626–2635.
- (2019) Hierarchical point-edge interaction network for point cloud semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision, pp. 10433–10441.
- (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.
- (2018) Large-scale point cloud semantic segmentation with superpoint graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4558–4567.
- (2017) Deep projective 3d semantic segmentation. In International Conference on Computer Analysis of Images and Patterns, pp. 95–107.
- (2018) So-net: self-organizing network for point cloud analysis. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 9397–9406.
- (2018) PointCNN. arXiv preprint arXiv:1801.07791.
- (2016) Lstm-cf: unifying context modeling and fusion with lstms for rgb-d scene labeling. In European conference on computer vision, pp. 541–557.
- (2016) Sgdr: stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983.
- (2015) 3d all the way: semantic segmentation of urban scenes from start to end in 3d. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4456–4465.
- (2015) Geodesic convolutional neural networks on riemannian manifolds. In Proceedings of the IEEE international conference on computer vision workshops, pp. 37–45.
- (2015) Voxnet: a 3d convolutional neural network for real-time object recognition. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 922–928.
- (2017) Semanticfusion: dense 3d semantic mapping with convolutional neural networks. In 2017 IEEE International Conference on Robotics and automation (ICRA), pp. 4628–4635.
- (2017) Geometric deep learning on graphs and manifolds using mixture model cnns. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5115–5124.
- (2018) Convolutional neural networks on 3d surfaces using parallel frames. arXiv preprint arXiv:1808.04952.
- (2017) Automatic differentiation in pytorch. In NIPS-W.
- (2018) Ground extraction from 3d lidar point clouds with the classification learner app. In 2018 26th Mediterranean Conference on Control and Automation (MED), pp. 1–9.
- (2017) Pointnet: deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 652–660.
- (2017) Pointnet++: deep hierarchical feature learning on point sets in a metric space. In Advances in neural information processing systems, pp. 5099–5108.
- (2017) [POSTER] augmented things: enhancing ar applications leveraging the internet of things and universal 3d object tracking. In 2017 IEEE International Symposium on Mixed and Augmented Reality (ISMAR-Adjunct), pp. 103–108.
- (2017) Octnet: learning deep 3d representations at high resolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3577–3586.
- (2017) Dynamic edge-conditioned filters in convolutional neural networks on graphs. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3693–3702.
- (2017) Semantic scene completion from a single depth image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1746–1754.
- (2015) Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1–9.
- (2018) Tangent convolutions for dense prediction in 3d. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3887–3896.
- (2019) KPConv: flexible and deformable convolution for point clouds. arXiv preprint arXiv:1904.08889.
- (2018) Feastnet: feature-steered graph convolutions for 3d shape analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2598–2606.
- (2018) Deep parametric continuous convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2589–2597.
- (2019) Dynamic graph cnn for learning on point clouds. ACM Transactions on Graphics (TOG) 38 (5), pp. 146.
- (2019) Pointconv: deep convolutional networks on 3d point clouds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9621–9630.
- (2015) 3d shapenets: a deep representation for volumetric shapes. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1912–1920.
- (2018) Spidercnn: deep learning on point sets with parameterized convolutional filters. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 87–102.
- (2017) Syncspeccnn: synchronized spectral cnn for 3d shape segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2282–2290.
- (2018) A lidar point cloud generator: from a virtual world to autonomous driving. In Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, pp. 458–464.