3D Point Cloud Denoising via Deep Neural Network based Local Surface Estimation


We present a neural-network-based architecture for 3D point cloud denoising called neural projection denoising (NPD). In our previous work, we proposed a two-stage denoising algorithm, which first estimates reference planes and then projects noisy points onto the estimated reference planes. Since the estimated reference planes are inevitably noisy, multi-projection is applied to stabilize the denoising performance. The NPD algorithm uses a neural network to estimate reference planes for points in noisy point clouds. With more accurate estimates of the reference planes, we are able to achieve better denoising performance with only a single projection. To the best of our knowledge, NPD is the first work to denoise 3D point clouds with deep learning techniques. To conduct the experiments, we sample 40000 point clouds from the 3D data in ShapeNet to train a network and sample 350 point clouds from the 3D data in ModelNet10 to test. Experimental results show that our algorithm can estimate normal vectors of points in noisy point clouds. Compared to five competitive methods, the proposed algorithm achieves better denoising performance and produces much smaller variances. Our code is available at https://github.com/chaojingduan/Neural-Projection.


Chaojing Duan, Siheng Chen, Jelena Kovačević
Department of Electrical & Computer Engineering, Carnegie Mellon University, Pittsburgh, USA
Mitsubishi Electric Research Laboratories, Boston, USA
Tandon School of Engineering, New York University, Brooklyn, USA

Keywords: point cloud, denoising, deep learning, normal vector, reference plane

1 Introduction

The rapid development of 3D sensing techniques and the emerging field of 2D image-based 3D reconstruction make it possible to sample or generate millions of 3D points from surfaces of objects [16, 1, 14]. 3D point clouds are discrete representations of continuous surfaces and are widely used in robotics, virtual reality, and computer-aided shape design. 3D point clouds sampled by 3D scanners are generally noisy due to measurement noise, especially around edges and corners [13]. 3D point clouds reconstructed from multi-view images contain noise because the reconstruction algorithms fail to resolve matching ambiguities [12, 4]. The inevitable noise in 3D point clouds undermines the performance of surface reconstruction algorithms and impairs further geometry processing tasks, since fine details are lost and the underlying manifold structures are prone to deformation [5].

However, denoising or otherwise processing 3D point clouds is challenging because 3D point clouds are permutation invariant and the neighboring points that represent the local topology interact without any explicit connectivity information. To denoise 3D point clouds, we aim to estimate the continuous surface localized around each 3D point and remove noise by projecting noisy points onto the corresponding local surfaces. The intuition is that noiseless points are sampled from surfaces. To estimate local surfaces, we parameterize them by 2D reference planes. By projecting noisy points onto the estimated reference planes, we ensure that all the denoised points come from the underlying surfaces.

Deep neural networks have shown ground-breaking performance in various domains, such as speech processing and image processing [6]. Recently, several deep-learning architectures have been proposed to deal with 3D point clouds in tasks such as classification, segmentation, and upsampling [15, 19, 20]. In this work, we learn reference planes from noisy point clouds and further reduce noise with deep learning; we name the proposed algorithm neural projection denoising (NPD). Estimated reference planes are represented by normal vectors and interceptions. We use deep learning because previous algorithms are not sufficiently robust to noise intensity, sampling density, and curvature variations. These algorithms need to define neighboring points to capture local structures. However, it is difficult to choose the neighboring points adaptively according to the sampling density or curvature variation.

In our experiments, the point clouds used for training are sampled from the 3D dataset ShapeNet and the point clouds used for testing are sampled from the 3D dataset ModelNet10 [3, 18]. The experimental results show that NPD outperforms four of five other denoising algorithms in all seven categories and achieves the lowest variance in both evaluation metrics.

Contributions. 1) To the best of our knowledge, NPD is the first to directly deal with 3D point clouds for denoising tasks with deep learning techniques; 2) NPD can estimate a normal vector for each point using both local and global information and is less affected by noise intensity and curvature variation; 3) NPD can denoise noisy point clouds without defining neighboring points or computing an eigendecomposition to estimate local geometries; 4) NPD opens the possibility of 3D point cloud parameterization by combining local and global information.

2 Related work

Point cloud denoising. 3D point cloud denoising has been tackled by various approaches: mesh-based denoising, graph-based denoising, and projection-based denoising. Bilateral filter (BF) and partial differential equations (PDE) are widely used in mesh and point cloud denoising [8, 11], but these mesh-based algorithms cause shrinkage and deformation [7]. Graph-based denoising (GBD) algorithms have received increasing attention because the Laplace-Beltrami operator of manifolds can be approximated by the graph Laplacian [2, 21]; however, a graph constructed from a noisy point cloud is also noisy and cannot reflect the true manifold, causing deformation issues [9]. The projection-based denoising algorithm named weighted multi-projection (WMP) in [10] estimates reference planes for 3D points and projects noisy 3D points onto the planes multiple times for denoising. NPD estimates the reference planes with deep learning techniques and projects noisy point clouds only once.

Deep learning on 3D data. Researchers first attempted to handle 3D data with deep learning techniques by voxelizing 3D shapes and applying 3D convolutional neural networks [18]. Voxelization as a 3D data pre-processing procedure is computationally intensive and leads to quantization artifacts [15]. The authors in [15] proposed an architecture that directly consumes 3D point clouds, without voxelization or conversion, for classification and segmentation tasks. In this paper, we adopt the idea and redesign the framework to estimate normal vectors of points in a noisy point cloud and directly denoise 3D point clouds via deep learning techniques.

3 3D point cloud denoising algorithm

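Problem formulation. Let $\mathcal{X} = \{\mathbf{x}_i\}_{i=1}^{N}$ be a 3D noiseless point cloud, where $N$ is the total number of points in $\mathcal{X}$, and each point $\mathbf{x}_i \in \mathbb{R}^3$ is a coordinate vector. Let $\mathcal{Y} = \{\mathbf{y}_i\}_{i=1}^{N}$ represent a noisy 3D point cloud, where each point $\mathbf{y}_i \in \mathbb{R}^3$ is a coordinate vector. Note that $\mathbf{y}_i = \mathbf{x}_i + \mathbf{e}_i$, where $\mathbf{e}_i \in \mathbb{R}^3$ is a 3D noise vector attached to the point $\mathbf{x}_i$.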

Since 3D point clouds are sampled from surfaces, they are essentially 2D manifolds embedded in 3D space. These surfaces can be locally approximated by 2D reference planes. One of our goals is to learn reference planes for points in the presence of noise, but the accurate reference planes for the ground-truth surface are unknown. Unlike the work in [10], which calculates the reference planes by constructing weighted covariance matrices from noisy point clouds, NPD estimates the reference planes by learning the global and local geometries via deep learning techniques.

Figure 1: System pipeline: During pre-processing, we calculate reference planes for points in noiseless point clouds as in [10]. During denoising, we process noisy point clouds to estimate reference planes for points. We then project noisy points to their corresponding estimated reference planes to denoise the point clouds. The noiseless point clouds and the calculated reference planes are used to supervise the denoised point clouds and the estimated reference planes, respectively.

Algorithm. The system pipeline is shown in Fig. 1. During pre-processing, we use graph-based techniques to calculate reference planes for points as in [10]. For the $i$-th point, let $\mathbf{n}_i$ and $c_i$ denote the normal vector and interception of the reference plane calculated from the noiseless point cloud, and let $\hat{\mathbf{n}}_i$ and $\hat{c}_i$ denote the normal vector and interception of the reference plane estimated by our neural network.

During the denoising procedure, we project each noisy point onto its estimated reference plane to obtain the corresponding denoised point.

The calculated reference planes and the noiseless point cloud are used to constrain, or supervise, the estimated reference planes and the denoised point cloud, respectively.
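
The projection step can be illustrated with a short sketch. It assumes a plane convention that is not stated in the text: a reference plane is taken to be the set of points p satisfying n·p = c, with normal vector n and interception c. Under that assumption, the orthogonal projection of a noisy point y is y − (n·y − c)n for a unit-length n. The function name `project_to_plane` is hypothetical.

```python
import math

def project_to_plane(point, normal, intercept):
    """Orthogonally project a 3D point onto the plane {p : n.p = c}.

    Assumption (not from the paper): the plane is described by a normal
    vector n and a scalar intercept c such that n.p = c on the plane.
    The normal is normalized here, so it need not have unit length.
    """
    norm = math.sqrt(sum(n * n for n in normal))
    if norm == 0.0:
        raise ValueError("normal vector must be non-zero")
    unit = [n / norm for n in normal]
    c = intercept / norm  # rescale the intercept to match the unit normal
    signed_dist = sum(u * p for u, p in zip(unit, point)) - c
    # Move the point back along the normal by its signed distance.
    return [p - signed_dist * u for p, u in zip(point, unit)]

# Toy example: the plane z = 0.1, written as 2z = 0.2.
noisy = [0.5, -0.2, 0.13]
denoised = project_to_plane(noisy, normal=[0.0, 0.0, 2.0], intercept=0.2)
print(denoised)
```

Only the component of the point along the normal changes, so the quality of the denoised point depends entirely on how well the plane is estimated.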

Figure 2: NPD architecture: The proposed network takes noisy point clouds with $N$ points as input, then passes the inputs through the shared multi-layer perceptron (MLP) to obtain local features. We mark the MLP as shared because each point goes through the same MLP and its feature vector is obtained independently and identically, as in [15]. The global features are captured by max-pooling, which selects the maximum value of each of the 512 columns to produce a global feature vector. We concatenate global and local features, then pass them through an MLP to obtain reference planes represented as a matrix. A function is applied to project the noisy points onto the estimated reference planes to denoise the point clouds. The denoised point cloud and the estimated reference planes are supervised by the noiseless point cloud and the calculated reference planes, respectively. We use the MSE loss to calculate loss1 (Eq. 1), which constrains the denoised point clouds to stay close to the noiseless point clouds. We use the cosine similarity loss to calculate loss2 (Eq. 2), which constrains the estimated reference planes to be similar to the calculated reference planes [10]. The number of MLP layers is shown in brackets.

The NPD architecture is shown in Fig. 2. The architecture takes noisy point clouds directly as input and passes them through a shared multi-layer perceptron (MLP), meaning that the same perceptron is applied to the feature vector of each point independently and identically, as in [15]. Max-pooling selects the maximum value of each column of the resulting feature matrix; the global information is thus contained in a single 1D vector that represents the whole point cloud [15].
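
The shared-MLP and max-pooling steps described above can be sketched in plain Python. This is a toy illustration rather than the network of Fig. 2: it uses a single layer, a ReLU activation (an assumption, since the activation is not stated), random weights, and a feature width of 4 instead of 512. All function names are hypothetical.

```python
import random

def shared_mlp(points, weights, biases):
    """Apply one fully connected layer with ReLU to every point.

    The same weights and biases are reused for each point, which is
    what "shared" means here.
    """
    features = []
    for point in points:
        row = []
        for unit_weights, bias in zip(weights, biases):
            activation = sum(w * x for w, x in zip(unit_weights, point)) + bias
            row.append(max(0.0, activation))  # ReLU (assumed activation)
        features.append(row)
    return features

def global_max_pool(features):
    """Take the column-wise maximum over all points."""
    return [max(column) for column in zip(*features)]

def concat_local_global(features):
    """Append the global feature vector to every per-point feature row."""
    global_feature = global_max_pool(features)
    return [row + global_feature for row in features]

rng = random.Random(0)
num_points, feature_dim = 5, 4  # toy sizes; Fig. 2 mentions 512 columns
points = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(num_points)]
weights = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(feature_dim)]
biases = [0.0] * feature_dim

local = shared_mlp(points, weights, biases)
combined = concat_local_global(local)
print(len(combined), len(combined[0]))  # prints "5 8"
```

Because the column-wise maximum does not depend on the order of the rows, the global feature is unchanged when the input points are permuted.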

The local feature vector contains local information for each point. The extracted global information is concatenated to make our architecture less sensitive to sampling densities and able to learn the number of neighboring points adaptively. 3D point cloud denoising requires local information of each point since reference planes are different for different points due to their local geometries. For example, a point on a flat surface and a point on a sharp edge contain different local information and they should be treated differently. 3D point cloud denoising also requires global information for each point in a point cloud. For example, the edge distribution and the curvature variation of an airplane and a table are distinct. It is essential that each point is aware of the skeleton and the global view of the surface from which it is sampled.

We pass the combined information through shared MLPs to learn reference planes. We then project the points in noisy point clouds onto their corresponding reference planes. The projected points are the denoised points sampled from the reference planes. loss1 is treated as a point-cloud loss: we calculate the mean-squared error (MSE) between the noiseless point cloud and the denoised point cloud.

loss2 is treated as a reference-plane loss: we calculate the absolute value of the cosine similarity between the estimated and the calculated reference planes.


The training loss is a weighted combination of loss1 and loss2, where the weight $\alpha$ is a parameter that we can tune during training for the best performance. We also show the effect of $\alpha$ on three small datasets in Fig. 3.
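
The two loss terms can be sketched as follows. The exact formulas are not recoverable from the text, so every form below is an assumption: loss1 is taken as the mean squared Euclidean distance between corresponding points, loss2 as one minus the mean absolute cosine similarity between normal vectors, and the total as `alpha * loss1 + loss2`. The function names are hypothetical.

```python
import math

def mse_loss(clean, denoised):
    """Mean squared Euclidean distance between corresponding points."""
    total = 0.0
    for x, x_hat in zip(clean, denoised):
        total += sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return total / len(clean)

def cosine_plane_loss(normals_ref, normals_est):
    """One minus the mean absolute cosine similarity of normal vectors.

    Assumed form: the text only says that the absolute value of the
    cosine similarity is used.
    """
    total = 0.0
    for n, n_hat in zip(normals_ref, normals_est):
        dot = sum(a * b for a, b in zip(n, n_hat))
        denom = math.sqrt(sum(a * a for a in n)) * math.sqrt(sum(b * b for b in n_hat))
        total += 1.0 - abs(dot) / denom
    return total / len(normals_ref)

def training_loss(clean, denoised, normals_ref, normals_est, alpha):
    """Assumed weighting: alpha scales the point-cloud term only."""
    return alpha * mse_loss(clean, denoised) + cosine_plane_loss(normals_ref, normals_est)

clean = [[0.0, 0.0, 0.0]]
denoised = [[1.0, 2.0, 2.0]]
print(mse_loss(clean, denoised))                            # prints 9.0
print(cosine_plane_loss([[0.0, 0.0, 1.0]], [[0.0, 0.0, -2.0]]))  # prints 0.0
```

Taking the absolute value makes the plane term insensitive to the sign of the normal, which is reasonable because a normal vector and its negation describe the same plane orientation.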

4 Experimental results

Dataset. We train our network with ShapeNet data, which contains 55 object categories with more than 50,000 3D mesh models [3]. We test our network with the dataset in ModelNet10, which contains 10 object categories with more than 3000 3D mesh models [18]. We select 40000 3D meshes in ShapeNet and sample point clouds for the training procedure; each point cloud contains 2048 points. We randomly select 50 3D meshes in each of seven categories in ModelNet10 and sample point clouds for the testing procedure. The point clouds are rescaled into a unit cube and centered at the origin. We pre-process the training data to obtain reference planes of points and add i.i.d. Gaussian noise to construct noisy point clouds. We compare the denoising results from NPD with state-of-the-art denoising algorithms, including the BF algorithm [8], the GBD algorithm [17], the PDE algorithm [11], the Non-local Denoising (NLD) algorithm [7], and WMP [10]. For all the algorithms, we tune parameters to produce their best performance.
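
The normalization and noise steps of this data preparation can be sketched as follows. The sketch assumes that "centered at the origin" refers to the center of the axis-aligned bounding box and that a single uniform scale is used; the noise level `sigma` is a hypothetical parameter, since the text does not give its value. Function names are hypothetical as well.

```python
import random

def normalize_to_unit_cube(points):
    """Center the bounding box at the origin and scale its longest side to 1.

    Assumption: centering uses the bounding-box center, and one uniform
    scale is applied to all three axes so that shapes are not distorted.
    """
    mins = [min(p[d] for p in points) for d in range(3)]
    maxs = [max(p[d] for p in points) for d in range(3)]
    center = [(lo + hi) / 2.0 for lo, hi in zip(mins, maxs)]
    scale = max(hi - lo for lo, hi in zip(mins, maxs))
    if scale == 0.0:
        raise ValueError("point cloud has zero extent")
    return [[(p[d] - center[d]) / scale for d in range(3)] for p in points]

def add_gaussian_noise(points, sigma, seed=None):
    """Add i.i.d. zero-mean Gaussian noise to every coordinate."""
    rng = random.Random(seed)
    return [[c + rng.gauss(0.0, sigma) for c in p] for p in points]

clean = normalize_to_unit_cube([[0.0, 0.0, 0.0], [2.0, 4.0, 1.0]])
noisy = add_gaussian_noise(clean, sigma=0.02, seed=0)  # sigma is illustrative
print(clean)
print(noisy)
```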

Results and analysis. To quantify the performance of the different algorithms, we use two metrics, MSE (Eq. (1)) and the Chamfer distance (CD), both computed between the denoised point cloud and the noiseless point cloud. The two point clouds contain the same number of points, $N$.
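
For readers unfamiliar with the Chamfer distance, the sketch below shows one common symmetric variant: the mean squared nearest-neighbor distance from each set to the other, summed over both directions. The exact normalization used in the paper is not recoverable from the text, so this form is an assumption, and the function name is hypothetical.

```python
def chamfer_distance(cloud_a, cloud_b):
    """Symmetric Chamfer distance with squared Euclidean distances.

    Assumed form: average nearest-neighbor squared distance from A to B
    plus the same quantity from B to A. Brute force, O(|A| * |B|).
    """
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    def one_way(src, dst):
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(cloud_a, cloud_b) + one_way(cloud_b, cloud_a)

a = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
print(chamfer_distance(a, a[::-1]))                            # prints 0.0
print(chamfer_distance([[0.0, 0.0, 0.0]], [[1.0, 0.0, 0.0]]))  # prints 2.0
```

Unlike MSE, this metric does not rely on a one-to-one correspondence between points, so it measures how close the two point sets are as a whole.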

To show the effect of the weight $\alpha$ in our loss function, we vary $\alpha$ and train the neural network on smaller datasets. The MSE and CD errors are shown in Fig. 3.

Figure 3: For all three categories (sofa, bed, and toilet), MSE (left) decreases and CD (right) increases as the loss weight $\alpha$ increases.

The MSE and CD errors follow different trends as the loss weight $\alpha$ increases. CD promotes plane-to-plane matching; MSE promotes point-to-point matching. 1) When $\alpha$ is small, we constrain the predicted reference planes to stay closer to the calculated reference planes. This keeps the manifolds on which the denoised point clouds lie close to the manifolds on which the noiseless point clouds lie, which produces smaller CD values; 2) When $\alpha$ is large, we constrain the point clouds calculated from the predicted reference planes to stay close to the noiseless point clouds, which produces smaller MSE values.

We also train a network without providing the global feature in Fig. 2 and the errors are shown in Table 1.

Loss ()     Local+global features     Local feature only
Category    MSE       CD              MSE        CD
Sofa        0.627     0.7670          10.845     10.4758
Bed         0.700     0.7356          13.5513    9.9097
Toilet      1.617     0.8418          11.307     9.8542

Table 1: MSE error and CD error for three categories (sofa, bed, and toilet) with and without global features. The global features significantly improve the denoising performance by providing global contextual information, which is rarely considered in previous works on denoising.

In our experiments, we start the training with a small loss weight $\alpha$ and increase its value gradually. Figs. 4 and 5 show the denoising performance in seven categories evaluated by MSE and CD (mean and standard deviation), respectively. We see that the NPD algorithm outperforms four of its competitors in all seven categories by the defined metrics. The NPD algorithm also produces the smallest variance for all the categories and both evaluation metrics. We attribute this to the large number of point clouds and object categories in the training dataset, which makes the trained network powerful enough to denoise all the point clouds in our testing dataset.

Figure 4: The proposed algorithm (in red) outperforms all of its competitors in terms of MSE. The denoised point clouds produced by our architecture stay point-wise closer to the noiseless point clouds compared to the denoised point clouds produced by other algorithms.

Figure 5: The proposed algorithm (in red) outperforms its competitors in terms of CD. The denoised point clouds produced by our architecture are closer to the original point clouds, or to the manifolds on which the point clouds lie, than the denoised point clouds produced by other algorithms.

5 Conclusions

We propose a novel algorithm for 3D point cloud denoising, called neural projection denoising (NPD), which estimates reference planes for points with deep learning techniques. Unlike previous algorithms, NPD does not search for the neighboring points of each point in a noisy point cloud; instead, we directly estimate reference planes and project noisy points onto their corresponding reference planes. We validate the proposed architecture on real datasets, where it beats the BF, PDE, GBD, and NLD algorithms. The proposed algorithm produces a much smaller variance in all seven categories for both evaluation metrics.


  • [1] A. Aldoma, Z. C. Marton, F. Tombari, W. Wohlkinger, C. Potthast, B. Zeisl, R. B. Rusu, S. Gedikli, and M. Vincze (2012-09) Tutorial: point cloud library: three-dimensional object recognition and 6 dof pose estimation. IEEE Robotics Automation Magazine 19 (3), pp. 80–91. ISSN 1070-9932.
  • [2] M. Belkin and P. Niyogi (2005) Towards a theoretical foundation for laplacian-based manifold methods. In Learning Theory, Berlin, Heidelberg, pp. 486–500. ISBN 978-3-540-31892-7.
  • [3] A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, et al. (2015) Shapenet: an information-rich 3d model repository. arXiv preprint arXiv:1512.03012.
  • [4] H. Chen and J. Shen (2018) Denoising of point cloud data for computer-aided design, engineering, and manufacturing. Engineering with Computers 34 (3), pp. 523–541.
  • [5] S. Chen, D. Tian, C. Feng, A. Vetro, and J. Kovačević (2018-02) Fast resampling of three-dimensional point clouds via graphs. IEEE Transactions on Signal Processing 66 (3), pp. 666–681. ISSN 1053-587X.
  • [6] R. Collobert and J. Weston (2008) A unified architecture for natural language processing: deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Machine Learning, ICML ’08, New York, NY, USA, pp. 160–167. ISBN 978-1-60558-205-4.
  • [7] J. Deschaud and F. Goulette (2010) Point cloud non local denoising using local surface descriptor similarity. IAPRS 38 (3A), pp. 109–114.
  • [8] J. Digne and C. de Franchis (2017) The Bilateral Filter for Point Clouds. Image Processing On Line 7, pp. 278–287.
  • [9] C. Dinesh, G. Cheung, I. V. Bajic, and C. Yang (2018) Fast 3d point cloud denoising via bipartite graph approximation & total variation.
  • [10] C. Duan, S. Chen, and J. Kovačević (2018) Weighted multi-projection: 3d point cloud denoising with tangent planes. IEEE Global Conference on Signal and Information Processing.
  • [11] A. Elmoataz, O. Lezoray, and S. Bougleux (2008-07) Nonlocal discrete regularization on weighted graphs: a framework for image and manifold processing. Trans. Img. Proc. 17 (7), pp. 1047–1060. ISSN 1057-7149.
  • [12] Y. Furukawa and J. Ponce (2010-08) Accurate, dense, and robust multiview stereopsis. IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (8), pp. 1362–1376. ISSN 0162-8828.
  • [13] J. Han, L. Shao, D. Xu, and J. Shotton (2013-10) Enhanced computer vision with microsoft kinect sensor: a review. IEEE Transactions on Cybernetics 43 (5), pp. 1318–1334. ISSN 2168-2267.
  • [14] J. Park, H. Kim, Y. Tai, M. S. Brown, and I. Kweon (2011-11) High quality depth map upsampling for 3d-tof cameras. In 2011 International Conference on Computer Vision, pp. 1623–1630. ISSN 1550-5499.
  • [15] C. R. Qi, H. Su, K. Mo, and L. J. Guibas (2017) Pointnet: deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 652–660.
  • [16] R. B. Rusu and S. Cousins (2011) 3D is here: point cloud library (pcl). 2011 IEEE International Conference on Robotics and Automation, pp. 1–4.
  • [17] Y. Schoenenberger, J. Paratte, and P. Vandergheynst (2015) Graph-based denoising for time-varying point clouds. In 2015 3DTV-Conference: The True Vision-Capture, Transmission and Display of 3D Video (3DTV-CON), pp. 1–4.
  • [18] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao (2015-06) 3D shapenets: a deep representation for volumetric shapes. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1912–1920. ISSN 1063-6919.
  • [19] Y. Yang, C. Feng, Y. Shen, and D. Tian (2018) Foldingnet: interpretable unsupervised learning on 3d point clouds. arXiv preprint arXiv:1712.07262.
  • [20] L. Yu, X. Li, C. Fu, D. Cohen-Or, and P. Heng (2018) Pu-net: point cloud upsampling network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2790–2799.
  • [21] J. Zeng, G. Cheung, M. Ng, J. Pang, and C. Yang (2018) 3d point cloud denoising using graph laplacian regularization of a low dimensional manifold model. arXiv preprint arXiv:1803.07252.