PointCleanNet: Learning to Denoise and Remove Outliers from Dense Point Clouds
Abstract
Point clouds obtained with 3D scanners or by image-based reconstruction techniques are often corrupted with a significant amount of noise and outliers. Traditional methods for point cloud denoising largely rely on local surface fitting (e.g., jets or MLS surfaces), local or non-local averaging, or on statistical assumptions about the underlying noise model. In contrast, we develop a simple data-driven method for removing outliers and reducing noise in unordered point clouds. We base our approach on a deep learning architecture adapted from PCPNet, which was recently proposed for estimating local 3D shape properties in point clouds. Our method classifies and discards outlier samples, and estimates correction vectors that project noisy points onto the original clean surfaces. The approach is efficient and robust to varying amounts of noise and outliers, while being able to handle large densely sampled point clouds. In our extensive evaluation, we show increased robustness to strong noise levels compared to various state-of-the-art methods, enabling high-quality surface reconstruction from extremely noisy real data obtained by range scans. Finally, the simplicity and universality of our approach make it very easy to integrate into any existing geometry processing pipeline. (Both code and pretrained networks will be released.)
1 Introduction
Raw 3D point clouds obtained directly from acquisition devices such as laser scanners, or as the output of a reconstruction algorithm (e.g., image-based reconstruction), are regularly contaminated with noise and outliers. The first stage of most geometry processing workflows typically involves cleaning such raw point clouds by discarding the outlier samples and denoising the remaining points to reveal the (unknown) scanned surface. The clean output is then used for a range of applications such as surface reconstruction, shape matching, and model retrieval, which often assume their input to be noise-free.
Any good point cloud cleanup algorithm should (i) balance denoising and feature preservation, i.e., remove outliers and noise while retaining data fidelity by preserving sharp edges and local details of the underlying scanned surface; (ii) be self-tuning, i.e., not require as input precise estimates of the noise model or statistics of the unknown scanned surface (e.g., local surface type or curvature characteristics); (iii) be invariant to permutations and rigid transformations applied to the point set, i.e., the denoised output should not depend on the scanning angle or the choice of coordinate system; and (iv) avoid unnecessarily degrading the input, i.e., leave the points on the scanned surface if the input happens to be noise-free. Note that the last criterion implies that the algorithm should not oversmooth the output if it is iterated multiple times.
Decades of research have produced many variants of denoising approaches targeted at different surface types and noise models (see the survey [Han et al., 2017]). Such approaches can be broadly categorized as: classifying points as outliers using statistical methods, e.g., [Aggarwal, 2015]; projecting points onto estimated local surfaces (e.g., MLS surfaces, jet fitting, etc.) [Fleishman et al., 2005, Cazals and Pouget, 2005, Cazals and Pouget, 2007]; consolidating similar patches to cancel out i.i.d. noise perturbations (e.g., non-local means, dictionary-based sparse coding), e.g., [Digne, 2012]; or local smoothing using auxiliary input information (e.g., bilateral smoothing) [Huang et al., 2013], among many others. Unfortunately, there is no single winner among these methods. The choice of algorithm and its parameters often depends on the scanned surface and the noise characteristics of the acquisition setup. Given that the complexity of the underlying geometry and the noise characteristics are, at best, partially known at acquisition time, choosing an optimal algorithm with associated parameters is typically an iterative trial-and-error process.
Inspired by the recent successes of applying deep learning techniques to the analysis and processing of geometric data, including [Masci et al., 2015, Bronstein et al., 2017, Wei et al., 2016] among many others, and especially by the seminal works designed for learning directly on point clouds [Qi et al., 2017a, Wang et al., 2018a], in this paper we present PointCleanNet, a simple data-driven denoising approach. Specifically, we design a two-stage point cloud cleaning network based on the recently proposed PCPNet architecture [Guerrero et al., 2018] to estimate robust local features and use this information to denoise the point cloud. At training time, a variety of surface patches extracted from a set of shapes is synthetically corrupted with outliers and noise of varying magnitudes (including zero noise). This artificially corrupted set is then used to train PointCleanNet. Our two-stage method first removes outlier samples and then estimates correction vectors for the remaining points. Figure 1 shows an example on a raw real-world scanned point cloud (source: ETHZ dataset).
The process is enabled by a novel loss function that effectively cleans point sets without requiring explicit information about the underlying surface or noise characteristics. Intuitively, the network learns to identify local noise-free patches based on features extracted from the corresponding raw point sets and proposes per-point correction vectors. In other words, the network implicitly builds a dictionary of local surface patches in the form of learned local features, and uses it to classify input points as outliers and to project the remaining ones onto an ensemble of dictionary patches. At test time, our denoising network directly consumes raw input point clouds, classifies and discards outlier measurements, and denoises the remaining points. The approach is simple to train and use, and does not require the user to provide parameters characterizing the surface or noise model. Additionally, unlike traditional approaches, our denoising network can easily be adapted to particular shape families and non-standard noise models.
We qualitatively and quantitatively evaluate PointCleanNet on a range of synthetic datasets (with access to ground-truth surfaces) and real-world datasets. In our extensive tests, our approach performed better than a variety of state-of-the-art denoising approaches (even with manually tuned parameters) across variations in shape and medium to high noise levels. Additionally, the simplicity and universality of our approach make it very easy to integrate into any existing geometry processing workflow. (Code and pretrained networks will be released.)
2 Related Work
Point cloud denoising and outlier removal have a long and rich history in diverse areas of computer science and a full overview is beyond the scope of the current article. Below, we briefly review the main general trends for addressing these problems, while concentrating on solutions most closely related to ours, and refer the interested reader to a recent survey [Han et al., 2017].
Outlier removal.
The earliest classical approaches for outlier detection, classification and removal were proposed primarily in the statistics and data mining communities, in the general setting of point clouds in arbitrary dimensions, with several monographs dedicated specifically to this topic [Pincus, 1995, Ben-Gal, 2005, Rousseeuw and Hubert, 2011, Aggarwal, 2015]. These methods are typically based on robust local statistics and most often come with rigorous theoretical guarantees. At the same time, their generality comes at a cost, as purely statistical methods are often not adapted to the specific features found in geometric 3D shapes, and in most cases require non-trivial parameter tuning.
More recently, several approaches have been proposed for outlier detection with an emphasis on 3D point clouds arising, e.g., from acquisition data, including [Chazal et al., 2011, Guibas et al., 2013, Wolff et al., 2016]. The first two methods are implemented in widely used libraries such as CGAL and have also been used in the context of surface reconstruction from noisy point clouds [Giraudot et al., 2013]. These approaches are very robust, but depend on setting critical parameters or rely on additional information such as color [Wolff et al., 2016]. This makes them difficult to apply, for example, across general noise models without additional user input and parameter tuning.
Local surface fitting, bilateral filtering.
Denoising and outlier removal also arise prominently in the context of surface fitting to noisy point clouds, including the widely used Moving Least Squares (MLS) approach and its robust variants [Alexa et al., 2003, Mederos et al., 2003, Fleishman et al., 2005, Öztireli et al., 2009, Gross and Pfister, 2011]. Similarly, other local fitting approaches have been used for point cloud denoising, using robust jet fitting with reprojection [Cazals and Pouget, 2005, Cazals and Pouget, 2007] or various forms of bilateral filtering on point clouds [Huang et al., 2013, Digne and De Franchis, 2017], which take into account both point coordinates and normal directions for better preservation of edge features. A closely related set of techniques is based on sparse representations of the point normals for better feature preservation [Avron et al., 2010, Sun et al., 2015, Mattei and Castrodad, 2017]. Denoising is then achieved by projecting the points onto the estimated local surfaces. These techniques are very robust for small noise but can lead to significant oversmoothing or oversharpening at high noise levels [Mattei and Castrodad, 2017, Han et al., 2017].
Nonlocal means, dictionarybased methods.
Another very prominent category of methods, inspired in part by image-based techniques, consists of non-local filtering, most often based on detecting similar shape parts (patches) and consolidating them into a coherent noise-free point cloud [Deschaud and Goulette, 2010, Zheng et al., 2010, Digne, 2012, Digne et al., 2014, Zeng et al., 2018]. Closely related are methods based on constructing “dictionaries” of shapes and their parts, which can then be used for denoising and point cloud filtering, e.g., [Yoon et al., 2016, Digne et al., 2018] (see also a recent survey of dictionary-based methods [Lescoat et al., 2018]). Such approaches are particularly well suited for feature-preserving filtering and avoid the excessive smoothing common to local methods. At the same time, they also require careful parameter setting and, as we show below, are difficult to apply across a wide variety of point cloud noise and artefacts.
Denoising in images.
Denoising has also been studied in depth in other domains such as images, with a wide variety of techniques based on local filtering, total variation smoothing, and non-local methods, including dictionary-based ones [Buades et al., 2005, Elad and Aharon, 2006, Chambolle et al., 2010, Mairal, 2010, Elad, 2010].
More recently, to address the limitations mentioned above, and inspired by the success of deep learning for other tasks, several learning-based denoising methods have also been proposed for both images [Zhang et al., 2017a, Zhang et al., 2017b, Jin et al., 2017] and meshes [Wang et al., 2016, Benouini et al., 2018], among others. These methods are especially attractive since, rather than relying on manually set parameters, they learn the correct model from data and adapt to the noise setting at test time, without any user intervention. In the signal processing literature, it is widely believed that image denoising has reached close to optimal performance [Chatterjee and Milanfar, 2010, Levin and Nadler, 2011]. One of our main motivations is therefore to show the applicability of this general approach in the case of 3D point clouds.
Learning in Point Clouds.
Learning-based approaches, and especially those based on deep learning, have recently attracted a lot of attention in the context of Geometric Data Analysis, with several methods proposed specifically to handle point cloud data, including PointNet [Qi et al., 2017a] and several extensions such as PointNet++ [Qi et al., 2017b] and Dynamic Graph CNNs [Wang et al., 2018a] for shape segmentation and classification, PCPNet [Guerrero et al., 2018] for normal and curvature estimation, and P2P-NET [Yin et al., 2018] and PU-Net [Yu et al., 2018] for cross-domain point cloud transformation and upsampling, respectively. Other convolution-based architectures have also been used for point-based filtering, most prominently the recent PointProNet architecture [Roveri et al., 2018], designed for consolidating input patches, represented via height maps with respect to a local frame, into a single clean point set, which can be used for surface reconstruction. Although such an approach has the advantage of leveraging image-based denoising solutions, errors are introduced in the local normal estimation stage, especially in the presence of noise and outliers.
Unlike these techniques, our goal is to train a general-purpose method for removing outliers and denoising point clouds corrupted with potentially very high levels of structured noise. For this, inspired by the success of PCPNet [Guerrero et al., 2018] for normal and curvature estimation, we propose a simple framework that learns both to classify outliers and to displace noisy points by applying an adapted architecture to point cloud patches. We show through extensive experimental evaluation that our approach can handle a wide range of artefacts, while being applicable to dense point clouds, without any user intervention.
3 Overview
As a first step in digitizing a 3D object, we usually obtain a set of approximate point samples of the scanned surfaces. This point cloud is typically an intermediate result used for further processing, for example to reconstruct a mesh or to analyze properties of the scanned object. The quality of these downstream applications depends heavily on the quality of the point cloud. In realworld scans, however, the point cloud is usually degraded by an unknown amount of outliers and noise. We assume the following point cloud formation model:
$$P' = \{\, p_i + n_i \,\}_{p_i \in S} \cup O \tag{1}$$
where $P'$ is the observed noisy point cloud, $p_i$ are perfect surface samples (i.e., lying on the scanned surface $S$), $n_i$ is additive noise, and $O$ is the set of outlier points. We do not make any assumptions about the noise model $n$ or the outlier model $O$. The goal of our work is to take the low-quality point cloud $P'$ as input and output a higher-quality point cloud that is closer to $S$ and better suited for further processing. We refer to this process as cleaning. We split the cleaning into two steps: first we remove outliers, followed by an estimation of per-point displacement vectors that denoise the remaining points:
$$\hat{P} = \{\, p'_i + d_i \,\}_{p'_i \in P' \setminus \hat{O}} \tag{2}$$
where $\hat{P}$ is the output point cloud, $d_i$ are the displacement vectors and $\hat{O}$ the outliers estimated by our method. We first discuss our design choices regarding the desirable properties of the resulting point cloud and then how we achieve them.
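As a toy illustration of Eqs. (1) and (2), the following NumPy sketch corrupts samples of a unit sphere with Gaussian noise and uniformly distributed outliers, then applies the cleaning step with oracle estimates (the true outlier labels, and a radial projection onto the sphere as displacement). The sphere, both noise models, and the oracle displacements are assumptions made only for illustration; in the method itself, the estimated outlier set and the displacement vectors are predicted by the networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_unit_sphere(n):
    # Perfect surface samples p_i on the unit sphere S (toy surface).
    v = rng.normal(size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def formation_model(surface_pts, noise_std, n_outliers):
    # Eq. (1): noisy samples p_i + n_i, united with a set O of outliers.
    # Gaussian noise and uniform outliers are illustrative choices only.
    noisy = surface_pts + rng.normal(scale=noise_std, size=surface_pts.shape)
    outliers = rng.uniform(-2.0, 2.0, size=(n_outliers, 3))
    points = np.vstack([noisy, outliers])
    is_outlier = np.concatenate([np.zeros(len(noisy), bool), np.ones(n_outliers, bool)])
    return points, is_outlier

def apply_cleaning(points, est_outlier, displacements):
    # Eq. (2): drop estimated outliers, then add a displacement to every kept point.
    keep = ~est_outlier
    return points[keep] + displacements[keep]

P_noisy, true_outlier = formation_model(sample_unit_sphere(1000), 0.01, 100)
# Oracle "estimates", used only to exercise Eq. (2).
radial = P_noisy / np.linalg.norm(P_noisy, axis=1, keepdims=True)
P_clean = apply_cleaning(P_noisy, true_outlier, radial - P_noisy)
```

With perfect estimates, every output point lies exactly on the sphere; the learning problem is to approximate such estimates without knowing $S$ or the noise model.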
Approach.
Traditional statistical scan cleaning approaches typically make assumptions about the scanned surfaces or the noise model, whose parameters need to be manually tuned by the user to fit a given setting. This precludes the use of these methods by non-expert users or in casual settings. One desirable property of any cleaning approach is therefore robustness to a wide range of conditions without the need for manual parameter tuning. Recently, deep learning approaches applied to point clouds [Qi et al., 2017a, Qi et al., 2017b, Wang et al., 2018a, Guerrero et al., 2018] have shown a remarkable increase in robustness compared to earlier handcrafted approaches. Most of these methods perform a global analysis of the point cloud and produce output that depends on the whole point cloud. This is necessary for global properties such as the semantic class, but is less suited for tasks that depend only on local neighborhoods. Processing the entire point cloud simultaneously is a more challenging problem, since the network needs to handle a much larger variety of shapes than when working with small local patches, requiring more training shapes and more network capacity. Additionally, processing dense point clouds becomes more difficult due to high memory requirements. In settings such as ours, local methods such as PCPNet [Guerrero et al., 2018] perform better. Both steps of our approach are based on the network architecture proposed in that work, due to its relative simplicity and competitive performance. We adapt this architecture to our setting (Section 4) and train it to perform outlier classification and denoising.
While our cleaning task is mainly a local problem, the estimated displacement vectors need to be consistent across neighborhoods in order to achieve a smooth surface. With a local approach such as PCPNet, each local estimate is computed separately based on a different local patch. The differences between these local neighborhoods cause inconsistencies between neighboring estimates that appear as residual noise in the result (see Figure 2). We therefore need a method to coordinate neighboring results. We observed that the amount of difference between the local neighborhoods of neighboring estimates correlates with the noise model. Thus, the resulting residual noise follows a similar noise model as the original noise, but with a smaller magnitude. This means we can iterate our network on the residual noise to keep improving our estimates. See Figure 3 for an overview of the full denoising approach. Section 6 provides extensive experiments with different numbers of denoising iterations.
Desirable properties of a point cloud.
The two stages (i.e., outlier classification and denoising) of our method use different loss functions. The properties of our denoised point cloud are largely determined by these loss functions. Thus, we need to design them such that their optimum is a point cloud that has all desirable properties. We identify two key desirable properties: first, all points should be as close as possible to the original scanned surface; second, the points should be distributed as regularly as possible on the surface. Note that we do not want the denoised points to exactly undo the additive noise and approximate the original perfect surface samples, since the component of the additive noise that is tangent to the surface cannot be recovered from the noisy point cloud. Section 5 describes our loss functions, and in Section 6 we compare several alternative loss functions.
4 Cleaning Model
As mentioned above, our goal is to take a noisy point cloud $P'$ and produce a cleaned point cloud $\hat{P}$ that is closer to the unknown surface that produced the noisy samples. We treat denoising as a local problem: the result for each point $p'_i$ only depends on a local neighborhood $P'_i$ of radius $r$ around the point. Focusing on local neighborhoods allows us to handle dense point clouds without losing local detail. Increasing the locality (or scale) radius $r$ provides more information about the point cloud, at the cost of reducing the capacity available for local details. Unlike with traditional analytic denoising approaches, a single neighborhood setting is robust to a wide range of noise settings, as we demonstrate in Section 6. In all of our experiments we set $r$ to … of the point cloud’s bounding box diagonal.
We assume the point cloud formation model described in Equation (1), i.e., the noisy point cloud consists of surface samples with added noise and outliers. We then proceed in two stages: first, we train a nonlinear function $g$ that removes outliers, given a point $p'_i$ and its local patch $P'_i$:
$$\tilde{o}_i = g(p'_i, P'_i)$$
where $\tilde{o}_i$ is the estimated probability that point $p'_i$ is an outlier. We add a point to the set of estimated outliers $\hat{O}$ if $\tilde{o}_i > …$. After removing the outliers, we obtain the point cloud $\hat{P}' = P' \setminus \hat{O}$. We proceed by defining a function $f$ that estimates displacements for these remaining points to move them closer to the unknown surface, where $\hat{P}'_i$ is the local patch of $p'_i$ in $\hat{P}'$:
$$d_i = f(p'_i, \hat{P}'_i)$$
The final denoised points are obtained by adding the estimated displacements to the remaining noisy points: $\hat{p}_i = p'_i + d_i$. Both $g$ and $f$ are modeled as deep neural networks with a PCPNet-based architecture. We next provide a short overview of PCPNet before describing our modifications.
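The two-stage inference described above can be sketched as follows. The callables `g` and `f` stand in for the two trained networks and are hypothetical, as are the toy versions used in the example and the default threshold of 0.5, which is our assumption rather than the value used in the paper. Patches are taken with a brute-force radius query; patch normalization and fixed-size resampling are omitted for brevity.

```python
import numpy as np

def radius_patch(points, center, r):
    # All points within distance r of `center` (brute force).
    return points[np.linalg.norm(points - center, axis=1) <= r]

def two_stage_clean(points, g, f, r, threshold=0.5):
    # Stage 1: outlier probability for every point, computed from its own patch.
    probs = np.array([g(p, radius_patch(points, p, r)) for p in points])
    kept = points[probs <= threshold]  # points with probability > threshold are discarded
    # Stage 2: one displacement per remaining point, using patches of the outlier-free cloud.
    disp = np.array([f(p, radius_patch(kept, p, r)) for p in kept])
    return kept + disp

# Toy data: a noisy 20 x 20 grid on the plane z = 0 plus one far-away outlier.
rng = np.random.default_rng(1)
u, v = np.meshgrid(np.linspace(0, 1, 20), np.linspace(0, 1, 20))
grid = np.c_[u.ravel(), v.ravel(), rng.normal(0, 0.005, u.size)]
cloud = np.vstack([grid, [[0.5, 0.5, 3.0]]])

# Hypothetical stand-ins for the trained networks.
g_toy = lambda p, patch: float(len(patch) < 5)         # isolated point -> outlier
f_toy = lambda p, patch: np.array([0.0, 0.0, -p[2]])   # project onto z = 0

cleaned = two_stage_clean(cloud, g_toy, f_toy, r=0.15)
```

Note that each network call only produces a value for the center point of its patch, so the per-point loops mirror the per-point processing described for the model.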
A major challenge when applying deep learning methods directly to point clouds is achieving invariance to the permutation of the points: all permutations should produce the same result. Training a network to learn this invariance is difficult because the number of such permutations grows factorially with the number of points. As a solution to this problem, PointNet [Qi et al., 2017a] proposes a network architecture that is order-invariant by design. However, PointNet is a global method, processing the whole point cloud in one forward pass of the network. This results in degraded performance on shape details. PCPNet [Guerrero et al., 2018] was proposed as a local variant of PointNet that is applied to local patches, gives better results for shape details, and is applicable to dense point clouds, possibly containing millions of points. We base our denoising architecture on PCPNet.
Creating a local patch.
Given a point cloud $P'$, the local patch $P'_i$ contains all points of $P'$ within a ball of constant radius $r$ centered at $p'_i$. Using this patch as input, we want to compute the outlier probability $\tilde{o}_i$ and, for the remaining non-outlier points, a displacement vector $d_i$. We first normalize this patch by centering it and scaling it to unit size. The PCPNet architecture requires patches to have a fixed number of points; as in the original paper, we pad patches with too few points with zeros and take a random subset from patches with too many points. Intuitively, this step makes the network more robust to additional points.
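A minimal NumPy sketch of the patch construction follows. It assumes that "scaling to unit size" means dividing the centered coordinates by $r$, so that the ball maps to the unit ball; `make_patch` is a hypothetical helper, and the fraction of the bounding box diagonal used for $r$ in the example is an illustrative value, not the one used in the experiments.

```python
import numpy as np

def make_patch(points, center_idx, r, n_fixed, rng):
    center = points[center_idx]
    dist = np.linalg.norm(points - center, axis=1)
    patch = (points[dist <= r] - center) / r  # center on p'_i, scale the ball to unit radius
    if len(patch) >= n_fixed:
        # Too many points: keep a random subset of size n_fixed.
        patch = patch[rng.choice(len(patch), size=n_fixed, replace=False)]
    else:
        # Too few points: pad with zeros (i.e., copies of the patch center).
        patch = np.vstack([patch, np.zeros((n_fixed - len(patch), 3))])
    return patch

rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 1.0, size=(2000, 3))
diag = np.linalg.norm(pts.max(axis=0) - pts.min(axis=0))
r = 0.1 * diag  # illustrative radius fraction
patch = make_patch(pts, 0, r, n_fixed=64, rng=rng)
sparse = make_patch(pts[:5] * 100.0, 0, r, n_fixed=8, rng=rng)  # isolated point: all padding
```

Either branch returns an array of shape `(n_fixed, 3)`, which is what allows patches to be batched for the network.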
Network architecture.
An overview of our network architecture is shown in Figure 3. Given the normalized local patch $P'_i$, the network first applies a spatial transformer network [Jaderberg et al., 2015] that is constrained to rotations, called a quaternion spatial transformer network (QSTN). This is a small subnetwork that learns to rotate the patch to a canonical orientation (note that this estimation implicitly learns to be robust to outliers and noise, similar to robust statistical estimation). At the end of the pipeline, the final estimated displacement vectors are rotated back from the canonical orientation to world space. The remainder of the network is divided into three main parts:
(i) a feature extractor that is applied to each point in the patch separately,
(ii) a symmetric operation that combines the computed features for each point into an order-invariant feature vector for the patch, and
(iii) a regressor that estimates the desired properties $\tilde{o}_i$ and $d_i$ from the feature vector of the patch.
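The QSTN outputs a quaternion that is converted into a rotation matrix; the sketch below shows this conversion and the final rotation of a predicted displacement back to world space. It assumes unit quaternions in (w, x, y, z) order, points stored as row vectors, and a hypothetical raw network output; it illustrates the geometry only and is not PCPNet's actual code.

```python
import numpy as np

def quat_to_rot(q):
    # Normalize so that any 4-vector predicted by a network yields a valid rotation.
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

q_pred = np.array([0.3, -1.2, 0.5, 2.0])        # hypothetical raw QSTN output
R = quat_to_rot(q_pred)
patch = np.random.default_rng(0).normal(size=(16, 3))
patch_canonical = patch @ R.T                    # rotate each row vector p to R p
d_canonical = np.array([0.01, -0.02, 0.03])      # stand-in for a predicted displacement
d_world = d_canonical @ R                        # equals R^T d: back to world space
```

Because $R$ is orthonormal, its transpose undoes the rotation, so the displacement is expressed in the same frame as the input points.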
Following the original design of PointNet [Qi et al., 2017a], the feature extractor is implemented with a multilayer perceptron that is applied to each point separately, but shares weights between points. Computing the features separately for each point ensures that they are invariant to the ordering of the points. The feature extractor also applies an additional spatial transformer network to intermediate point features.
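The following sketch, with random weights and hypothetical layer sizes, illustrates why a shared per-point MLP followed by a symmetric operation (here max pooling, as in PointNet) yields a patch descriptor that does not depend on the point order: permuting the input only permutes the per-point features, and the maximum over points ignores that order.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 64)), np.zeros(64)     # hypothetical layer sizes
W2, b2 = rng.normal(size=(64, 128)), np.zeros(128)

def point_features(patch):
    # Same weights for every row (point): a per-point, weight-shared MLP.
    h = np.maximum(patch @ W1 + b1, 0.0)
    return np.maximum(h @ W2 + b2, 0.0)

def patch_feature(patch):
    # Symmetric operation (max over points) -> order-invariant patch descriptor.
    return point_features(patch).max(axis=0)

patch = rng.normal(size=(500, 3))
perm = rng.permutation(len(patch))
same = np.allclose(patch_feature(patch), patch_feature(patch[perm]))
```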
In our implementation, we add skip connections to the multilayer perceptrons, similar to ResNet blocks. Empirically, we found this to help gradient propagation and improve training performance. The regressor is also implemented with a multilayer perceptron, to which we likewise add skip connections. We use the same network width as in the original PCPNet (please refer to the original paper for details). However, the network is twice as deep, as we replace the original layers with two-layer ResBlocks. This architecture is used to compute both outlier indicators and displacement vectors. We change the number of channels of the last regressor layer to fit the size of the desired output (1 for outlier indicators and 3 for displacement vectors).
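A two-layer residual block and the two output heads can be sketched as below. Batch normalization is omitted, the feature width and the exact placement of the activations are assumptions, and the sigmoid on the outlier head is our assumption for turning the single output channel into a probability; only the output channel counts (1 and 3) follow from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
C = 256  # hypothetical feature width

def res_block(x, w_a, w_b):
    # Two linear layers with a ReLU in between, plus an identity skip connection.
    h = np.maximum(x @ w_a, 0.0)
    return np.maximum(x + h @ w_b, 0.0)

feat = rng.normal(size=(4, C))  # descriptors of 4 patches
w_a, w_b = 0.01 * rng.normal(size=(C, C)), 0.01 * rng.normal(size=(C, C))
h = res_block(feat, w_a, w_b)

w_outlier = rng.normal(size=(C, 1))  # 1 output channel: outlier indicator
w_disp = rng.normal(size=(C, 3))     # 3 output channels: displacement vector
outlier_prob = 1.0 / (1.0 + np.exp(-(h @ w_outlier)))
displacement = h @ w_disp
```

When the second weight matrix is zero, the block reduces to the identity followed by a ReLU, which is the property that makes such blocks easy to stack.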
Importantly, for each point $p'_i$ in the point cloud, we compute its local neighborhood $P'_i$ and only estimate the outlier probability and displacement vector for the center point $p'_i$, i.e., we do not estimate outlier probabilities or displacement vectors for other points in the patch. Thus, each point in the original point cloud is processed independently by considering its own neighborhood, and the points only become indirectly coupled through the iterative cleaning described next.
Iterative cleaning.
At test time, after applying the displacement vectors computed from a single iteration of the architecture, we are left with residual noise. The residual error vectors from the denoised points to the target surface that are introduced by our method do not vary smoothly over the denoised points. Empirically, we found that this residual noise is of a similar type but a lower magnitude than the noise in the original points. Intuitively, this can be explained by looking at the content of input patches for points that are neighbors in the denoised point cloud. As shown in Figure 2, input patches that are far apart have different content, resulting in different network predictions, while patches that are close together have similar content and similar predictions. The distance between these input patches correlates with the noise model and the noise magnitude; therefore the network predictions, and hence the denoised points, are likely to have noise of a similar type but a lower magnitude than the original noisy points. This allows us to iterate our denoising approach to continue improving the denoised points.
In practice, we observed shrinking of the point cloud after several iterations. To counteract this shrinking, we apply an inflation step after each iteration, inspired by Taubin smoothing [Taubin, 1995]:
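$$\tilde{d}_i = d_i - \frac{1}{k} \sum_{p'_j \in N(p'_i)} d_j \tag{3}$$
where $\tilde{d}_i$ are the corrected displacement vectors and $N(p'_i)$ are the $k$ nearest neighbours of point $p'_i$; we set $k = …$. Note that this step approximately removes the low-frequency component from the estimated displacements.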
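A brute-force NumPy sketch of the iterative cleaning loop with the inflation step of Eq. (3) follows. The `denoise` callable is a hypothetical stand-in for the denoising network, the neighbourhood is assumed to include the point itself, and the quadratic-memory neighbour search is only suitable for small examples (a k-d tree would be used for dense clouds).

```python
import numpy as np

def knn_indices(points, k):
    # Brute-force k nearest neighbours; each point is its own nearest neighbour.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    return np.argsort(d2, axis=1)[:, :k]

def inflate(points, displacements, k):
    # Eq. (3): subtract the mean displacement of the k nearest neighbours,
    # which removes the low-frequency (e.g. shrinking) part of the displacement field.
    nn = knn_indices(points, k)
    return displacements - displacements[nn].mean(axis=1)

def iterative_clean(points, denoise, n_iters, k):
    for _ in range(n_iters):
        d = denoise(points)  # hypothetical stand-in for the network f
        points = points + inflate(points, d, k)
    return points

pts = np.random.default_rng(0).uniform(size=(200, 3))
shift = lambda p: np.tile([0.05, 0.0, 0.0], (len(p), 1))  # a purely low-frequency field
result = iterative_clean(pts, shift, n_iters=3, k=8)
```

A spatially constant displacement field is cancelled entirely, which is the extreme case of the low-frequency component the inflation step is meant to remove; high-frequency corrections that differ between neighbours survive.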
5 Training Setup
To train the denoising model, we use a dataset of paired noisy point clouds and corresponding clean ground truth point clouds. We do not need to know the exact correspondences of points in a pair, but we assume we know the ground truth outlier label of each noisy point. Using a point cloud as ground truth instead of a surface description makes it easier to obtain training data. For example, a ground truth point cloud can be obtained from a higher-quality scan of the same scene from which the noisy point cloud was obtained. Since we work with local patches instead of entire point clouds, we can train with relatively few shapes. To handle different noise magnitudes, and to enable our iterative denoising approach, we train with multiple noise levels. This includes several training examples with zero noise magnitude, which trains the network to preserve the shape of point clouds without noise.
Loss function.
Choosing a good loss function is critical, as this choice has a direct impact on the properties of the cleaned point clouds. For the outlier removal phase, we use the $L_1$ distance between the estimated outlier labels and the ground truth outlier labels:
$$L_o(p'_i) = \left| \tilde{o}_i - o_i \right| \tag{4}$$
where $\tilde{o}_i$ is the estimated outlier probability and $o_i$ is the ground truth label. We also experimented with the binary cross-entropy loss, but found the $L_1$ loss to perform better in practice.
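The outlier loss of Eq. (4) and the binary cross-entropy alternative can be compared on a few predictions. The sketch averages over a batch, which is an assumption; the function names are hypothetical.

```python
import numpy as np

def outlier_loss_l1(prob, label):
    # Eq. (4), averaged over a batch: absolute difference between probability and 0/1 label.
    return float(np.mean(np.abs(prob - label)))

def outlier_loss_bce(prob, label, eps=1e-7):
    # Binary cross-entropy, the alternative mentioned in the text.
    p = np.clip(prob, eps, 1.0 - eps)
    return float(-np.mean(label * np.log(p) + (1 - label) * np.log(1 - p)))

prob = np.array([0.9, 0.2, 0.6])
label = np.array([1.0, 0.0, 1.0])
l1 = outlier_loss_l1(prob, label)   # (0.1 + 0.2 + 0.4) / 3
bce = outlier_loss_bce(prob, label)
```

One visible difference: the $L_1$ loss of a single point is bounded by 1, whereas cross-entropy grows without bound for confidently wrong predictions. The text reports only the empirical outcome, so this should not be read as the established reason for it.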
In the denoising setting, designing the loss function is less straightforward. Two properties we would like our denoised point clouds to have are proximity of the points to the scanned surface and a regular distribution of the points over the surface. Assuming the ground truth point cloud has both of these properties, a straightforward choice for the loss would be the $L_2$ distance between the cleaned and the ground truth point cloud:
$$L_{L_2}(\hat{p}_i, p_i) = \left\| \hat{p}_i - p_i \right\|_2^2 \tag{5}$$
where $\hat{p}_i$ and $p_i$ are corresponding cleaned and ground truth points in a patch. Note that, for simplicity of notation, we have unrolled the displacement vector expressions directly in terms of point coordinates. However, this loss assumes knowledge of a pointwise correspondence between the point clouds; and even if the correspondence is known, in general we cannot recover the component of the additive noise that is tangent to the surface. The minimizer of this loss is therefore an average of all potential candidates the noisy point may have originated from. On a curved surface, this average will in general not lie on the surface, which leads to poor overall performance. Figure 4, top, illustrates this baseline loss. Fortunately, we do not need to exactly undo the additive noise. There is a large space of possible point clouds that satisfy the desired properties to the same degree as the ground truth point cloud, or even more so.
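A small numerical check of this argument on a toy 2D curve: if a noisy observation near the unit circle could equally have come from any of several points spread tangentially along the circle, the prediction minimizing the mean squared distance to all of them is their mean, which lies strictly inside the circle. The circle and the angular spread are assumptions chosen for illustration.

```python
import numpy as np

# Equally likely origins of one noisy observation: points on the unit circle
# spread tangentially over +-0.3 rad around angle 0.
angles = np.linspace(-0.3, 0.3, 61)
candidates = np.c_[np.cos(angles), np.sin(angles)]

def mean_sq(x):
    # Mean squared distance from a prediction x to all candidates.
    return ((candidates - x) ** 2).sum(axis=1).mean()

# The prediction minimizing mean_sq is the candidates' mean.
best_l2_prediction = candidates.mean(axis=0)
gap = 1.0 - np.linalg.norm(best_l2_prediction)  # distance from the prediction to the circle
```

Here the gap is roughly 0.015, i.e., the squared-loss optimum is pulled off the curve, and the effect grows with curvature and with the tangential spread of the noise.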
We propose a main loss function and an alternative with slightly inferior performance but a simpler and more efficient formulation. The main loss function has one term for each of the two properties we would like to achieve. Proximity to the surface can be approximated as the distance of each denoised point to its nearest neighbour in the ground truth point cloud:
$$L_s(\hat{p}_i, P_{\hat{p}_i}) = \min_{p_j \in P_{\hat{p}_i}} \left\| \hat{p}_i - p_j \right\|_2^2 \tag{6}$$
For efficiency, we restrict the nearest neighbor search to the local patch $P_{\hat{p}_i}$ of ground truth points centered at $\hat{p}_i$. Originally, we experimented with only this loss function, but noticed filament structures forming on the surface after several denoising iterations, as shown in Figure 5. Since the points are only constrained to lie on the surface, there are multiple displacement vectors that bring them equally close to the surface. Over multiple iterations, the points drift tangent to the surface, forming clusters. To achieve a more regular distribution on the surface, we introduce a regularization term:
$$L_r(\hat{p}_i, P_{\hat{p}_i}) = \max_{p_j \in P_{\hat{p}_i}} \left\| \hat{p}_i - p_j \right\|_2^2 \tag{7}$$
By minimizing this term, we minimize the squared distance to the farthest point in the local patch $P_{\hat{p}_i}$. Intuitively, this keeps the cleaned point centered in the patch and discourages drift of the point tangent to the surface. Assuming the noisy point clouds are approximately regularly distributed, this results in a regular distribution of the cleaned points. With this term we want to avoid excessive clustering of points (for example, into filament structures), but do not aim for perfect regularity. The full loss function is a weighted combination of the two loss terms:
$$L_a = \alpha L_s + (1 - \alpha) L_r \tag{8}$$
Since the second term can be seen as a regularization, we set $\alpha$ to … in our experiments.
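The per-point terms of the main loss can be written directly from their definitions. The sketch below computes the proximity term (Eq. 6), the regularization term (Eq. 7) and their weighted sum (Eq. 8) for a single cleaned point against a ground truth grid. The radius query, squared Euclidean distances, and the weight `alpha=0.9` in the example are illustrative assumptions, and the code assumes the ground truth patch is non-empty.

```python
import numpy as np

def loss_s(p_hat, gt_patch):
    # Eq. (6): squared distance to the nearest ground truth point in the patch.
    return ((gt_patch - p_hat) ** 2).sum(axis=1).min()

def loss_r(p_hat, gt_patch):
    # Eq. (7): squared distance to the farthest ground truth point in the patch.
    return ((gt_patch - p_hat) ** 2).sum(axis=1).max()

def loss_a(p_hat, gt_points, r, alpha):
    # Eq. (8): the ground truth patch is re-queried around the current cleaned point.
    d2 = ((gt_points - p_hat) ** 2).sum(axis=1)
    gt_patch = gt_points[d2 <= r * r]
    return alpha * loss_s(p_hat, gt_patch) + (1 - alpha) * loss_r(p_hat, gt_patch)

ticks = np.linspace(0.0, 1.0, 11)
gx, gy = np.meshgrid(ticks, ticks)
gt = np.c_[gx.ravel(), gy.ravel(), np.zeros(gx.size)]  # ground truth: grid on z = 0
p_hat = np.array([0.5, 0.5, 0.1])                      # cleaned point 0.1 above the grid
value = loss_a(p_hat, gt, r=0.25, alpha=0.9)           # 0.9 * 0.01 + 0.1 * 0.06
```

Because the patch depends on the current position of the cleaned point, it must be recomputed whenever the predictions change, which is the cost the alternative loss avoids.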
Importantly, the loss defined in Eq. (8) depends on the current point cloud, so the point searches in Equations (6) and (7) need to be updated in every training epoch. Alternatively, these target points can be fixed. Our alternative loss function therefore uses an explicit ground truth target for the cleaned point that can be precomputed:
$$L_b(\hat{p}_i) = \left\| \hat{p}_i - q_i \right\|_2^2 \tag{9}$$
where $q_i$ is the closest point to the initial noisy point $p'_i$ (before denoising) in the ground truth point set $P$. Figure 4, bottom, illustrates this loss. Since both $p'_i$ and the ground truth point cloud $P$ are constant during training, this mapping can be precomputed, making this loss function more efficient and easier to implement. Additionally, the fixed target prevents the points from drifting tangent to the surface. However, this loss constrains the network more than $L_a$, and we observed slightly lower performance.
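The alternative loss of Eq. (9) reduces to a one-off nearest-neighbour lookup followed by a plain squared distance. In the sketch below, the brute-force search and the function names are hypothetical; the targets are computed once from the initial noisy points and then stay fixed.

```python
import numpy as np

def precompute_targets(noisy, gt_points):
    # For each initial noisy point p'_i, the closest ground truth point q_i (fixed during training).
    d2 = ((noisy[:, None, :] - gt_points[None, :, :]) ** 2).sum(axis=-1)
    return gt_points[d2.argmin(axis=1)]

def loss_b(cleaned, targets):
    # Eq. (9): squared distance of each cleaned point to its fixed target, averaged.
    return float(((cleaned - targets) ** 2).sum(axis=1).mean())

ticks = np.linspace(0.0, 1.0, 11)
gx, gy = np.meshgrid(ticks, ticks)
gt = np.c_[gx.ravel(), gy.ravel(), np.zeros(gx.size)]
noisy = gt + np.array([0.0, 0.0, 0.02])   # every point lifted 0.02 above the surface
targets = precompute_targets(noisy, gt)   # computed once, reused for every epoch
initial_loss = loss_b(noisy, targets)     # 0.02 ** 2
```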
For the outlier removal network, we use a learning rate of … and uniform Kaiming initialization [He et al., 2015] of the network weights. When training the denoising network, we observed that the network weights converge to relatively small values. To help convergence, we lower the initial values of the weights to uniform random values in … and decrease the learning rate to …. This improves convergence speed for the denoising network and lowers the converged error.
6 Results
Dataset.
Our complete dataset contains 28 different shapes. From the original triangle meshes, we sample 100K points uniformly at random on the surface, to generate a clean point cloud. From each clean point cloud we generate noisy point clouds by adding Gaussian noise with a standard deviation of 0.25%, 0.5%, 1%, 1.5% and 2.5% of the original shape bounding box diagonal.
We split the dataset into training and test sets of similar complexity, using 18 shapes for training and 10 shapes for testing. The test set keeps only the noise levels 0.5%, 1%, and 2.5%, plus the clean shapes. In total, our training set contains 6 variations of each shape (the clean point cloud and the five noise levels), i.e., 108 point clouds. Our test set contains 4 variations of each shape, i.e., 40 point clouds.
We use the same training and test shapes for outlier removal. For each shape, the training set contains one clean point cloud and 3 noisy point clouds with Gaussian noise whose standard deviation is 0.5%, 1%, and 1.5% of the bounding box diagonal. Each point cloud has 140K points. To generate outliers, we add Gaussian noise with a standard deviation of 20% of the shape bounding box diagonal to a random subset of points, whose size is given by the target outlier density. Only outliers that are farther from the surface than the standard deviation of the point cloud's noise distribution are kept. The training set contains point clouds with 10%, 20%, 40%, 60%, 80%, and 90% outlier densities. In total, the training set contains 432 example shapes, arising from 6 outlier densities and 4 noise levels for each of the 18 training shapes. We built our test set from the same shapes as the denoising test set. It contains one clean point cloud and 5 noisy point clouds per shape, to each of which we added 40K outlier points. The test set uses two different outlier models. The first adds Gaussian noise with a standard deviation of 20% of the shape bounding box diagonal to clean points on the surface. The second generates points uniformly inside the shape bounding box enlarged by an additional 10%.
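The two outlier models can be sketched as below, under several assumptions. The distance to the surface is approximated by the distance to the nearest clean sample. The number of outliers is passed directly instead of as a density. The phrase "enlarged by an additional 10%" is read as scaling each side of the box by 1.1 about its center. The function names are hypothetical, and the caller decides whether the generated outliers replace the chosen points or are appended to the cloud.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def bbox(points: Sequence[Point]) -> Tuple[Point, Point]:
    """Lower and upper corners of the axis-aligned bounding box."""
    lo = tuple(min(p[k] for p in points) for k in range(3))
    hi = tuple(max(p[k] for p in points) for k in range(3))
    return lo, hi


def gaussian_outliers(clean: Sequence[Point], count: int, noise_sigma: float,
                      rng: random.Random, spread: float = 0.20) -> List[Point]:
    """Displace randomly chosen clean points with N(0, (spread * diagonal)^2) noise.

    A candidate is kept only if it lies farther than noise_sigma from every clean
    point, which approximates 'farther from the surface than the noise std'."""
    lo, hi = bbox(clean)
    sigma = spread * math.dist(lo, hi)
    outliers: List[Point] = []
    while len(outliers) < count:
        base = rng.choice(clean)
        cand = tuple(c + rng.gauss(0.0, sigma) for c in base)
        if min(math.dist(cand, q) for q in clean) > noise_sigma:
            outliers.append(cand)
    return outliers


def uniform_outliers(clean: Sequence[Point], count: int, rng: random.Random,
                     enlarge: float = 0.10) -> List[Point]:
    """Uniform samples in the bounding box scaled by (1 + enlarge) about its center."""
    lo, hi = bbox(clean)
    half_pad = [0.5 * enlarge * (h - l) for l, h in zip(lo, hi)]
    return [tuple(rng.uniform(l - d, h + d) for l, h, d in zip(lo, hi, half_pad))
            for _ in range(count)]


rng = random.Random(1)
clean = [(x / 10, y / 10, 0.0) for x in range(11) for y in range(11)]  # flat 1 x 1 patch
far = gaussian_outliers(clean, 20, noise_sigma=0.01, rng=rng)
box = uniform_outliers(clean, 20, rng)
```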
Finally, we also evaluated our method on point clouds obtained by real acquisition devices. No ground truth information is available for these, but we provide a qualitative comparison with several baselines below.
Evaluation metric.
The evaluation metric should be sensitive to the desired properties of the point cloud described earlier: points should be close to the surface and have an approximately regular distribution. If we assume that the ground truth point clouds are regularly distributed, the following variant of the Chamfer distance [Fan et al., 2017, Achlioptas et al., 2018] measures both of these properties:
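$$c(\tilde{P}, P) = \frac{1}{N} \sum_{\tilde{p}_i \in \tilde{P}} \min_{p_j \in P} \|\tilde{p}_i - p_j\|_2^2 + \frac{1}{M} \sum_{p_j \in P} \min_{\tilde{p}_i \in \tilde{P}} \|p_j - \tilde{p}_i\|_2^2,$$
where $N$ and $M$ are the cardinalities of the cleaned point cloud $\tilde{P}$ and the ground truth point cloud $P$, respectively. Intuitively, the first term measures an approximate distance from each cleaned point to the target surface. The second term rewards an even coverage of the target surface and penalizes gaps.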
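The two terms described above can be sketched as a brute-force computation. The function names are hypothetical, and the quadratic cost makes this suitable only for small examples. The example shows why the second term matters: a cleaned cloud that collapses onto part of the target has a zero first term but is still penalized for the uncovered region.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two 3D points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def chamfer(cleaned: Sequence[Point], target: Sequence[Point]) -> float:
    """Mean squared nearest-neighbor distance in both directions (brute force, O(N*M))."""
    # First term: how far each cleaned point is from the target.
    to_target = sum(min(sq_dist(p, q) for q in target) for p in cleaned) / len(cleaned)
    # Second term: how well the cleaned points cover the target (penalizes gaps).
    coverage = sum(min(sq_dist(q, p) for p in cleaned) for q in target) / len(target)
    return to_target + coverage


target = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
collapsed = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]  # lies on the target but misses half of it
print(chamfer(collapsed, target))  # 0 + (0 + 4) / 2 = 2.0
print(chamfer(target, target))     # 0.0
```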
Denoising.
We evaluated our method on different noise levels and compared the results with state-of-the-art point cloud denoising techniques.
We first evaluate our results qualitatively by showing the denoised point clouds for four noisy input point clouds (Icosahedron, Star smooth, Netsuke, Happy) at two noise intensities, 1% and 2.5% of the original shape bounding box diagonal. To better visualize the results, we color-code the distance from each denoised point to the ground truth surface, as shown in Figure 6. The same figure compares our method to four other methods:
(i) Dynamic Graph CNN (DGCNN) [Wang et al., 2018b];
(ii) bilateral filtering for point clouds [Digne and De Franchis, 2017];
(iii) edge-aware point set resampling [Huang et al., 2013];
(iv) polynomial fitting with osculating jets [Cazals and Pouget, 2005, Cazals and Pouget, 2007].

All these methods require parameters such as the neighborhood size to be tuned, while our method is fully automated. Where applicable, we manually adjusted the parameters to give these methods the best chance. For some algorithms, we also allowed multiple parameter settings (small, medium, large) to handle different noise levels. DGCNN was not designed for local operations such as denoising, so we modify its segmentation variant to output a displacement vector per point instead of class probabilities. For the loss, the displacements are added to the original points, and the result is compared to the target point cloud using the same Chamfer distance that serves as the error metric in our evaluation. Since DGCNN processes the whole point cloud in a single pass, we need to heavily subsample our dense point clouds before using them as input.
As can be seen in Figure 6, PointCleanNet outperforms all the other methods, yielding smaller error distances while also preserving most of the surface details. Note that we restrict DGCNN to a single iteration, as we found its results to diverge over iterations.
We also observe that both the bilateral filtering approach [Digne and De Franchis, 2017] and DGCNN [Wang et al., 2018b] perform poorly on the input point clouds: both produce denoised points that remain far from the ground truth surface. Edge-aware filtering [Huang et al., 2013] preserves sharp features well, as can be seen for example on the denoised Icosahedron. However, it struggles to remove noise in detailed areas of the point cloud.
We also present quantitative results, computing the root mean square distances between (i) the denoised points and their closest points on the surface, and (ii) the points on the ground truth surface and their respective closest denoised points.
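The two root mean square distances can be sketched with a brute-force nearest-neighbor search. As an assumption, the ground truth surface is approximated by a dense point sample of it, and the function name is hypothetical. Squaring each value gives the mean squared nearest-neighbor distance in that direction.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rms_nn_distance(src: Sequence[Point], dst: Sequence[Point]) -> float:
    """Root mean square of the distance from each point in src to its nearest point in dst."""
    mean_sq = sum(min(math.dist(p, q) ** 2 for q in dst) for p in src) / len(src)
    return math.sqrt(mean_sq)


surface_sample = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]  # dense ground-truth sample (assumed)
denoised = [(0.0, 3.0, 0.0)]
print(rms_nn_distance(denoised, surface_sample))  # denoised -> surface: 3.0
print(rms_nn_distance(surface_sample, denoised))  # surface -> denoised: sqrt((9 + 25) / 2)
```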
We then combine these errors as described in the Evaluation metric paragraph to obtain the Chamfer distance. The quantitative results are compared to the same algorithms described above and are shown in Figure 7, left (without outlier removal). PointCleanNet, particularly when applied over multiple iterations, performs noticeably better than all the other methods at mid to high noise levels. We also evaluate how well our method preserves details in high- and low-curvature regions. To do so, we divide the points into four bins according to their curvature: we estimate curvature on the original clean point sets using jet fitting and then transfer the low/mid/high curvature classifications to the noisy point clouds. Our method remains stable across different curvature regions, while the other techniques quickly degrade in more highly curved areas.
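The per-curvature analysis can be sketched as follows. The sketch rests on two assumptions made for illustration. First, each noisy point inherits the curvature of the clean point it was generated from (same index). Second, the bins are equal-count quantile bins. Curvature estimation itself (jet fitting) is not shown, and the names are hypothetical.

```python
from typing import List, Sequence


def quantile_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Assign each value to one of n_bins equal-count bins by rank (0 = lowest)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def mean_error_per_bin(errors: Sequence[float], curvature: Sequence[float],
                       n_bins: int = 4) -> List[float]:
    """Average per-point error in each curvature bin.

    Assumes errors[i] belongs to the noisy point generated from clean point i,
    whose estimated curvature is curvature[i]."""
    bins = quantile_bins(curvature, n_bins)
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for err, b in zip(errors, bins):
        sums[b] += err
        counts[b] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


curvature = [0.1, 0.4, 0.2, 0.3]
errors = [1.0, 4.0, 2.0, 3.0]
print(mean_error_per_bin(errors, curvature))  # [1.0, 2.0, 3.0, 4.0]
```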
Outlier removal.
The last two plots in Figure 7 compare our approach to jet fitting and edge-aware techniques, with both outlier removal and denoising, on the test set. In this experiment, we first removed outliers using an outlier classification technique and then denoised the point clouds from our test set. We show results for noise levels from zero to 2.5% of the shape bounding box diagonal. We make two observations. First, PointCleanNet outperforms the edge-aware and jet fitting techniques with outlier removal at medium to large noise levels. Second, at smaller noise levels, our model still outperforms most of the tuning variations of the related techniques. Recall that our model does not require any parameter tuning by the user.
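The two-stage evaluation pipeline, with outlier removal followed by possibly repeated denoising, can be sketched with two hypothetical callables standing in for the trained networks. `is_outlier` classifies a point given the cloud, and `displacement` returns a correction vector for a point. Applying outlier removal only once, before the denoising iterations, is an assumption of this sketch. The toy stand-ins in the usage lines are illustrative only.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float, float]
OutlierFn = Callable[[Point, Sequence[Point]], bool]
DisplacementFn = Callable[[Point, Sequence[Point]], Point]


def clean_point_cloud(points: Sequence[Point], is_outlier: OutlierFn,
                      displacement: DisplacementFn, iterations: int = 1) -> List[Point]:
    """Remove classified outliers once, then apply the denoising displacement repeatedly."""
    kept = [p for p in points if not is_outlier(p, points)]
    for _ in range(iterations):
        # All displacements of one iteration are predicted from the same snapshot `kept`.
        kept = [tuple(c + d for c, d in zip(p, displacement(p, kept))) for p in kept]
    return kept


def toy_outlier(p: Point, cloud: Sequence[Point]) -> bool:
    """Toy classifier: points far above the plane z = 0 are outliers."""
    return abs(p[2]) > 1.0


def toy_displacement(p: Point, cloud: Sequence[Point]) -> Point:
    """Toy denoiser: halve the height of the point above the plane z = 0."""
    return (0.0, 0.0, -0.5 * p[2])


result = clean_point_cloud([(0.0, 0.0, 0.4), (1.0, 0.0, 5.0)], toy_outlier,
                           toy_displacement, iterations=2)
print(result)  # [(0.0, 0.0, 0.1)]
```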
Figure 9 also shows qualitative results for outlier removal on our test set, compared to the related techniques mentioned above. We observe that edge-aware filtering [Huang et al., 2013] performs worse around highly detailed regions and edges, while jet fitting [Cazals and Pouget, 2005] fails to remove the remaining scattered outliers. These results highlight the consistent performance of PointCleanNet across different shapes and detailed areas. The other methods, in contrast, produce less consistent distances to the underlying ground truth shapes.
Comparison to the alternative loss.
As shown in Figure 11, our alternative loss performs slightly worse than our main loss. However, it is more efficient and easier to implement, so the choice of loss function depends on the setting.
Directional noise.
We also evaluated PointCleanNet on a synthetic dataset simulating 3D data acquisition with depth cameras. To do so, we created a dataset with structured noise that simulates the depth uncertainty of depth reconstructions. Specifically, we added noise drawn from an anisotropic Gaussian distribution with a constant covariance matrix aligned with the scanning direction. The results are shown in Figure 10. Note that our network was not retrained for this specific noise model.
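One way to simulate this structured noise is sketched below. The standard deviations along and across the scanning direction are free parameters of the sketch, and the function name is hypothetical. A standard normal sample $g$ is scaled by $A = \sigma_\perp I + (\sigma_\parallel - \sigma_\perp)\,dd^\top$, which yields the covariance $\sigma_\parallel^2\, dd^\top + \sigma_\perp^2 (I - dd^\top)$ for a unit scan direction $d$.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def directional_noise(points: Sequence[Point], direction: Point, sigma_along: float,
                      sigma_perp: float, rng: random.Random) -> List[Point]:
    """Add zero-mean Gaussian noise with covariance
    sigma_along^2 * d d^T + sigma_perp^2 * (I - d d^T), where d is the unit scan direction.

    Each sample is A g with g ~ N(0, I) and A = sigma_perp * I + (sigma_along - sigma_perp) * d d^T.
    """
    norm = math.sqrt(sum(c * c for c in direction))
    d = [c / norm for c in direction]
    noisy: List[Point] = []
    for p in points:
        g = [rng.gauss(0.0, 1.0) for _ in range(3)]
        g_along = sum(gi * di for gi, di in zip(g, d))  # component of g along d
        noise = [sigma_perp * gi + (sigma_along - sigma_perp) * g_along * di
                 for gi, di in zip(g, d)]
        noisy.append(tuple(pc + nc for pc, nc in zip(p, noise)))
    return noisy


rng = random.Random(0)
# With sigma_perp = 0, every displacement lies exactly along the scan direction (here z).
samples = directional_noise([(0.0, 0.0, 0.0)] * 5, (0.0, 0.0, 2.0), 0.01, 0.0, rng)
```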
Generalization to realworld data.
Figure 1 and the first two results in Figure 8 (left) show results of our approach on real data. This data was obtained with a plane sweep algorithm, an image-based 3D reconstruction technique, and publicly released by [Wolff et al., 2016]. The statue, torch, and scarecrow input point clouds each contain 1.4M points. Since no ground truth is available in this case, we only show qualitative results obtained with our method. Note that, although trained on an entirely different dataset, our approach still produces high-quality results on this challenging real-world data. The next three shapes in Figure 8 show results on other external raw point clouds, while the last one shows a shape with sharp edges.
7 Conclusion
We presented PointCleanNet, a learning-based framework that consumes noisy point clouds and outputs clean ones. It removes outliers and denoises the remaining points by displacing them back to the underlying (unknown) scanned surface. One key advantage of the proposed setup is its simplicity at test time, as it requires neither additional parameters nor noise/device specifications from the user. In our extensive evaluation, we demonstrated that PointCleanNet consistently outperforms state-of-the-art denoising approaches (that were provided with manually tuned parameters) on a range of models under various noise settings. Given its universality and ease of use, PointCleanNet can be readily integrated into any geometry processing workflow that consumes raw point clouds.
While we presented a first learning architecture to clean raw point clouds, several future directions remain to be explored:
(i) As a simple extension, we would like to combine outlier removal and denoising into a single network, rather than two separate parts.
(ii) To increase efficiency, we would like to investigate how to perform denoising at the patch level rather than the per-point level. This would require designing a scheme to combine denoising results from overlapping patches.
(iii) PointCleanNet already produces a uniform point distribution on the underlying surface if the noisy points are uniformly distributed. We would like to investigate the effect of a specific uniformity term in the loss function (similar to [Yin et al., 2018]) to also produce a uniform distribution for non-uniform noisy points. The challenge, however, would be to keep the points on the underlying surface.
(iv) It would be interesting to investigate how to allow the network to upsample points, especially in regions with an insufficient number of points, or to combine it with existing upsampling methods such as [Yu et al., 2018]. This would be akin to the ‘point spray’ function in more traditional point cloud processing toolboxes.
(v) Finally, we would like to investigate how to train a point cloud cleanup network without requiring paired noisy-clean point clouds in the training set. If successful, this would enable directly handling noisy point clouds from arbitrary scanning setups, without requiring an explicit noise model or examples of denoised point clouds at training time. We plan to draw inspiration from related unpaired image-translation tasks, where generative adversarial setups have been used successfully.
References
 [Achlioptas et al., 2018] Achlioptas, P., Diamanti, O., Mitliagkas, I., and Guibas, L. (2018). Learning representations and generative models for 3d point clouds. In ICLR.
 [Aggarwal, 2015] Aggarwal, C. C. (2015). Outlier analysis. In Data mining, pages 237–263. Springer.
 [Alexa et al., 2003] Alexa, M., Behr, J., Cohen-Or, D., Fleishman, S., Levin, D., and Silva, C. T. (2003). Computing and rendering point set surfaces. IEEE Transactions on Visualization and Computer Graphics, 9(1):3–15.
 [Avron et al., 2010] Avron, H., Sharf, A., Greif, C., and Cohen-Or, D. (2010). ℓ1-sparse reconstruction of sharp point set surfaces. ACM Transactions on Graphics (TOG), 29(5):135.
 [Ben-Gal, 2005] Ben-Gal, I. (2005). Outlier detection. In Data mining and knowledge discovery handbook, pages 131–146. Springer.
 [Benouini et al., 2018] Benouini, R., Batioua, I., Zenkouar, K., Najah, S., and Qjidaa, H. (2018). Efficient 3d object classification by using direct krawtchouk moment invariants. Multimedia Tools and Applications, pages 1–26.
 [Bronstein et al., 2017] Bronstein, M. M., Bruna, J., LeCun, Y., Szlam, A., and Vandergheynst, P. (2017). Geometric deep learning: going beyond euclidean data. IEEE Signal Processing Magazine, 34(4):18–42.
 [Buades et al., 2005] Buades, A., Coll, B., and Morel, J.-M. (2005). A non-local algorithm for image denoising. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 2, pages 60–65. IEEE.
 [Cazals and Pouget, 2005] Cazals, F. and Pouget, M. (2005). Estimating differential quantities using polynomial fitting of osculating jets. Computer Aided Geometric Design, 22(2):121–146.
 [Cazals and Pouget, 2007] Cazals, F. and Pouget, M. (2007). Jet_fitting_3: A generic C++ package for estimating the differential properties on sampled surfaces via polynomial fitting. PhD thesis, INRIA.
 [Chambolle et al., 2010] Chambolle, A., Caselles, V., Cremers, D., Novaga, M., and Pock, T. (2010). An introduction to total variation for image analysis. Theoretical foundations and numerical methods for sparse recovery, 9(263340):227.
 [Chatterjee and Milanfar, 2010] Chatterjee, P. and Milanfar, P. (2010). Is denoising dead? IEEE Transactions on Image Processing, 19(4):895–911.
 [Chazal et al., 2011] Chazal, F., Cohen-Steiner, D., and Mérigot, Q. (2011). Geometric inference for probability measures. Foundations of Computational Mathematics, 11(6):733–751.
 [Deschaud and Goulette, 2010] Deschaud, J.-E. and Goulette, F. (2010). Point cloud non local denoising using local surface descriptor similarity. IAPRS, 38(3A):109–114.
 [Digne, 2012] Digne, J. (2012). Similarity based filtering of point clouds. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2012 IEEE Computer Society Conference on, pages 73–79. IEEE.
 [Digne et al., 2014] Digne, J., Chaine, R., and Valette, S. (2014). Self-similarity for accurate compression of point sampled surfaces. In Computer Graphics Forum, volume 33, pages 155–164. Wiley Online Library.
 [Digne and De Franchis, 2017] Digne, J. and De Franchis, C. (2017). The bilateral filter for point clouds. Image Processing On Line, 7:278–287.
 [Digne et al., 2018] Digne, J., Valette, S., and Chaine, R. (2018). Sparse geometric representation through local shape probing. IEEE Transactions on Visualization and Computer Graphics, 24(7):2238–2250.
 [Elad, 2010] Elad, M. (2010). From exact to approximate solutions. In Sparse and Redundant Representations, pages 79–109. Springer.
 [Elad and Aharon, 2006] Elad, M. and Aharon, M. (2006). Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15(12):3736–3745.
 [Fan et al., 2017] Fan, H., Su, H., and Guibas, L. J. (2017). A point set generation network for 3d object reconstruction from a single image. In CVPR, volume 2, page 6.
 [Fleishman et al., 2005] Fleishman, S., Cohen-Or, D., and Silva, C. T. (2005). Robust moving least-squares fitting with sharp features. ACM Transactions on Graphics (TOG), 24(3):544–552.
 [Giraudot et al., 2013] Giraudot, S., Cohen-Steiner, D., and Alliez, P. (2013). Noise-adaptive shape reconstruction from raw point sets. In Proceedings of the Eleventh Eurographics/ACM-SIGGRAPH Symposium on Geometry Processing, pages 229–238. Eurographics Association.
 [Gross and Pfister, 2011] Gross, M. and Pfister, H. (2011). Point-based graphics. Elsevier.
 [Guerrero et al., 2018] Guerrero, P., Kleiman, Y., Ovsjanikov, M., and Mitra, N. J. (2018). PCPNet: Learning local shape properties from raw point clouds. Computer Graphics Forum, 37(2):75–85.
 [Guibas et al., 2013] Guibas, L., Morozov, D., and Mérigot, Q. (2013). Witnessed k-distance. Discrete & Computational Geometry, 49(1):22–45.
 [Han et al., 2017] Han, X.-F., Jin, J. S., Wang, M.-J., Jiang, W., Gao, L., and Xiao, L. (2017). A review of algorithms for filtering the 3d point cloud. Signal Processing: Image Communication, 57:103–112.
 [He et al., 2015] He, K., Zhang, X., Ren, S., and Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE International Conference on Computer Vision, pages 1026–1034.
 [Huang et al., 2013] Huang, H., Wu, S., Gong, M., Cohen-Or, D., Ascher, U., and Zhang, H. R. (2013). Edge-aware point set resampling. ACM Transactions on Graphics (TOG), 32(1):9.
 [Jaderberg et al., 2015] Jaderberg, M., Simonyan, K., Zisserman, A., et al. (2015). Spatial transformer networks. In Advances in neural information processing systems, pages 2017–2025.
 [Jin et al., 2017] Jin, K. H., McCann, M. T., Froustey, E., and Unser, M. (2017). Deep convolutional neural network for inverse problems in imaging. IEEE Transactions on Image Processing, 26(9):4509–4522.
 [Lescoat et al., 2018] Lescoat, T., Ovsjanikov, M., Memari, P., Thiery, J.-M., and Boubekeur, T. (2018). A survey on data-driven dictionary-based methods for 3d modeling. In Computer Graphics Forum, volume 37, pages 577–601. Wiley Online Library.
 [Levin and Nadler, 2011] Levin, A. and Nadler, B. (2011). Natural image denoising: Optimality and inherent bounds. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 2833–2840. IEEE.
 [Mairal, 2010] Mairal, J. (2010). Sparse coding for machine learning, image processing and computer vision. PhD thesis, Cachan, Ecole normale supérieure.
 [Masci et al., 2015] Masci, J., Boscaini, D., Bronstein, M., and Vandergheynst, P. (2015). Geodesic convolutional neural networks on riemannian manifolds. In Proceedings of the IEEE international conference on computer vision workshops, pages 37–45.
 [Mattei and Castrodad, 2017] Mattei, E. and Castrodad, A. (2017). Point cloud denoising via moving rpca. In Computer Graphics Forum, volume 36, pages 123–137. Wiley Online Library.
 [Mederos et al., 2003] Mederos, B., Velho, L., and de Figueiredo, L. H. (2003). Point cloud denoising. In Proceeding of SIAM Conference on Geometric Design and Computing, pages 1–11. Citeseer.
 [Öztireli et al., 2009] Öztireli, A. C., Guennebaud, G., and Gross, M. (2009). Feature preserving point set surfaces based on nonlinear kernel regression. In Computer Graphics Forum, volume 28, pages 493–501.
 [Pincus, 1995] Pincus, R. (1995). Barnett, v., and lewis t.: Outliers in statistical data. j. wiley & sons 1994, xvii. 582 pp.,£ 49.95. Biometrical Journal, 37(2):256–256.
 [Qi et al., 2017a] Qi, C. R., Su, H., Mo, K., and Guibas, L. J. (2017a). Pointnet: Deep learning on point sets for 3d classification and segmentation. CVPR, 1(2):4.
 [Qi et al., 2017b] Qi, C. R., Yi, L., Su, H., and Guibas, L. J. (2017b). Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in Neural Information Processing Systems, pages 5099–5108.
 [Rousseeuw and Hubert, 2011] Rousseeuw, P. J. and Hubert, M. (2011). Robust statistics for outlier detection. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(1):73–79.
 [Roveri et al., 2018] Roveri, R., Öztireli, A. C., Pandele, I., and Gross, M. (2018). Pointpronets: Consolidation of point clouds with convolutional neural networks. Computer Graphics Forum, 37(2):87–99.
 [Sun et al., 2015] Sun, Y., Schaefer, S., and Wang, W. (2015). Denoising point sets via l0 minimization. Computer Aided Geometric Design, 35:2–15.
 [Taubin, 1995] Taubin, G. (1995). Curve and surface smoothing without shrinkage. In Computer Vision, 1995. Proceedings., Fifth International Conference on, pages 852–857. IEEE.
 [Wang et al., 2016] Wang, P.-S., Liu, Y., and Tong, X. (2016). Mesh denoising via cascaded normal regression. ACM Transactions on Graphics (TOG), 35(6):232.
 [Wang et al., 2018a] Wang, Y., Sun, Y., Liu, Z., Sarma, S. E., Bronstein, M. M., and Solomon, J. M. (2018a). Dynamic graph cnn for learning on point clouds. arXiv preprint arXiv:1801.07829.
 [Wang et al., 2018b] Wang, Y., Sun, Y., Liu, Z., Sarma, S. E., Bronstein, M. M., and Solomon, J. M. (2018b). Dynamic graph cnn for learning on point clouds. arXiv preprint arXiv:1801.07829.
 [Wei et al., 2016] Wei, L., Huang, Q., Ceylan, D., Vouga, E., and Li, H. (2016). Dense human body correspondences using convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1544–1553.
 [Wolff et al., 2016] Wolff, K., Kim, C., Zimmer, H., Schroers, C., Botsch, M., Sorkine-Hornung, O., and Sorkine-Hornung, A. (2016). Point cloud noise and outlier removal for image-based 3d reconstruction. In 3D Vision (3DV), 2016 Fourth International Conference on, pages 118–127. IEEE.
 [Yin et al., 2018] Yin, K., Huang, H., Cohen-Or, D., and Zhang, H. (2018). P2P-NET: Bidirectional point displacement net for shape transform. ACM Trans. Graph., 37(4):152:1–152:13.
 [Yoon et al., 2016] Yoon, Y.-J., Lelidis, A., Öztireli, A. C., Hwang, J.-M., Gross, M., and Choi, S.-M. (2016). Geometry representations with unsupervised feature learning. In Big Data and Smart Computing (BigComp), 2016 International Conference on, pages 137–142. IEEE.
 [Yu et al., 2018] Yu, L., Li, X., Fu, C.-W., Cohen-Or, D., and Heng, P.-A. (2018). PU-Net: Point cloud upsampling network. In Proc. CVPR.
 [Zeng et al., 2018] Zeng, J., Cheung, G., Ng, M., Pang, J., and Yang, C. (2018). 3d point cloud denoising using graph laplacian regularization of a low dimensional manifold model. arXiv preprint arXiv:1803.07252.
 [Zhang et al., 2017a] Zhang, K., Zuo, W., Chen, Y., Meng, D., and Zhang, L. (2017a). Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising. IEEE Transactions on Image Processing, 26(7):3142–3155.
 [Zhang et al., 2017b] Zhang, K., Zuo, W., Gu, S., and Zhang, L. (2017b). Learning deep cnn denoiser prior for image restoration. In IEEE Conference on Computer Vision and Pattern Recognition, volume 2.
 [Zheng et al., 2010] Zheng, Q., Sharf, A., Wan, G., Li, Y., Mitra, N. J., Cohen-Or, D., and Chen, B. (2010). Non-local scan consolidation for 3d urban scenes. ACM Trans. Graph., 29(4):94–1.