# 3D Dynamic Point Cloud Denoising via Spatio-temporal Graph Modeling

Qianjiang Hu, Zehua Wang (Peking University), Wei Hu (Peking University), Xiang Gao (Peking University), and Zongming Guo (Peking University)
###### Abstract.

The prevalence of accessible depth sensing and 3D laser scanning techniques has enabled the convenient acquisition of 3D dynamic point clouds, which provide efficient representation of arbitrarily-shaped objects in motion. Nevertheless, dynamic point clouds are often perturbed by noise due to hardware, software or other causes. While many methods have been proposed for the denoising of static point clouds, dynamic point cloud denoising has not been studied in the literature yet. Hence, we address this problem based on the proposed spatio-temporal graph modeling, exploiting both the intra-frame similarity and inter-frame consistency. Specifically, we first represent a point cloud sequence on graphs and model it via spatio-temporal Gaussian Markov Random Fields on defined patches. Then for each target patch, we pose a Maximum a Posteriori estimation, and propose the corresponding likelihood and prior functions via spectral graph theory, leveraging its similar patches within the same frame and corresponding patch in the previous frame. This leads to our problem formulation, which jointly optimizes the underlying dynamic point cloud and spatio-temporal graph. Finally, we propose an efficient algorithm for patch construction, similar/corresponding patch search, intra- and inter-frame graph construction, and the optimization of our problem formulation via alternating minimization. Experimental results show that the proposed method outperforms frame-by-frame denoising from state-of-the-art static point cloud denoising approaches.

Dynamic point clouds, denoising, spatio-temporal GMRF modeling, spectral graph theory

MM’19: ACM International Conference on Multimedia; October 21–25, 2019; Nice, France.

CCS concepts: Computing methodologies → Point-based models; Computing methodologies → Maximum a posteriori modeling; Mathematics of computing → Graph algorithms.

## 1. Introduction

The maturity of depth sensing and 3D laser scanning techniques has enabled convenient acquisition of 3D dynamic point clouds, a natural representation for arbitrarily-shaped objects varying over time (Rusu and Cousins, 2011). A dynamic point cloud consists of a sequence of static point clouds, each of which is composed of a set of points defined on irregular grids, as shown in Fig. 1. Each point has geometry information (i.e., 3D coordinates) and possibly attribute information such as color. We focus on the geometry of point clouds in this paper due to its vital role. Because of the efficient representation, dynamic point clouds have been widely deployed in various fields, such as 3D immersive tele-presence, navigation for autonomous vehicles, gaming and animation (Tulvan et al., 2016).

Point clouds are often perturbed by noise, which comes from hardware, software or other causes. Hardware-wise, noise occurs due to the inherent limitations of the acquisition equipment. Software-wise, when point clouds are generated with existing algorithms, points may be placed at completely wrong locations due to imprecise triangulation (e.g., a false epipolar matching). The noise corruption directly affects the subsequent applications of dynamic point clouds.

However, while many approaches have been proposed for static point cloud denoising, the denoising of dynamic point clouds has not yet been studied in the literature. Existing denoising methods for static point clouds mainly include moving least squares (MLS)-based methods, locally optimal projection (LOP)-based methods, sparsity-based methods, and non-local similarity-based methods. MLS-based methods (Alexa et al., 2003; Guennebaud and Gross, 2007; Oztireli et al., 2009) approximate a smooth surface for the input point clouds and project the points to the estimated surface. LOP-based methods (Huang et al., 2013; Hui et al., 2009; Lipman et al., 2007) also apply surface approximation but the operator is non-parametric. Sparsity-based methods (Avron et al., 2010; Mattei and Castrodad, 2017) assume sparse representation of point normals, and solve the global minimization problem to obtain the sparse reconstruction of the point normals. Non-local similarity-based methods (Dinesh et al., 2018; Zeng et al., 2018) exploit self-similarities among surface patches in a point cloud. Besides, several other approaches have been proposed for static point cloud denoising (Huang et al., 2009; Yan and Zhai, 2015; Rusu et al., 2008; Gao et al., 2018), in which the key idea is to detect noisy points via certain characteristics and then delete them.

Whereas it is possible to apply existing static point cloud denoising methods to each frame of a dynamic point cloud sequence separately, the inter-frame correlation would be neglected, which may lead to inconsistent denoising results in the temporal domain. Hence, we propose joint denoising of dynamic point clouds by exploiting the inter-frame correlation, which not only enforces the temporal consistency but also provides additional information for denoising. Since point clouds are irregular, it is challenging to acquire the temporal correspondence between neighboring frames. We address this issue by representing dynamic point clouds naturally on graphs, where each vertex represents a point, each edge captures the relationship between neighboring points, and the corresponding graph signal refers to the coordinates of points. We then propose a graph-based method to search the temporal correspondence and estimate the underlying clean dynamic point cloud.

Specifically, since it is computationally inefficient to process an entire frame of a point cloud at once, we first divide each frame into overlapping patches. Each irregular patch is defined as a local point set consisting of a centering point and its $k$-nearest neighbors. Then we propose a spatial-temporal model under Gaussian Markov Random Fields (GMRF) (Rue and Held, 2005), which play a crucial role in describing both the intra-frame and inter-frame correlations over patches. Next, we estimate the underlying current frame via Maximum a Posteriori (MAP) estimation, given the previous and current noisy frames. We propose the likelihood function and prior distribution, based on the GMRF modeling and graph-signal smoothness prior (Shuman et al., 2013). This leads to the proposed problem formulation of dynamic point cloud denoising, where the underlying frame and its graph representation (the graph Laplacian in particular) are jointly optimized. In spectral graph theory (Chung, 1997), a graph Laplacian matrix is an algebraic representation of the connectivities of the corresponding graph, which will be introduced in Section 3.

Based on the above problem formulation, we propose an efficient algorithm to address the denoising problem of dynamic point clouds. For each target patch in the current frame, we first search for its similar patches in the same frame to exploit the intra-frame correlation, and search for its corresponding patch in the previous frame to explore the inter-frame correlation. Similar to (Zeng et al., 2018), the similarity metric between two patches depends on the distance from each point in the two patches to the tangent plane at each patch center of both patches. Based on the similar patches and corresponding patch, we address the problem formulation by designing an efficient alternating minimization algorithm to solve the underlying frame and graph Laplacian alternately. In particular, since the computational complexity of solving the graph Laplacian would be high and the numerical computation might be unstable, we propose to construct the intra-frame graph and inter-frame graph based on the patch similarity manually from each update of the underlying frame. Experimental results show that the proposed method outperforms separate denoising of each frame from state-of-the-art static point cloud denoising methods on five widely used dynamic point cloud sequences.

In summary, the main contributions of our work include:

• To the best of our knowledge, we are the first to address the dynamic point cloud denoising problem in the literature. The key idea is to exploit the inter-frame correlation of irregular point clouds for temporal consistency.

• We propose a spatial-temporal model of dynamic point clouds under GMRF, and derive the MAP estimation from graph-signal priors, which finally casts dynamic point cloud denoising as an optimization problem.

• We propose an efficient algorithm to solve the optimization problem. Experimental results validate the effectiveness of our method.

## 2. Related Work

To the best of our knowledge, there has been no research on dynamic point cloud denoising yet in the literature. Previous works on static point cloud denoising can be divided into four classes: moving least squares (MLS)-based methods, locally optimal projection (LOP)-based methods, sparsity-based methods, and non-local methods.

MLS-based methods. MLS-based methods aim to approximate a smooth surface from the input point cloud and minimize the geometric error of the approximation. Alexa et al. obtain a polynomial function on a local reference domain to best fit neighboring points in terms of MLS (Alexa et al., 2003). Other similar solutions are algebraic point set surfaces (APSS) (Guennebaud and Gross, 2007) and robust implicit MLS (RIMLS) (Oztireli et al., 2009). However, the results may be over-smoothed, and these methods may not perform well in terms of removing outliers.

LOP-based methods. LOP-based methods also apply surface approximation for denoising point clouds. But unlike MLS-based methods, the operator is non-parametric, thus it performs well in cases of ambiguous orientation. For example, Lipman et al. define a set of points that represent the estimated surface by minimizing the sum of Euclidean distances to the data points (Lipman et al., 2007). The two branches of (Lipman et al., 2007) are weighted LOP (WLOP) (Hui et al., 2009) and anisotropic WLOP (AWLOP) (Huang et al., 2013). (Hui et al., 2009) produces a set of denoised, outlier-free and more evenly distributed particles over the original dense point cloud to keep the sample distance of neighboring points. (Huang et al., 2013) modifies WLOP with an anisotropic weighting function so as to preserve sharp features better. However, LOP-based methods may also lead to over-smoothing results.

Sparsity-based methods. Sparsity-based methods are based on the theory of sparse representation of the point normals. With sparsity regularization, they solve a global minimization problem to obtain sparse reconstruction of the point normals. Then the positions of points are updated by solving another global minimization problem based on a local planar assumption, such as (Mattei and Castrodad, 2017) and (Avron et al., 2010). However, when locally high noise-to-signal ratios yield redundant features, these methods may not perform well and lead to over-smoothing or over-sharpening (Sun et al., 2015).

Non-local methods. Non-local methods exploit self-similarities among surface patches in a point cloud. These methods are inspired by the non-local means (NLM) (Buades et al., 2005) and BM3D (Dabov et al., 2007) image denoising algorithms. For example, Digne et al. utilize an NLM algorithm to denoise static point clouds (Digne, 2012), while Rosman et al. implement a BM3D method to smooth point clouds (Rosman et al., 2013). Besides, Zeng et al. define the self-similarity among patches in point clouds formally as a low-dimensional manifold prior (Zeng et al., 2018). Dinesh et al. approximate a $k$-nearest-neighbor graph of 3D points as a bipartite graph and then deploy graph total variation to the surface normals of neighboring 3D points as regularization (Dinesh et al., 2018). However, the computational complexity of the above methods is usually high.

Besides, deep learning has been recently deployed for static point cloud denoising (Almonacid et al., 2018). A CNN model is trained with a set of real and synthetic scans with clean and noisy areas, and then applied to perform denoising. However, finer geometric precision is infeasible for now given the high computational complexity of the model.

## 3. Preliminaries

We represent dynamic point clouds on undirected graphs. An undirected graph $\mathcal{G} = \{\mathcal{V}, \mathcal{E}, \mathbf{A}\}$ is composed of a vertex set $\mathcal{V}$ of cardinality $|\mathcal{V}| = N$, an edge set $\mathcal{E}$ connecting vertices, and a weighted adjacency matrix $\mathbf{A}$. $\mathbf{A}$ is a real and symmetric $N \times N$ matrix, where $a_{i,j}$ is the weight assigned to the edge $(i,j)$ connecting vertices $i$ and $j$. Edge weights often measure the similarity between connected vertices.

The graph Laplacian matrix is defined from the adjacency matrix. Among different variants of Laplacian matrices, the combinatorial graph Laplacian used in (Shen et al., 2010; Hu et al., 2015) is defined as $\mathbf{L} := \mathbf{D} - \mathbf{A}$, where $\mathbf{D}$ is the degree matrix, i.e., a diagonal matrix with $d_{i,i} = \sum_{j=1}^{N} a_{i,j}$.

A graph signal refers to data that resides on the vertices of a graph. In our case, the coordinates of each point in the input dynamic point cloud are the graph signal. A graph signal $\mathbf{z}$ defined on a graph $\mathcal{G}$ is smooth with respect to the topology of $\mathcal{G}$ if

$$\sum_{i \sim j} a_{i,j} (z_i - z_j)^2 < \epsilon, \tag{1}$$

where $\epsilon$ is a small positive scalar, and $i \sim j$ denotes that vertices $i$ and $j$ are one-hop neighbors in the graph. In order to satisfy (1), $z_i$ and $z_j$ have to be similar for a large edge weight $a_{i,j}$, and could be quite different for a small $a_{i,j}$. Hence, (1) enforces $\mathbf{z}$ to adapt to the topology of $\mathcal{G}$, which is thus coined the graph-signal smoothness prior.

As in (Spielman, 2004), the left-hand side of (1) is concisely written as $\mathbf{z}^{\top} \mathbf{L} \mathbf{z}$ in the sequel. This term will be employed as the prior for the MAP estimation of dynamic point clouds.
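As a small numerical illustration of the quadratic form used for the smoothness prior, the following sketch builds the combinatorial Laplacian of a toy graph and checks that the edge-wise sum in (1) coincides with the Laplacian quadratic form. The 4-vertex graph, its weights, and the function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def combinatorial_laplacian(A):
    """Return L = D - A for a symmetric weighted adjacency matrix A."""
    D = np.diag(A.sum(axis=1))  # degree matrix: row sums on the diagonal
    return D - A

def smoothness_edge_sum(A, z):
    """Left-hand side of (1): sum over edges i~j (each counted once)
    of a_ij * (z_i - z_j)^2."""
    n = A.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += A[i, j] * (z[i] - z[j]) ** 2
    return total

# Toy path graph on 4 vertices with hypothetical edge weights.
A = np.array([[0.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 2.0],
              [0.0, 0.0, 2.0, 0.0]])
L = combinatorial_laplacian(A)
z = np.array([1.0, 1.2, 3.0, 3.1])  # one coordinate of a graph signal

edge_sum = smoothness_edge_sum(A, z)
quad_form = float(z @ L @ z)  # agrees with edge_sum up to rounding
```

The signal jumps across the weakly weighted middle edge and varies little across the strongly weighted edges, which is the situation (1) rewards with a small value.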

## 4. Problem Formulation

In this section, we elaborate on the proposed problem formulation. We start from the modeling of a dynamic point cloud sequence via spatio-temporal GMRFs, and propose such modeling on patch basis. Then we pose a MAP estimation of the underlying dynamic point cloud, and come up with the likelihood function and prior distribution. Finally, we arrive at the problem formulation from the MAP estimation.

### 4.1. Spatial-Temporal Modeling

A dynamic point cloud sequence consists of multiple frames of point clouds. The coordinates $\mathbf{U}_t = [\mathbf{u}_{t,1}, \mathbf{u}_{t,2}, \dots, \mathbf{u}_{t,N}]^{\top} \in \mathbb{R}^{N \times 3}$ denote the position of each point in the point cloud at frame $t$, in which $\mathbf{u}_{t,i} \in \mathbb{R}^{3}$ represents the coordinates of the $i$-th point at frame $t$. Let $\mathbf{U}_t$ denote the ground truth coordinates of the $t$-th frame, and $\hat{\mathbf{U}}_{t-1}$, $\hat{\mathbf{U}}_t$ denote the noise-corrupted coordinates of the $(t-1)$-th and $t$-th frame respectively. Then we formulate the dynamic point cloud denoising problem as

$$\hat{\mathbf{U}}_t = f(\mathbf{U}_{t-1}, \mathbf{U}_t) + \mathbf{E}_t, \tag{2}$$

where $\mathbf{E}_t$ is zero-mean signal-independent noise. For point clouds acquired from sensing equipment, the noise distribution is related to the acquisition equipment. Several previous works (Nguyen et al., 2012; Sun et al., 2008) have shown through statistics that the noise in point clouds approximately follows a Gaussian distribution for 3D scanning equipment such as Microsoft Kinect and 3D laser scanners. As these are popular sensors, we assume the noise follows a Gaussian distribution.

Spatio-temporal GMRF modeling. In particular, we model the relationship in consecutive frames of a dynamic point cloud via spatio-temporal GMRF models. A spatial GMRF is a restrictive multivariate Gaussian distribution that satisfies additional conditional independence assumptions. A graph is often used to represent the conditional independence assumption. Here is the formal definition:

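Definition: A random vector $\mathbf{x} = (x_1, \dots, x_n)^{\top}$ is a GMRF with respect to a graph $\mathcal{G} = \{\mathcal{V}, \mathcal{E}\}$ with mean $\boldsymbol{\mu}$ and precision matrix $\mathbf{Q} \succ 0$, if its density has the form

$$\pi(\mathbf{x}) = (2\pi)^{-n/2} \, |\mathbf{Q}|^{1/2} \exp\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{\top} \mathbf{Q} (\mathbf{x} - \boldsymbol{\mu}) \right) \tag{3}$$

and

$$Q_{i,j} \neq 0 \iff \{i,j\} \in \mathcal{E}, \quad \forall i \neq j. \tag{4}$$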

Spatio-temporal GMRF models are extensions of spatial GMRF models to account for additional temporal variation. In our case, we represent a dynamic point cloud of multiple frames on a sequence of subgraphs. Each subgraph describes the intra-frame connectivities within each frame, and temporal connectivities exist between neighboring subgraphs to describe the inter-frame connectivities.

Patch representation. Further, as it is computationally expensive to process an entire point cloud at once, we model both intra-frame and inter-frame dependencies on a patch basis. Unlike images or videos defined on regular grids, point clouds reside on an irregular domain with uncertain local neighborhoods, thus the definition of a patch is nontrivial. We define a patch in the point cloud at frame $t$ as a local point set of $k+1$ points, consisting of a centering point and its $k$-nearest neighbors in terms of Euclidean distance. Then the entire set of patches at frame $t$ is

$$\mathbf{P}_t = \mathbf{S}_t \mathbf{U}_t - \mathbf{C}_t, \tag{5}$$

where $\mathbf{S}_t \in \{0,1\}^{(k+1)M \times N}$ is a sampling matrix to select points from point cloud $\mathbf{U}_t$ so as to form $M$ patches of $k+1$ points each, and $\mathbf{C}_t \in \mathbb{R}^{(k+1)M \times 3}$ contains the coordinates of patch centers for each point.
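To make the patch representation in Eq. (5) concrete, the sketch below assembles a 0/1 sampling matrix, a matrix of repeated patch centers, and the resulting centered patch coordinates for a given list of patch-center indices. It assumes brute-force nearest-neighbor search and dense matrices for readability; the function name `build_patches` and its arguments are hypothetical, and a practical implementation would likely use sparse matrices and a spatial index.

```python
import numpy as np

def build_patches(U, center_idx, k):
    """Return (S, C, P) with P = S @ U - C, as in Eq. (5).

    U          : (N, 3) array of point coordinates of one frame
    center_idx : indices of the M points chosen as patch centers
    k          : number of nearest neighbors per patch (patch has k+1 points)
    """
    N = U.shape[0]
    M = len(center_idx)
    S = np.zeros(((k + 1) * M, N))   # sampling matrix, one 1 per row
    C = np.zeros(((k + 1) * M, 3))   # patch center repeated for every patch point
    for m, c in enumerate(center_idx):
        dist = np.linalg.norm(U - U[c], axis=1)
        # The center itself (distance 0) plus its k nearest neighbors.
        members = np.argsort(dist)[:k + 1]
        rows = np.arange(m * (k + 1), (m + 1) * (k + 1))
        S[rows, members] = 1.0
        C[rows] = U[c]
    P = S @ U - C                    # coordinates relative to the patch center
    return S, C, P
```

Because patches overlap, a point of the frame may be selected by several rows of the sampling matrix, which is why the patch matrix generally has more rows than the frame has points.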

Based on the patch representation, we model the intra-frame dependency by building graph connectivities among similar patches within a frame, and model the inter-frame dependency by constructing graph connectivities between corresponding patches over consecutive frames. The details of searching similar patches within a frame and corresponding patches between frames will be discussed in Section 5.2.

### 4.2. MAP Estimation of Dynamic Point Clouds

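Under the spatio-temporal GMRF modeling, we pose a MAP estimation for the underlying patches $\mathbf{P}_t$ in the point cloud at frame $t$: given the observed noisy previous frame $\hat{\mathbf{P}}_{t-1}$ and current noisy frame $\hat{\mathbf{P}}_t$, find the most probable signal $\mathbf{P}_t$,

$$\tilde{\mathbf{P}}_t^{\mathrm{MAP}} = \arg\max_{\mathbf{P}_t} \; f(\hat{\mathbf{P}}_{t-1}, \hat{\mathbf{P}}_t \mid \mathbf{P}_t) \, g(\mathbf{P}_t), \tag{6}$$

where $f(\hat{\mathbf{P}}_{t-1}, \hat{\mathbf{P}}_t \mid \mathbf{P}_t)$ is the likelihood function, and $g(\mathbf{P}_t)$ is the signal prior. Because $\mathbf{P}_t$ are patches that cover the entire $\mathbf{U}_t$, Eq. (6) also gives the MAP estimation of $\mathbf{U}_t$:

$$\tilde{\mathbf{U}}_t^{\mathrm{MAP}} = \arg\max_{\mathbf{U}_t} \; f(\hat{\mathbf{P}}_{t-1}, \hat{\mathbf{P}}_t \mid \mathbf{P}_t) \, g(\mathbf{P}_t). \tag{7}$$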

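The proposed likelihood function. $f(\hat{\mathbf{P}}_{t-1}, \hat{\mathbf{P}}_t \mid \mathbf{P}_t)$ is the probability of obtaining the observed point clouds $\hat{\mathbf{P}}_{t-1}$ and $\hat{\mathbf{P}}_t$ given the desired current frame $\mathbf{P}_t$. We have

$$f(\hat{\mathbf{P}}_{t-1}, \hat{\mathbf{P}}_t \mid \mathbf{P}_t) = f(\hat{\mathbf{P}}_{t-1} \mid \mathbf{P}_t, \hat{\mathbf{P}}_t) \, f(\hat{\mathbf{P}}_t \mid \mathbf{P}_t) = f(\hat{\mathbf{P}}_{t-1} \mid \mathbf{P}_t) \, f(\hat{\mathbf{P}}_t \mid \mathbf{P}_t), \tag{8}$$

where $f(\hat{\mathbf{P}}_{t-1} \mid \mathbf{P}_t, \hat{\mathbf{P}}_t)$ is equivalent to $f(\hat{\mathbf{P}}_{t-1} \mid \mathbf{P}_t)$ because we assume the noise of the $(t-1)$-th frame and that of the $t$-th frame are independent.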

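For the second term in Eq. (8), according to the linear relationship of $\mathbf{P}_t$ and $\mathbf{U}_t$ as in Eq. (5) and assuming a zero-mean Gaussian distribution for the noise, we have

$$\begin{aligned} \arg\max_{\mathbf{U}_t} f(\hat{\mathbf{P}}_t \mid \mathbf{P}_t) &= \arg\max_{\mathbf{U}_t} f(\hat{\mathbf{U}}_t \mid \mathbf{U}_t) \\ &= \arg\max_{\mathbf{U}_t} f(\hat{\mathbf{U}}_t - \mathbf{U}_t \mid \mathbf{U}_t) \\ &= \arg\max_{\mathbf{U}_t} \alpha_1 \exp\left( -\lambda_1 \| \hat{\mathbf{U}}_t - \mathbf{U}_t \|_2^2 \right), \end{aligned} \tag{9}$$

where $\alpha_1$ is a normalization factor to keep the integral of the probability function equal to 1, and $\lambda_1$ is a variance-related parameter.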

For the first term in Eq. (8), since the variation between adjacent frames is often small, we assume the current frame is a perturbed version of the previous frame. In particular, we propose to adopt a weighting parameter $w_i$ to represent the perturbation at the $i$-th patch, leading to

$$f(\hat{\mathbf{P}}_{t-1} \mid \mathbf{P}_t) = \alpha_2 \exp\left( -\lambda_2 \sum_{i=1}^{M} w_i \| \hat{\mathbf{P}}_{t-1,i} - \mathbf{P}_{t,i} \|_2^2 \right), \tag{10}$$

where $\alpha_2$ is a normalization factor, and $\lambda_2$ is a variance-related parameter. In the proposed algorithm, $w_i$ is a variable depending on $\hat{\mathbf{P}}_{t-1,i}$ and $\mathbf{P}_{t,i}$, which describes the similarity between $\hat{\mathbf{P}}_{t-1,i}$ and $\mathbf{P}_{t,i}$.

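The proposed prior distribution. Since $\mathbf{P}_t$ follows GMRF modeling, assuming zero mean, we have its prior distribution from Eq. (3) as:

$$g(\mathbf{P}_t) = \beta \exp\left( -\frac{1}{2} \mathbf{P}_t^{\top} \mathbf{Q}_t \mathbf{P}_t \right), \tag{11}$$

where $\mathbf{Q}_t$ is the precision matrix of the $t$-th frame, and $\beta$ is a normalization factor.

However, it is challenging to estimate $\mathbf{Q}_t$ statistically from small amounts of data. Instead, as introduced in (Zhang et al., 2014), the precision matrix can be interpreted by the graph Laplacian, i.e., $\mathbf{Q}_t$ equals $\mathbf{L}_t$ up to a scalar factor. Hence, we replace $\mathbf{Q}_t$ in Eq. (11) by the graph Laplacian $\mathbf{L}_t$, leading to

$$g(\mathbf{P}_t) = \beta \exp\left( -\frac{1}{2} \mathbf{P}_t^{\top} \mathbf{L}_t \mathbf{P}_t \right). \tag{12}$$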

### 4.3. Final Problem Formulation

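Combining Eq. (7), Eq. (8), Eq. (9), Eq. (10), and Eq. (12), and dropping the normalization factors $\alpha_1$, $\alpha_2$ and $\beta$, which do not affect the maximizer, we have

$$\begin{aligned} \max_{\mathbf{U}_t, \mathbf{L}_t, w_i} \;\; & \exp\left( -\lambda_1 \| \hat{\mathbf{U}}_t - \mathbf{U}_t \|_2^2 \right) \cdot \exp\left( -\lambda_2 \sum_{i=1}^{M} w_i \| \hat{\mathbf{P}}_{t-1,i} - \mathbf{P}_{t,i} \|_2^2 \right) \cdot \exp\left( -\frac{1}{2} \mathbf{P}_t^{\top} \mathbf{L}_t \mathbf{P}_t \right), \\ \text{s.t.} \;\; & \mathbf{P}_t = \mathbf{S}_t \mathbf{U}_t - \mathbf{C}_t. \end{aligned} \tag{13}$$

Due to the dependency of $\mathbf{L}_t$ and $w_i$ on $\mathbf{U}_t$, $\mathbf{L}_t$ and $w_i$ are optimization variables as well as $\mathbf{U}_t$.

Taking the logarithm of Eq. (13), multiplying by $-1$, and absorbing the constant factor $\tfrac{1}{2}$ of the prior term into the parameters $\lambda_1$ and $\lambda_2$ (which rescales the objective without changing its minimizer), we arrive at the final problem formulation:

$$\begin{aligned} \min_{\mathbf{U}_t, \mathbf{L}_t, w_i} \;\; & \lambda_1 \| \mathbf{U}_t - \hat{\mathbf{U}}_t \|_2^2 + \lambda_2 \sum_{i=1}^{M} w_i \| \mathbf{P}_{t,i} - \hat{\mathbf{P}}_{t-1,i} \|_2^2 + \mathbf{P}_t^{\top} \mathbf{L}_t \mathbf{P}_t, \\ \text{s.t.} \;\; & \mathbf{P}_t = \mathbf{S}_t \mathbf{U}_t - \mathbf{C}_t. \end{aligned} \tag{14}$$

As $\mathbf{U}_t$, $\mathbf{L}_t$ and $w_i$ are all optimization variables, Eq. (14) is nontrivial to solve. We develop an efficient algorithm to address this problem formulation in the next section.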

## 5. The Proposed Algorithm

As demonstrated in Fig. 2, for a given dynamic point cloud, we perform denoising on each frame sequentially. The proposed algorithm consists of four major steps: 1) patch construction, in which we form overlapped patches from chosen patch centers; 2) similar/corresponding patch search, in which we search similar patches for each patch in the current frame, and search the corresponding patch in the previous frame; 3) graph construction, in which we build a spatio-temporal graph with intra-connectivities among similar patches and inter-connectivities among corresponding patches; 4) optimization, in which we solve the proposed problem formulation in Eq. (14) via alternating minimization, thus performing steps 2-4 iteratively. Note that the inter-frame reference is bypassed for denoising the first frame as there is no previous frame. We discuss the four steps separately in detail.

### 5.1. Patch Construction

As each patch is formed around a patch center, we first select $M$ points from $\hat{\mathbf{U}}_t$ as the patch centers, denoted as $\{\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_M\}$. In order to keep the patches distributed as uniformly as possible, we first choose a random point in $\hat{\mathbf{U}}_t$ as $\mathbf{c}_1$, and then repeatedly add the point that is farthest from the previously chosen patch centers as the next patch center, until there are $M$ points in the set of patch centers. We then search the $k$-nearest neighbors of each patch center in terms of Euclidean distance, which leads to $M$ patches in $\hat{\mathbf{U}}_t$.
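The center selection described above resembles farthest point sampling. The sketch below is one possible reading, assuming that "farthest from the previously chosen centers" means maximizing the distance to the nearest already-chosen center; the function name and the `seed` argument are hypothetical.

```python
import numpy as np

def farthest_point_sampling(U, M, seed=0):
    """Pick M patch-center indices from the (N, 3) point array U.

    Assumption: the next center is the point whose distance to its
    nearest already-chosen center is largest (max-min criterion).
    """
    rng = np.random.default_rng(seed)
    N = U.shape[0]
    centers = [int(rng.integers(N))]            # random first center
    # Distance from every point to its nearest chosen center so far.
    min_dist = np.linalg.norm(U - U[centers[0]], axis=1)
    while len(centers) < M:
        nxt = int(np.argmax(min_dist))          # farthest remaining point
        centers.append(nxt)
        dist_to_new = np.linalg.norm(U - U[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return np.array(centers)
```

Keeping a running minimum distance avoids recomputing distances to all previous centers, so each new center costs one pass over the frame.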

### 5.2. Similar/Corresponding Patch Search

For each constructed patch in $\hat{\mathbf{U}}_t$, we search for its similar patches locally in $\hat{\mathbf{U}}_t$, and its corresponding patch in $\hat{\mathbf{U}}_{t-1}$. A metric is necessary to measure the similarity between patches. This remains a challenging problem as the patches are irregular.

Similarity Metric. We deploy a simplified version of the method in (Zeng et al., 2018) to measure the similarity between patch $m$ and patch $n$. The key idea is to compare the two patches via the distance from each point to the tangent plane at the patch center.

Firstly, we construct the tangent planes of the two patches. A point cloud describes the surface of the object. We thus calculate the surface normals $\mathbf{n}_m$ and $\mathbf{n}_n$ for patch $m$ and patch $n$ respectively. Then we acquire the tangent planes of the patches at the patch centers $\mathbf{c}_m$ and $\mathbf{c}_n$.

Secondly, we measure the difference between the patches via the distance from each point of the two patches to the corresponding tangent plane. Specifically, we project each point in patch $m$ and patch $n$ to the tangent plane of patch $n$. For the $i$-th point $v_n^i$ in patch $n$, we find the point $v_m^i$ in patch $m$ whose projection on the tangent plane is closest to that of $v_n^i$. We then define $d_n(v_n^i)$ and $d_n(v_m^i)$ as the distances of the two points to their projections on the tangent plane. $|d_n(v_n^i) - d_n(v_m^i)|$ is regarded as the difference of the two patches at points $v_n^i$ and $v_m^i$. Then we acquire the average difference between the two patches over all the points:

$$\overset{\rightharpoonup}{D}_{mn} = \sqrt{ \frac{1}{k+1} \sum_{i=1}^{k+1} \left[ d_n(v_n^i) - d_n(v_m^i) \right]^2 }. \tag{15}$$

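Similarly, projecting each point in patch $m$ and patch $n$ to the tangent plane of patch $m$, we acquire an average difference $\overset{\leftharpoonup}{D}_{mn}$. The final mean difference between the two patches is:

$$D_{mn} = \frac{1}{2} \left( \overset{\rightharpoonup}{D}_{mn} + \overset{\leftharpoonup}{D}_{mn} \right). \tag{16}$$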

Finally, we measure the patch similarity with a thresholded Gaussian function using the above mean difference:

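$$s_{mn} = \begin{cases} \exp\left( -\dfrac{D_{mn}^2}{2\epsilon^2} \right), & D_{mn} \leqslant r, \\ 0, & D_{mn} > r, \end{cases} \tag{17}$$

where $r$ is a threshold determined by the density of the point cloud, and $\epsilon$ is a variance-related parameter. The larger $s_{mn}$ is, the more similar patch $m$ and patch $n$ are.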
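The similarity metric can be sketched in a few lines. The version below makes three assumptions that the text does not fix: the surface normal of a patch is estimated as the direction of least variance of its points (a PCA-style estimate), the two directed differences are combined by a plain average, and the closest-projection search is brute force. All function names are hypothetical.

```python
import numpy as np

def patch_normal(patch):
    """Unit normal of a (k+1, 3) patch, taken as the direction of least
    variance of the centered points (assumed estimator)."""
    centered = patch - patch.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[-1]

def directed_difference(patch_m, patch_n, center_n, normal_n):
    """Eq. (15): RMS difference of point-to-plane distances, measured on
    the tangent plane of patch n (through center_n, normal normal_n)."""
    signed_m = (patch_m - center_n) @ normal_n
    signed_n = (patch_n - center_n) @ normal_n
    # Projections of all points onto the tangent plane of patch n.
    proj_m = patch_m - np.outer(signed_m, normal_n)
    proj_n = patch_n - np.outer(signed_n, normal_n)
    # For each point of n, the point of m with the closest projection.
    d2 = ((proj_n[:, None, :] - proj_m[None, :, :]) ** 2).sum(axis=2)
    match = d2.argmin(axis=1)
    d_m, d_n = np.abs(signed_m), np.abs(signed_n)  # unsigned distances
    return float(np.sqrt(np.mean((d_n - d_m[match]) ** 2)))

def patch_similarity(patch_m, center_m, patch_n, center_n, eps, r):
    """Thresholded Gaussian similarity of Eq. (17)."""
    n_m, n_n = patch_normal(patch_m), patch_normal(patch_n)
    d_fwd = directed_difference(patch_m, patch_n, center_n, n_n)
    d_bwd = directed_difference(patch_n, patch_m, center_m, n_m)
    D = 0.5 * (d_fwd + d_bwd)  # assumed: average of the two directions
    return float(np.exp(-D ** 2 / (2 * eps ** 2))) if D <= r else 0.0
```

For two identical patches both directed differences vanish and the similarity equals 1, while patches whose mean difference exceeds the threshold are treated as unrelated.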

Local Patch Search. Given the similarity measure, we search for similar patches within the current frame. The number of similar patches depends on the size of the point cloud. As to the previous frame, we search for only one patch as the corresponding patch. Given a target patch $\mathbf{P}_{t,i}$ in the $t$-th frame, we choose the most similar patch to $\mathbf{P}_{t,i}$ in the $(t-1)$-th frame as the corresponding patch $\hat{\mathbf{P}}_{t-1,i}$.

In order to reduce the computational complexity, we set a local window in the $(t-1)$-th frame for the corresponding patch search, which contains patches centered at the $K$-nearest neighbors of the target patch center. Thus we evaluate the patch similarity between the target patch and these $K$-nearest patches instead of all the patches in the $(t-1)$-th frame. Once we acquire the patch $\hat{\mathbf{P}}_{t-1,i}$, we deploy its similarity measure in Eq. (17) to the patch $\mathbf{P}_{t,i}$ as the weighting parameter $w_i$ in Eq. (10). Similarly, we set a local window for similar patch search in the $t$-th frame.

### 5.3. Graph Construction

Having searched intra-frame similar patches and inter-frame corresponding patches, we construct a spatio-temporal graph over the patches. Though this graph is supposed to be learned via Eq. (14), the computational complexity of solving the optimization problem would be high and the numerical computation might be unstable. Instead, we propose to manually build intra-frame graph connectivities and inter-frame graph connectivities based on the patch similarity, as shown in Fig. 3.

Intra-frame graph construction. Given a target patch $\mathbf{P}_{t,i}$ in the $t$-th frame, we construct a bipartite graph between $\mathbf{P}_{t,i}$ and each of its similar patches $\mathbf{P}_{t,j}$. Specifically, each point in $\mathbf{P}_{t,i}$ is connected with its nearest neighbors in $\mathbf{P}_{t,j}$, where the distance is in terms of their projections on the tangent plane decided by the surface normal of $\mathbf{P}_{t,i}$ at the patch center. Similarly, each point in $\mathbf{P}_{t,j}$ is connected with the nearest points in $\mathbf{P}_{t,i}$ in terms of their projections on the tangent plane decided by the surface normal of $\mathbf{P}_{t,j}$ at the patch center. The intra-frame connectivities are undirected and share the same weight $s_{ij}$ as in Eq. (17). We build intra-frame connectivities over all the patches in this way, which leads to the graph Laplacian $\mathbf{L}_t \in \mathbb{R}^{(k+1)M \times (k+1)M}$, where $k+1$ is the number of points in each patch and $M$ is the number of patches in the $t$-th frame.

Note that we do not connect points within each patch explicitly, in order to avoid pulling the coordinates within a patch toward each other. However, connectivities may exist among some points if they are nearest neighbors in overlapping patches.

Inter-frame graph construction. In order to leverage the inter-frame correlation and keep the temporal consistency, we connect corresponding patches between the $t$-th frame and the $(t-1)$-th frame. Similar to the intra-frame graph construction, we connect each point in patch $\mathbf{P}_{t,i}$ with its nearest points in patch $\hat{\mathbf{P}}_{t-1,i}$, where the distance is in terms of their projections on the tangent plane decided by the surface normal of patch $\mathbf{P}_{t,i}$ at the patch center. The edges are undirected and share the same weight $w_i$, which is the similarity measurement in Eq. (17).
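The bipartite connection step shared by the intra-frame and inter-frame constructions can be sketched as follows. This is an illustrative reading that assumes each point is linked to a single nearest point of the other patch (the number of neighbors is not fixed by the text) and that the adjacency matrix is dense; the function names and argument layout are hypothetical.

```python
import numpy as np

def project_to_plane(points, center, normal):
    """Orthogonally project points onto the plane through `center`
    with unit normal `normal`."""
    offsets = points - center
    return offsets - np.outer(offsets @ normal, normal)

def connect_patches(A, rows_a, pts_a, rows_b, pts_b, center_a, normal_a, weight):
    """Link every point of patch a to its nearest point of patch b, where
    nearness is measured between projections on the tangent plane of a.

    rows_a / rows_b give the row indices of the patch points inside the
    adjacency matrix A; all added edges are undirected and share `weight`.
    """
    pa = project_to_plane(pts_a, center_a, normal_a)
    pb = project_to_plane(pts_b, center_a, normal_a)
    d2 = ((pa[:, None, :] - pb[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)  # assumed: one nearest neighbor per point
    for i, j in enumerate(nearest):
        A[rows_a[i], rows_b[j]] = weight
        A[rows_b[j], rows_a[i]] = weight
    return A

def laplacian_from_adjacency(A):
    """Combinatorial graph Laplacian D - A."""
    return np.diag(A.sum(axis=1)) - A
```

Calling `connect_patches` once in each direction (with the tangent plane of each patch in turn) mirrors the two-sided description above, and no edges are created between points of the same patch.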

### 5.4. Optimization Algorithm

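We first rewrite Eq. (14) for efficient optimization. We define a matrix $\mathbf{W}_{t,t-1}$ to describe the weights between corresponding patches:

$$\mathbf{W}_{t,t-1} = \operatorname{diag}\Big[ \underbrace{\sqrt{w_1}, \dots, \sqrt{w_1}}_{k+1}, \; \underbrace{\sqrt{w_2}, \dots, \sqrt{w_2}}_{k+1}, \; \dots, \; \underbrace{\sqrt{w_M}, \dots, \sqrt{w_M}}_{k+1} \Big]. \tag{18}$$

Then we rewrite Eq. (14) in the following form:

$$\min_{\mathbf{U}_t, \mathbf{L}_t, \mathbf{W}_{t,t-1}} \; \lambda_1 \| \mathbf{U}_t - \hat{\mathbf{U}}_t \|_2^2 + \lambda_2 \| \mathbf{W}_{t,t-1} (\mathbf{S}_t \mathbf{U}_t - \mathbf{C}_t) - \mathbf{W}_{t,t-1} \hat{\mathbf{P}}_{t-1} \|_2^2 + (\mathbf{S}_t \mathbf{U}_t - \mathbf{C}_t)^{\top} \mathbf{L}_t (\mathbf{S}_t \mathbf{U}_t - \mathbf{C}_t). \tag{19}$$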

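Eq. (19) is nontrivial to solve with three optimization variables. We propose an efficient alternating minimization approach as follows. Firstly, we initialize $\mathbf{U}_t$ with the noisy observation $\hat{\mathbf{U}}_t$, based on which we calculate $\mathbf{L}_t$ from the proposed intra-frame graph construction and $\mathbf{W}_{t,t-1}$ from the proposed inter-frame graph construction. Secondly, we fix both $\mathbf{L}_t$ and $\mathbf{W}_{t,t-1}$, take the derivative of Eq. (19) with respect to $\mathbf{U}_t$ and set the derivative to $\mathbf{0}$. This leads to the closed-form solution of $\mathbf{U}_t$:

$$\left( \mathbf{S}_t^{\top} \mathbf{L}_t \mathbf{S}_t + \lambda_1 \mathbf{I} + \lambda_2 \mathbf{S}_t^{\top} \mathbf{W}_{t,t-1}^{\top} \mathbf{W}_{t,t-1} \mathbf{S}_t \right) \mathbf{U}_t = \mathbf{S}_t^{\top} \mathbf{L}_t \mathbf{C}_t + \lambda_1 \hat{\mathbf{U}}_t + \lambda_2 \mathbf{S}_t^{\top} \mathbf{W}_{t,t-1}^{\top} \mathbf{W}_{t,t-1} (\mathbf{C}_t + \hat{\mathbf{P}}_{t-1}). \tag{20}$$

Then we update $\mathbf{L}_t$ and $\mathbf{W}_{t,t-1}$ from the solved $\mathbf{U}_t$. The iterations are repeated until convergence, i.e., when the difference of $\mathbf{U}_t$, $\mathbf{L}_t$, and $\mathbf{W}_{t,t-1}$ from their values in the previous iteration is negligible.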
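The frame update with the graph and the temporal weights held fixed is a linear solve. The sketch below implements that step together with the objective of Eq. (19), so that the solution can be checked numerically. It assumes dense matrices, interprets the quadratic forms on three-column coordinate matrices as sums over the x, y and z columns, and passes the diagonal of $\mathbf{W}_{t,t-1}^{\top}\mathbf{W}_{t,t-1}$ (each $w_i$ repeated $k+1$ times) as a vector; all names are hypothetical.

```python
import numpy as np

def update_frame(S, C, L, w_diag, U_hat, P_prev, lam1, lam2):
    """Solve the linear system of Eq. (20) for the frame coordinates.

    S      : (R, N) sampling matrix,   C : (R, 3) patch centers
    L      : (R, R) graph Laplacian,   w_diag : (R,) diagonal of W^T W
    U_hat  : (N, 3) noisy frame,       P_prev : (R, 3) reference patches
    """
    N = S.shape[1]
    WtW = np.diag(w_diag)
    lhs = S.T @ L @ S + lam1 * np.eye(N) + lam2 * S.T @ WtW @ S
    rhs = S.T @ L @ C + lam1 * U_hat + lam2 * S.T @ WtW @ (C + P_prev)
    return np.linalg.solve(lhs, rhs)

def objective(U, S, C, L, w_diag, U_hat, P_prev, lam1, lam2):
    """Value of Eq. (19) for fixed L and W (summed over x, y, z columns)."""
    P = S @ U - C
    fidelity = lam1 * np.sum((U - U_hat) ** 2)
    temporal = lam2 * np.sum(w_diag[:, None] * (P - P_prev) ** 2)
    smoothness = np.trace(P.T @ L @ P)
    return fidelity + temporal + smoothness
```

Since a graph Laplacian with non-negative weights is positive semidefinite and $\lambda_1 > 0$, the system matrix is positive definite, so the solve is well posed; for large frames a sparse or iterative solver would be the natural replacement for the dense `np.linalg.solve` call.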

Note that we first perform denoising on the first frame with only intra-frame correlations. Then for the next frame, in order to take advantage of the previously reconstructed frame for better reference, we take $\hat{\mathbf{P}}_{t-1}$ as patches in the denoised previous frame instead of those in the observed noisy previous frame. Hence, the final solution of $\mathbf{U}_t$ in Eq. (19) serves as the reference frame for the denoising of the next frame. A summary of the proposed algorithm is shown in Algorithm 1.

## 6. Experimental Results

### 6.1. Experimental Setup

We evaluate our algorithm on five dynamic point cloud sequences from MPEG (Ebner et al., 2018) and JPEG Pleno (Eugene d’Eon and Chou, 2017). For each sequence we randomly choose 6 consecutive frames as the sample data, namely frames 601-606, 1201-1206, 1201-1206, 1501-1506, and 1411-1416 of the five sequences respectively. The number of points in each frame is about 1 million, so we perform down-sampling with a sampling rate of 0.05 prior to the denoising. Because the point clouds in the dataset are clean, we add white Gaussian noise over a range of variances. Then we compare our algorithm with three static point cloud denoising methods: MRPCA (Mattei and Castrodad, 2017), APSS (Guennebaud and Gross, 2007), and RIMLS (Oztireli et al., 2009), where we perform each static denoising method frame by frame independently on dynamic point clouds. Also, we compare with our Baseline scheme for an ablation study, in which we remove the temporal reference by setting $\lambda_2 = 0$ in Eq. (19). That is, Baseline performs denoising on each frame independently. Regarding the evaluation metric, we adopt mean squared error (MSE), i.e., the average Euclidean distance between the denoised point cloud sequence and the ground truth. That is, we take the average of the MSE over the 6 frames as our metric. Besides, for the first frame of each dataset, we set $\lambda_2 = 0$ because it has no previous frame.

### 6.2. Experimental Results

Objective results. We list the denoising results of different methods in the result tables, and mark the lowest MSE in bold. We see that our method outperforms all four competing schemes (the three static point cloud denoising methods and Baseline) on the five datasets under all the noise levels. Specifically, our method reduces the average MSE relative to APSS, RIMLS, MRPCA, and Baseline. This validates the effectiveness of our method. In particular, the improvement over Baseline validates that the temporal correlation we exploit is beneficial to dynamic point cloud denoising. Further, the MSE reduction over Baseline grows with increasing noise levels. This indicates that the temporal correlation has more impact at high noise levels, because the inter-frame difference becomes negligible compared to noise with large variance.

For easier comparison with static point cloud denoising methods, we compute the average MSE on the five datasets under each noise level for different methods. The results are visualized in Fig. 4. We see that we achieve the best performance under various noise levels.

Subjective results. As illustrated in Fig. 5, the proposed method also has competitive visual results, especially in local details and temporal consistency. In order to demonstrate the temporal consistency, instead of the previously chosen frames as in Sec. 6.1, we choose another 6 consecutive frames that exhibit apparent movement in two of the sequences, under noise variance 0.05. We show the visual comparison with APSS and MRPCA because they have comparatively better objective performance as presented in Fig. 4. We see that our results preserve the local structure and keep the temporal consistency better. For example, in one of the sequences, the boundary of the left hand in our result is much cleaner than that in APSS, and smoother than that in MRPCA. Also, our result exhibits better temporal consistency in general.

## 7. Conclusion

While the denoising of static 3D point clouds has been widely studied, it remains a challenge to denoise dynamic point clouds. In order to address the problem, we propose a graph-based method to exploit both the intra-frame self-similarity and inter-frame consistency. Specifically, we propose spatio-temporal graph modeling of patches in dynamic point clouds, and pose a MAP estimation on the underlying patches. The key is to construct intra-frame connectivities among searched similar patches within the same frame, as well as inter-frame connectivities between searched corresponding patches over consecutive frames. We then accordingly cast dynamic point cloud denoising as an optimization problem, which leverages the similar/corresponding patches and a graph-signal smoothness prior based on the constructed graph. Experimental results show that our method outperforms frame-by-frame denoising from state-of-the-art static point cloud denoising approaches.

## References

• Alexa et al. (2003) Marc Alexa, Johannes Behr, Daniel Cohen-Or, Shachar Fleishman, David Levin, and Claudio T. Silva. 2003. Computing and Rendering Point Set Surfaces. IEEE Transactions on Visualization and Computer Graphics 9, 1 (2003), 3–15.
• Almonacid et al. (2018) Jonathan Almonacid, Celia Cintas, Claudio Derieux, and Mirtha Lewis. 2018. Point Cloud Denoising using Deep Learning. In Congreso Argentino de Ciencias de la Informática y Desarrollos de Investigación (CACIDI). 1–5.
• Avron et al. (2010) Haim Avron, Andrei Sharf, Chen Greif, and Daniel Cohen-Or. 2010. l1-Sparse Reconstruction of Sharp Point Set Surfaces. ACM Transactions on Graphics (TOG) 29 (10 2010), 135.
• Buades et al. (2005) Antoni Buades, Bartomeu Coll, and J-M Morel. 2005. A non-local algorithm for image denoising. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vol. 2. 60–65.
• Chung (1997) Fan RK Chung. 1997. Spectral graph theory. In Conference Board of the Mathematical Sciences. American Mathematical Society.
• Dabov et al. (2007) Kostadin Dabov, Alessandro Foi, Vladimir Katkovnik, and Karen Egiazarian. 2007. Image Denoising by Sparse 3-D Transform-Domain Collaborative Filtering. IEEE Transactions on Image Processing (TIP) 16, 8 (Aug 2007), 2080–2095.
• Digne (2012) Julie Digne. 2012. Similarity based filtering of point clouds. In IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). 73–79.
• Dinesh et al. (2018) Chinthaka Dinesh, Gene Cheung, Ivan V. Bajic, and Yang Cheng. 2018. Fast 3D Point Cloud Denoising via Bipartite Graph Approximation & Total Variation. (2018).
• Ebner et al. (2018) Thomas Ebner, Ingo Feldmann, Oliver Schreer, Peter Kauff, and Tanja Unger. 2018. HHI Point cloud dataset of a boxing trainer. http://mpegfs.int-evry.fr/MPEG/PCC/DataSets/pointCloud/CfP/. In ISO/IEC JTC1/SC29/WG11 (MPEG2018) input document M42921.
• Eugene d’Eon and Chou (2017) Eugene d’Eon, Bob Harrison, Taos Myers, and Philip A. Chou. 2017. 8i Voxelized Full Bodies, version 2 - A Voxelized Point Cloud Dataset. ISO/IEC JTC1/SC29 Joint WG11/WG1 (MPEG/JPEG) input document m40059/M74006 (January 2017).
• Gao et al. (2018) Xiang Gao, Wei Hu, and Zongming Guo. 2018. Graph-Based Point Cloud Denoising. In IEEE Fourth International Conference on Multimedia Big Data (BigMM). 1–6.
• Guennebaud and Gross (2007) Gaël Guennebaud and Markus Gross. 2007. Algebraic Point Set Surfaces. In ACM SIGGRAPH. 23.
• Hu et al. (2015) Wei Hu, Gene Cheung, Antonio Ortega, and Oscar C. Au. 2015. Multiresolution graph fourier transform for compression of piecewise smooth images. IEEE Transactions on Image Processing (TIP) 24 (January 2015), 419–433.
• Huang et al. (2013) Hui Huang, Shihao Wu, Minglun Gong, Daniel Cohen-Or, and Hao Zhang. 2013. Edge-Aware Point Set Resampling. ACM Transactions on Graphics (TOG) 32, 1 (2013), 1–12.
• Huang et al. (2009) Wenming Huang, Yuanwang Li, Peizhi Wen, and Xiaojun Wu. 2009. Algorithm for 3D Point Cloud Denoising. In International Conference on Genetic and Evolutionary Computing.
• Hui et al. (2009) Hui Huang, Dan Li, Hao Zhang, Uri Ascher, and Daniel Cohen-Or. 2009. Consolidation of unorganized point clouds for surface reconstruction. In ACM SIGGRAPH Asia.
• Lipman et al. (2007) Yaron Lipman, Daniel Cohen-Or, David Levin, and Hillel Tal-Ezer. 2007. Parameterization-free projection for geometry reconstruction. ACM Transactions on Graphics (TOG) 26, 3 (2007), 22.
• Mattei and Castrodad (2017) Enrico Mattei and Alexey Castrodad. 2017. Point Cloud Denoising via Moving RPCA. Computer Graphics Forum 36 (11 2017).
• Nguyen et al. (2012) Chuong V Nguyen, Shahram Izadi, and David Lovell. 2012. Modeling kinect sensor noise for improved 3d reconstruction and tracking. In International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT). 524–530.
• Oztireli et al. (2009) Cengiz Oztireli, Gaël Guennebaud, and Markus Gross. 2009. Feature Preserving Point Set Surfaces based on Non-Linear Kernel Regression. Computer Graphics Forum 28 (2009), 493–501.
• Rosman et al. (2013) Guy Rosman, Anastasia Dubrovina, and Ron Kimmel. 2013. Patch-Collaborative Spectral Point-Cloud Denoising. In Computer Graphics Forum, Vol. 32. Wiley Online Library, 1–12.
• Rue and Held (2005) Havard Rue and Leonhard Held. 2005. Gaussian Markov random fields: theory and applications. Chapman and Hall/CRC.
• Rusu and Cousins (2011) Radu Bogdan Rusu and Steve Cousins. 2011. 3D is here: Point Cloud Library (PCL). IEEE International Conference on Robotics and Automation (2011), 1–4.
• Rusu et al. (2008) Radu Bogdan Rusu, Zoltan Csaba Marton, Nico Blodow, Mihai Dolha, and Michael Beetz. 2008. Towards 3D Point cloud based object maps for household environments. Robotics and Autonomous Systems 56, 11 (2008), 927–941.
• Shen et al. (2010) Godwin Shen, Woo Shik Kim, Sunil K. Narang, Antonio Ortega, and Ho Cheon Wey. 2010. Edge-adaptive transforms for efficient depth map coding. In IEEE Picture Coding Symposium (PCS). 566–569.
• Shuman et al. (2013) David I Shuman, Sunil K. Narang, Pascal Frossard, Antonio Ortega, and Pierre Vandergheynst. 2013. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Processing Magazine 30 (2013), 83–98.
• Spielman (2004) DA Spielman. 2004. Lecture 2 of spectral graph theory and its applications.
• Sun et al. (2008) Xianfang Sun, Paul L. Rosin, Ralph R. Martin, and Frank C. Langbein. 2008. Noise in 3D laser range scanner data. IEEE International Conference on Shape Modeling and Applications (2008), 37–45.
• Sun et al. (2015) Yujing Sun, Scott Schaefer, and Wenping Wang. 2015. Denoising point sets via L0 minimization.
• Tulvan et al. (2016) Christian Tulvan, Rufael Mekuria, Zhu Li, and Sebastien Laserre. 2016. Use Cases for Point Cloud Compression. In ISO/IEC JTC1/SC29/WG11 (MPEG) output document N16331.
• Yan and Zhai (2015) Fu Yan and Jinlei Zhai. 2015. Research on scattered points cloud denoising algorithm. In IEEE International Conference on Signal Processing.
• Zeng et al. (2018) Jin Zeng, Gene Cheung, Michael Ng, Jiahao Pang, and Cheng Yang. 2018. 3d point cloud denoising using graph laplacian regularization of a low dimensional manifold model. arXiv preprint arXiv:1803.07252 (2018).
• Zhang et al. (2014) Cha Zhang, Dinei Florencio, and Charles Loop. 2014. Point cloud attribute compression with graph transform. In IEEE International Conference on Image Processing (ICIP). 2066–2070.