L1 Splines for Robust, Simple, and Fast Smoothing of Grid Data
Abstract
Splines are a popular and attractive way of smoothing noisy data. Computing splines involves minimizing a functional which is a linear combination of a fitting term and a regularization term. The former is classically computed using a (weighted) L2 norm while the latter ensures smoothness. Thus, when dealing with grid data, the optimization can be solved very efficiently using the DCT. In this work we propose to replace the L2 norm in the fitting term with an L1 norm, leading to automatic robustness to outliers. To solve the resulting minimization problem we propose an extremely simple and efficient numerical scheme based on splitBregman iteration combined with DCT. Experimental validation shows the highquality results obtained in short processing times.
L1 Splines for Robust, Simple, and Fast Smoothing of Grid Data
Splines \kwdL1 fitting \kwdsplitBregman \kwdgrid data
1 Introduction
Smoothing a dataset consists in finding an approximating function that captures important patterns in the data, while disregarding noise or other finescale structures. Let be an dimensional discrete signal, where () is the domain of along the th dimension. We can model by
(1) 
where represents some noise and is a smooth function. A very common regularization choice is to enforce continuity, in which case is called a (cubic) spline. Smoothing relies upon finding the best estimate of under the proper smoothness and noise assumptions. We can approximate by minimizing an objective functional
(2) 
where is a data fitting term, defined by the distribution of , is a regularization term, and is a scalar that determines the balance between both terms. Such scalar parameter can be automatically derived using Bayesian or MDL techniques, as will be later shown.
For clarity, we will describe in depth the case , where we have samples. We extend the results later to the general dimensional case.
Let us begin by explaining the smoothing term. The continuity requirement leads to define , being a discrete secondorder differential operator, defined , by
(3)  
(4)  
(5) 
where represents the step, or sampling rate, between and . Assuming repeating border elements, that is, and , gives , and .
Regarding the fitting term, the classical assumption is that the noise in Equation (1) has Gaussian distribution with zero mean and unknown variance, which leads to setting . Smoothing then can be formulated as the leastsquares regression
For clarity, we call the estimate obtained with this method an L2 spline.
It is a well known fact that least squares estimates for regression models are highly nonrobust to outliers. Although there is no agreement on a universal and formal definition of an outlier, it is usually regarded as an observation that does not follow the patterns in the data. Notice that smoothing should produce an estimate taking into account only important patterns, that is, the inliers, in the data. In this sense, the L2 formulation cannot correctly handle outliers by itself.
In order to solve this problem, we propose to take a different assumption on the distribution of the noise in Equation (1). By choosing a distribution with fatter tails than the Gaussian distribution, the derived estimator will correctly handle outliers. We thus assume that follows a Laplace distribution with zero mean and unknown scale parameter, a common practice in other problems as we will further discuss below. This leads to the fitting term and the regression then becomes
We call the estimate obtained with this formulation an L1 spline.
Let us point out that the use of L1 fitting terms for solving inverse problems is not new. For example, in 2001, Nikolova proved the theoretic pertinence of using L1 fitting terms for image denoising [11]. Other interesting works have addressed this approach for total variation image denoising [1, 5, 2, 12] or total variation optical flow [21, 16, 14] (this robustnesstooutliers type of ideas was previously introduced in the context of optical flow by Black and Anandan [3]). Recall that the totalvariation regularization term involves firstorder derivatives, while the proposed L1 splines, on the other hand, involve secondorder derivatives.
Regularly sampled signals are extremely common in practice, and their analysis becomes easier and faster. In particular, we follow the most common choice when dealing with discrete dimensional data, which is assuming a “rectangular” Cartesian sampling pattern. When the sampling is isotropic, i.e., “square,” we refer to this type of data as grid data.
We developed an iterative algorithm for computing L1 splines, based on splitBregman iteration [8], that is specially suited for the case of grid data. This algorithm is extremely fast, both in running time and in the number of iterations until convergence. It is also outstandingly simple, making the implementation completely straightforward.
The remainder of the paper is structured as follows. In Section 2, we overview a fast algorithm to compute L2 splines and robust L2 splines, a modification of the least squares regression that allows to handle outliers. In Section 3, we present the proposed algorithm for computing L1 splines. Then, in Section 4, we show results obtained with L1 splines, systematically outperforming its L2 and robust L2 counterparts in the presence of outliers. We also show that the proposed computational algorithm is very efficient. Finally, in Section 5 we provide some concluding remarks.
2 Smoothing splines
As aforementioned, the classical assumption is that the noise in Equation (1) follows a Gaussian distribution with zero mean and unknown variance. This leads to solve the leastsquares regression
(6) 
Since both terms are differentiable, we obtain
(7) 
Garcia proposed a very efficient method for dealing with regularly sampled data [7]. Assuming that the data are equally spaced, that is, without loss of generality , we obtain
(8) 
An eigendecomposition of yields , where is a diagonal matrix containing the eigenvalues of , given by [20]
(9) 
Since is a unitary matrix, we can write Equation (7) as
(10) 
Let us define the matrix . Trivially,
(11) 
Following Strang [15] and Garcia [7], let us observe that is a DCTII matrix and is an inverse DCTII matrix. Then, Equation (10) can be expressed as
(12) 
where and stand for the DCTII and inverse DCTII functions. Equation (12) provides a fast and simple algorithm for computing L2 splines.
2.1 Robust estimation.
Often in practice there are in some values that could not be observed (or recorded) for some reason. We would like to be able to handle such cases in such a way that the missing values are inferred from the ones that can be observed. Let be an diagonal matrix such that represents a weight assigned to observation . is defined by
(13) 
where is some arbitrary constant in ; in practice, and without loss of generality, we set . We can then solve
(14) 
which will simply omit the missing points from the computation of the residual while the regularizer will still have a smoothing effect over both present and missing points. Equation (14) acts as an impainting algorithm, filling the missing values in such a way that continuity between filled values and smoothed ones is preserved. The minimization of Equation (14) gives
(15) 
This leads to the iterative procedure
(16) 
which, similarly to Equation (12), becomes
(17) 
On a different note, real data often present observations that lie abnormally far from their “true” value, i.e., that do not appear to follow the pattern of the other data points. The main drawback of the penalized least squares formulation Equation (6) is its sensitivity to these outliers. To address this issue, weights can be assigned to every point, as in Equation (14), such that outliers exert less influence during the estimation process. In this case, the weights are iteratively refined during the estimation process using robust estimators for the mean and variance of the data. Defining these estimators is a complex problem by itself. For details about how can be set and updated for added robustness to outliers, refer to Garcia’s work [7].
3 L1 splines
In this section we introduce a different splines formulation in order to handle outliers in the data. We assume that the noise in Equation (1) follows a Laplace distribution with zero mean and unknown scale parameter, which leads to . The regression then becomes
(18) 
Goldstein and Osher [8] proposed a very elegant and efficient algorithm for solving the L1 constrained problem (related to a number of very efficient optimization algorithms, e.g., see [6])
(19) 
For this, they consider the equivalent problem
(20) 
which they first convert it into the unconstrained problem
(21) 
In this form, the penalty function does not accurately enforce the constraint for small . The constraint is enforced by letting . However, another solution for this new formulation is found by using the following twophase algorithm
(22)  
(23) 
This algorithm is often denoted in the literature as splitBregman iteration. This class of algorithms has several nice theoretical properties and has successfully been applied to several problems in practice such as image restoration [13], image denoising [18], compressed sensing [19], and image segmentation [9]; see also [6] and references therein.
We use this technique for solving Equation (18). We begin by setting and , which leads to the problem
(24) 
We then transform it into the unconstrained form
(25) 
and the Bregman iteration simply takes the form
(26)  
(27) 
Because of the splitting of the L1 and L2 components in the functional (26), we can perform this minimization efficiently by iteratively minimizing with respect to and separately,
(28)  
(29) 
For minimizing Equation (28) we set and . We obtain
(30) 
This is a classical L2 spline and can be minimized using Equation (12), as already explained. The optimal value of in Equation (29) can be explicitly computed using shrinkage operators,
(31) 
where
(32)  
and  
(33) 
We thus obtain a very efficient algorithm for computing L1 splines, combining DCT and shrinkage operators.
On a different note let us mention that Equation (25) can also be interpreted [10] as a relaxation of
(34) 
where the L0 norm is replaced by its (convex) L1 counterpart. In this case the underlying model for is , where is zeromean Gaussian noise and represents the “oulier” noise. Under this assumptions, practically becomes an “indicator function” of the presence of (sparse) outliers (see also [3]). Besides the different angle in the derivation of the model, our approach differs from [10] in two very important points. First, we use splitBregman iteration by introducing the variable in the optimization procedure, see equations (26) and (27). In [10] Equation (25) is first solved using direct alternate minimization over and , and then Equation (34) is solved via nonconvex minimization using the previous solution as a starting point. Second, considering the grid structure, we use the DCT approach to solve Equation (28), instead of the classical Cholesky decomposition. The combination between splitBregman and DCT results in a sound and fast algorithm for computing L1 splines on grid data.
3.1 Handling missing data
In the classical L2 formulation, a diagonal binary weighting matrix is used to cope with missing values (see Section 2.1 for details). Let us denote by the diagonal of . Let us first define
and  (35) 
We then pose Equation (14) as
(36) 
and equivalently extend Equation (18) as
(37) 
This leads to
(38) 
Then the splitBregman iteration can be written as
(39)  
(40)  
(41) 
Equation (39) can be solved using Equation (17). Solving Equation (40) amounts to performing a shrinkage operation on the dimensions where equals 1. In Equation (41), it suffices to update the dimensions of where equals 1.
3.2 Handling multidimensional data
Let us now return to the general case of dimensional data. Following Garcia [7], we extend Equation (12) as
(42) 
where and stand for the dimensional DCTII and inverse DCTII functions, and denotes the Schur (elementwise) product. Notice that the multidimensional DCT is simply a composition of onedimensional DCTs along each dimension. Extending Equation (11), is an th order tensor defined by
(43) 
where is an th order tensor of ones, and denotes the elementwise division. Finally, is an th order tensor, defined by
(44) 
where denotes the size of along the th dimension.
The algorithm and its complexity. The pseudocode for the general dimensional case is presented in Algorithm 1. Let us analyze its complexity. The DCT and inverse DCT require operations, where . The remaining operations are linear in . The overall complexity of the algorithm is then , where and are, respectively, the number of outerloop and innerloop iterations in Algorithm 1. Notice that Goldstein and Osher [8] recommend to perform only one innerloop iteration for achieving optimal efficiency. Thus, we set for all experiments. We will later see that in many cases the algorithm converges quickly ( can be very small then). The algorithm’s complexity is thus dominated by the computation of the DCT and inverse DCT. Of course, these standard operations can be easily computed using GPU, speedingup the execution by several orders of magnitude.
4 Experimental results
For all experiments we adhere to the following setup:

using generalized cross validation, we find the best estimate for for the robust L2 formulation (problem (14));

we then find L2 splines (problem (6)), robust L2 splines (problem (14)),^{2}^{2}2Code available at http://www.mathworks.com/matlabcentral/fileexchange/25634robustsplinesmoothingfor1dtonddata. and/or L1 splines (problem (18)), setting .
This protocol allows us to show that, even when is chosen to fit optimally the robust L2 formulation, the proposed method provides better estimates. For the L1 formulation, in Equation (18), we simply set for all examples. We recall that is set to 1. We also set (see Algorithm 1) and additionally limit the maximum number of outer iterations to a hundred. The algorithm stops when any of the two conditions is met.
Fig. 2 presents two onedimensional examples. We depict the original signal , where . We observe the signal , where is Gaussian noise. Some points () are further contaminated with uniform noise , where , such that . The points affected by are depicted in red and the remaining ones in green. In the top row, ; and in the bottom row, . In both examples, only the L1 spline is correct. The classical and robust L2 splines are both unable to correctly recover the original data in the corrupted part.
Fig. 3 shows the evolution of the relative error as the number of iterations increases for the example in Fig. 2 (top row). As we can observe, the proposed algorithm is able to converge quickly, reaching a precision of in less than 20 iterations.
Fig. 4 depicts the relative timecost of each operation during the execution of the proposed method (Fig. 2, top row). Computing the DCT and the inverse DCT covers more than 84% of the total running time. Implementing these standard operations in GPU would boost the performance of the algorithm by orders of magnitude.
In Fig. 5 we present a twodimensional example. We depict the original signal in Fig. (a)a, and we add two types of noise: first, Gaussian noise with zero mean and variance (Fig. (b)b), and then uniform noise in the interval (Fig. (c)c). Again, only the L1 spline correctly recovers the original signal.
We next test the proposed algorithm with a climate timeseries provided by the Met Office Hadley Centre [4].^{3}^{3}3Data are available in http://hadobs.metoffice.com/crutem3/diagnostics/global/nh+sh/annual. The dataset contains the evolution of global average land temperature anomaly (in ℃) with respect to the 19611990 average temperature. The results, which confirm an upward trend in the second half of the 20th century, are shown in Fig. 6.
We also test on this dataset the effect of varying the parameters and , see Fig. 7. As in the classical L2 formulation, has a direct impact on the obtained result, controlling the degree of smoothness of the solution, see Fig. (a)a. On the contrary, Fig. (b)b shows that the newly introduced parameter is very stable and provides very similar results in a wide range . This stability allows us to fix its value to for all the experiments in this work.
Another interesting example is presented in [10]. The dataset consists of power consumption measurements (in kW) for a government building, collected every fifteen minutes from July 2005 to October 2010. As in [10], we downsample the data by a factor of four, yielding one measurement per hour, and use only a subset of the whole data. The results are displayed in Fig. 8.
We also test in a synthetic example the ability to recover signals with sharp transitions, see Fig. 9. In this case we use a simple piecewise constant function. We can observe clear overshoot (plus ringing) effects on the L2 and robust L2 splines. The robust L2 spline also results in transitions with less vertical slopes, creating a bluring effect. With the L1 spline we obtain a much better reconstruction, with almost nonexistent overshooting.
This very same effect can be observed in real examples, see Fig. 10. When approximating images with splines, some structure is lost by blur and some structure is artificially created by overshooting and ringing. This can be observed in Figs. (a)a and (b)b, were the difference between the original image and the image estimated by robust L2 splines exhibits structure. Observe, however, that almost no structure in the difference is visible when the reconstruction is performed using L1 splines.
4.1 Application to range data
In this section we perform smoothing of depth data obtained with a Kinect camera. This kind of data is particularly challenging because:

it presents relatively smooth areas separated by sharp transitions,

edges are highly noisy, that is, edge pixels oscillate over time between foreground and background, and

it contains missing data, which appear for two different reasons: (1) the disparity between the IR projector and the IR camera produces “shadows,” and (2) the depth cannot be recovered in areas where the IR pattern is not clearly observable (e.g., because they receive direct sunlight or interference from another Kinect).
We use splines to interpolate and denoise these data, showing the advantage of L1 splines over its robust L2 counterpart. The displayed images are part of the LIRIS human activities dataset [17].
In the first example, shown in Fig. 11, we use a single depth frame (with standard Kinect resolution of ). The missing data are represented in black, while depth data goes from red to yellow as depth increases. Both, the L1 spline and the robust L2 spline are able to interpolate the missing data with reasonable values. Notice, however, that the latter exhibits, as aforementioned, overshooting and ringing (clearly perceived in the 1D profile). These effects are much milder in the L1 reconstruction.
In Fig. (a)a we can clearly observe that the position of the missing data is not consistent across frames. We can integrate data from several frames to achieve more accurate interpolations, by performing 3D reconstructions. Thus, in this example, we treat depth data as a 3D signal (2D + time), by considering three consecutive frames. The data dimensionality is then . A full depth video can be smoothed by using 3D splines as a slidingwindow type of filter. The robust L2 spline again presents a noisier behavior and with significative overshooting. On the other hand, the L1 spline is much smoother in smooth areas while correctly preserving abrupt transitions.

Running times. We present in Table 1 the runningtime and number of iterations until convergence for every example in this work. The time of the robust L2 and the L1 splines is comparable. All code is written in pure Matlab, with no C++ or mex optimizations. All experiments were run on a MacBook Pro with a 2.7GHz Intel Core i7 processor. Finally, note that in most cases the algorithm converges in less than twenty outeriterations (recall that ). In the example in Fig. 5, the maximum number of iterations (100) is reached with a final error of .
Size  Robust L2 spline  L1 spline  
Time  Time  Iters.  
Complete data  Fig. 2  4.931  3.590  7  
Fig. 5  0.541  0.982  100  
Fig. 6  163  0.007  0.045  72  
Fig. 8  501  0.010  0.010  11  
Fig. 9  0.035  0.007  6  
Fig. (a)a  1.167  0.758  17  
Fig. (b)b  1.143  0.873  20  
Missing data  Fig. 11  0.911  0.266  2  
Fig. 12  7.511  2.202  3 
5 Conclusions
We have presented a new method for robustly smoothing regularly sampled data. We do this with modified splines, where we replace the classical L2norm in the fitting term by an L1norm. This automatically handles outliers, thus obtaining a robust approximation.
We also presented a new technique, using splitBregman iteration, for solving the resulting optimization problem. The algorithm is extremely simple and easy to code. The method converges very quickly and has a small memory footprint. It also makes extensive use of the DCT, thus being straightforward to implement in GPU. These characteristics make this method very suitable for largescale problems.
Acknowledgment
Work partially supported by NSF, ONR, NGA, ARO, DARPA, and NSSEFF. We thank Dr. Gonzalo Mateos for kindly providing the power consumption dataset.
References
 [1] S. Alliney. A property of the minimum vectors of a regularizing functional defined by means of the absolute norm. IEEE Trans. Signal Process., 45(4):913–917, April 1997.
 [2] J. F. Aujol, G. Gilboa, T. Chan, and S. Osher. Structuretexture image decompositionmodeling, algorithms, and parameter selection. Int. J. Comput. Vision, 67(1):111–136, April 2006.
 [3] M. J. Black and P. Anandan. Robust dynamic motion estimation over time. In CVPR, pages 296–302, Maui, Hawaii, June 1991.
 [4] P. Brohan, J. J. Kennedy, I. Harris, S. F. B. Tett, and P. D. Jones. Uncertainty estimates in regional and global observed temperature changes: A new data set from 1850. Journal of Geophysical Research, 111(D12):D12106+, June 2006.
 [5] T. F. Chan and S. Esedog̃lu. Aspects of total variation regularized L1 function approximation. SIAM J. Appl. Math., 65(5):1817–1837, July 2005.
 [6] P. L. Combettes and J. C. Pesquet. Proximal splitting methods in signal processing. In H. H. Bauschke, R. S. Burachik, P. L. Combettes, Elser, D. R. Luke, and H. Wolkowicz, editors, FixedPoint Algorithms for Inverse Problems in Science and Engineering, pages 185–212. Springer, May 2011.
 [7] D. Garcia. Robust smoothing of gridded data in one and higher dimensions with missing values. Comput Stat Data Anal, 54(4):1167–1178, April 2010.
 [8] T. Goldstein and S. Osher. The split Bregman method for L1regularized problems. SIAM J. Img. Sci., 2(2):323–343, April 2009.
 [9] Tom Goldstein, Xavier Bresson, and Stanley Osher. Geometric applications of the split Bregman method: Segmentation and surface reconstruction. J. Sci. Comput., 45(1):272–293, October 2010.
 [10] G. Mateos and G. B. Giannakis. Robust nonparametric regression via sparsity control with application to load curve data cleansing. IEEE Transactions on Signal Processing, 60(4):1571–1584, April 2012.
 [11] M. Nikolova. Minimizers of costfunctions involving nonsmooth datafidelity terms. application to the processing of outliers. SIAM J. Numer. Anal., 40(3):965–994, March 2002.
 [12] M. Nikolova, M. K. Ng, and C. P. Tam. On data fitting and concave regularization for image recovery. Technical report, École Normale Supérieure de Cachan, Cachan, France, March 2012.
 [13] S. Osher, M. Burger, D. Goldfarb, J. Xu, and W. Yin. An iterative regularization method for total variationbased image restoration. Multiscale Model. Simul., 4(2):460–489, 2005.
 [14] Lars L. Rakêt, Lars Roholm, Mads Nielsen, and François Lauze. TVL1 optical flow for vector valued images. In EMMCVPR, pages 329–343, 2011.
 [15] G. Strang. The discrete cosine transform. SIAM Rev., 41(1):135–147, March 1999.
 [16] A. Wedel, T. Pock, C. Zach, H. Bischof, and D. Cremers. An improved algorithm for TVL1 optical flow. In D. Cremers, B. Rosenhahn, A. L. Yuille, and F. R. Schmidt, editors, Statistical and Geometrical Approaches to Visual Motion Analysis, volume 5064, pages 23–45. SpringerVerlag, 2009.
 [17] C. Wolf, J. Mille, L. E. Lombardi, O. Celiktutan, M. Jiu, M. Baccouche, E. Dellandrea, C. E. Bichot, C. Garcia, and B. Sankur. The LIRIS human activities dataset and the ICPR 2012 human activities recognition and localization competition. Technical Report RRLIRIS2012004, LIRIS Laboratory, March 2012.
 [18] Jinjun Xu and S. Osher. Iterative regularization and nonlinear inverse scale space applied to WaveletBased denoising. IEEE Trans. Image Process., 16(2):534–544, February 2007.
 [19] W. Yin, S. Osher, D. Goldfarb, and J. Darbon. Bregman iterative algorithms for L1minimization with applications to compressed sensing. SIAM J. Imaging Sci., 1(1):143–168, 2008.
 [20] W. C. Yueh. Eigenvalues of several tridiagonal matrices. Appl. Math. ENotes, 5:66–74, 2005.
 [21] C. Zach, T. Pock, and H. Bischof. A duality based approach for realtime TVL1 optical flow. In DAGM, pages 214–223, Berlin, Heidelberg, 2007. SpringerVerlag.