Efficient First-Order Algorithms for Adaptive Signal Denoising
Abstract
We consider the problem of discrete-time signal denoising, focusing on a specific family of nonlinear convolution-type estimators. Each such estimator is associated with a time-invariant filter which is obtained adaptively, by solving a certain convex optimization problem. Adaptive convolution-type estimators were demonstrated to have favorable statistical properties, see [JN09, JN10, HJNO15, OHJN16a]. Our first contribution is an efficient algorithmic implementation of these estimators via known first-order proximal algorithms. Our second contribution is a computational complexity analysis of the proposed procedures, which takes into account their statistical nature and the related notion of statistical accuracy. The proposed procedures and their analysis are illustrated on a simulated data benchmark.
1 Introduction
We consider the problem of discrete-time signal denoising. The goal is to estimate a complex discrete-time signal observed in complex Gaussian noise:
(1) $y_\tau = x_\tau + \sigma\zeta_\tau$.
Here, the noise terms $\zeta_\tau$ are i.i.d. random variables with standard complex Gaussian distribution, that is, the real and imaginary parts of each $\zeta_\tau$ are independent standard Gaussian random variables.
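The observation model above is easy to simulate; the minimal numpy sketch below is only an illustration, and the ground-truth signal (a sum of two complex sinusoids) as well as the values of `n` and `sigma` are hypothetical choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100        # number of observations (illustrative)
sigma = 0.5    # noise level (illustrative)

# A hypothetical ground-truth signal: sum of two complex sinusoids.
t = np.arange(n)
x = np.exp(2j * np.pi * 0.05 * t) + 0.5 * np.exp(2j * np.pi * 0.11 * t)

# Standard complex Gaussian noise: independent standard Gaussian real
# and imaginary parts, as described above.
zeta = rng.standard_normal(n) + 1j * rng.standard_normal(n)

y = x + sigma * zeta    # noisy observations, cf. (1)
```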
Signal denoising is a classical problem in statistical estimation and signal processing; see [IK81, Nem00, Tsy08, Was06, Hay91, Kay93]. The conventional approach is to assume that the signal comes from a known set with a simple structure that can be exploited to build the estimator. For example, one might consider signals belonging to linear subspaces, or signals whose spectral representation, as given by the Discrete Fourier or Discrete Wavelet transform, comes from a linearly transformed ball; see [Tsy08, Joh11]. In all these cases, estimators with near-optimal statistical performance can be found explicitly, and correspond to linear functionals of the observations, hence the name linear estimators.
We focus here on a family of nonlinear estimators with larger applicability and strong theoretical guarantees, in particular when the structure of the signal is unknown beforehand, studied in [Nem91, JN09, JN10, HJNO15, OHJN16a]. Assuming for convenience that the goal is to estimate the signal on a fixed finite range of indices, these estimators can be written as:
(2) 
here $\varphi$ is called a filter, and is supported on a finite index range. For each estimator in this family, the filter is obtained as an optimal solution of a convex optimization problem. For instance, the Penalized Least-Squares estimator [OHJN16a] is defined as
(3) 
where $F_n$ is the Discrete Fourier transform (DFT) operator, and the penalty involves the $\ell_1$-norm of the filter in the Fourier domain. We give a summary of the other estimators at the end of this section. The optimization problems associated with all of them rest upon a common principle: minimization of the residual
(4) 
regularized via the $\ell_1$-norm of the filter in the Fourier domain.
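To fix ideas, applying a convolution-type estimator with a given one-sided filter is straightforward; in the sketch below the moving-average filter is a hypothetical stand-in, since in the estimators of this paper the filter is itself the solution of a convex program.

```python
import numpy as np

def conv_estimate(phi, y):
    """Apply a convolution-type estimator: x_hat[t] = sum_tau phi[tau] * y[t - tau],
    for a one-sided filter phi supported on {0, ..., m}. The estimate is returned
    on the range of t for which all required observations exist."""
    m = len(phi) - 1
    return np.array([np.dot(phi, y[t - m:t + 1][::-1]) for t in range(m, len(y))])

# Toy example with a hypothetical filter: a moving average of width 4.
y = np.arange(10, dtype=float)
phi = np.full(4, 0.25)
x_hat = conv_estimate(phi, y)
```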
While the statistical properties of adaptive convolution-type estimators have been extensively studied [Nem91, JN09, JN10, HJNO15, OHJN16a], the question of their algorithmic implementation remains largely unexplored. In fact, we are not aware of any publicly available implementation of these estimators. Our goal here is to close this gap.
Note that problems similar to (3) belong to the general class of second-order cone problems, and hence can in principle be solved to high numerical accuracy in polynomial time via interior-point methods [BTN01]. However, the computational complexity of interior-point methods grows polynomially with the problem dimension, and quickly becomes prohibitive for signal and image denoising problems (for example, in image denoising the dimension is proportional to the number of pixels, which can be very large). Furthermore, it is unclear whether high-accuracy solutions are necessary for statistical recovery purposes: the level of accuracy sought should be adjusted to the noise level of the problem at hand.
On the other hand, one can leverage several favorable properties of these problems:

Easily accessible first-order information. The objective value and gradient at a given point can be computed in time quasilinear in the signal length via the Fast Fourier Transform (FFT).

Simple geometry. After a straightforward reparametrization, one is left with an $\ell_1$-norm penalty, or an $\ell_1$-ball as a feasible set in the constrained formulation. Prox-mappings for such problems, with respect to both the Euclidean distance and the “adapted” distance-generating functions, can be computed efficiently.

Medium accuracy is sufficient. Approximate solutions with specified (and medium) accuracy still preserve the statistical performance of the exact solutions.
All these properties make first-order optimization algorithms the tools of choice for (3) and similar problems.
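To illustrate the first point, the sketch below evaluates a least-squares residual of the form $\|y - \varphi * y\|_2^2$ and its gradient in $O(n \log n)$ via the FFT. For simplicity it uses circular convolution and a Wirtinger-type gradient, so it is a simplified model of the actual first-order oracle rather than the exact objective of (3).

```python
import numpy as np

def residual_and_grad(phi, y):
    """Least-squares residual f(phi) = ||y - phi (*) y||^2 for the circular
    convolution (*), together with its (Wirtinger) gradient in phi, both
    computed in O(n log n) via the FFT and Parseval's identity."""
    n = len(y)
    Y = np.fft.fft(y)
    Phi = np.fft.fft(phi, n)                 # zero-pads phi to length n
    R = Y - Phi * Y                          # residual in the Fourier domain
    f = np.sum(np.abs(R) ** 2) / n           # Parseval for the unnormalized FFT
    g = -2.0 * np.fft.ifft(np.conj(Y) * R)   # gradient w.r.t. phi
    return f, g
```

For instance, with a single observation $y_0 = 1$ and zero filter, the residual is $1$ and the (real part of the) gradient is $-2$, matching the derivative of $|1 - \varphi_0|^2$ at $\varphi_0 = 0$.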
Outline.
In Section 2, we recall two general classes of optimization problems, composite minimization problems [BT09, NN13] and composite saddle-point problems [JN11, NN13], and the corresponding algorithms suitable for their numerical solution. In Section 3, we present efficient first-order optimization algorithms based on these general approaches. In particular, we show how to compute first-order oracles in time quasilinear in the signal length using FFT. In Section 4, we establish worst-case complexity bounds for the proposed algorithms that explicitly depend on the quantities controlling the statistical difficulty of the problem: the signal length, the noise variance, and the filter norm parameter. These bounds show that a number of iterations controlled by the peak signal-to-noise ratio in the Fourier domain is sufficient for a suitable first-order algorithm to match the statistical properties of an exact estimator. This gives a sharp and rigorous characterization (in the present context) of the performance of “early stopping” strategies, which allow one to stop an optimization algorithm much earlier than dictated by the pure optimization analysis. In Section 5, we present numerical illustrations of the proposed algorithms on simulated data, which complement our theoretical analysis.
Notation.
We denote the space of all complex-valued signals on $\mathbb{Z}$, or, simply, the space of all two-sided complex sequences. We call the finite-dimensional subspace consisting of signals supported on a given finite index range:
its counterpart consists of all signals supported on the corresponding shifted index range. The unknown signal will be assumed to come from one of these subspaces, which corresponds to a finite signal length. Note that such signals can be naturally mapped to column vectors by means of the index-restriction operator:
In particular, these operators define one-to-one mappings onto the corresponding coordinate spaces. For convenience, column vectors will be indexed starting from zero. We define the scaled seminorms:
We use the “Matlab notation” for matrix concatenation: $[A; B]$ is the vertical, and $[A, B]$ the horizontal, concatenation of two matrices with compatible dimensions. We introduce the unitary Discrete Fourier Transform (DFT) operator, defined by
The unitarity of the DFT implies that its inverse coincides with its conjugate transpose. Slightly abusing the notation, we will occasionally use the shorthand for the corresponding composite operator. In other words, it is a map from signals to their Fourier coefficients, and the adjoint map simply sends a vector back via zero-padding. We use the “Big-O” notation: for two nonnegative functions $f$, $g$ on the same domain, $f = O(g)$ means that there is a generic constant $C$ such that $f \le C g$ for any admissible value of the argument; the $\tilde O(\cdot)$ notation allows, in addition, logarithmic factors; hereinafter $\log$ is the natural logarithm, and $C$ is a generic constant.
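The two facts used above, unitarity of the DFT and zero-padding being the adjoint of index restriction, are easy to verify numerically; the sketch below builds the unitary DFT matrix explicitly for a small dimension (the values of `n` and `m` are arbitrary).

```python
import numpy as np

n = 8
# Unitary DFT matrix: F[j, k] = exp(-2*pi*i*j*k/n) / sqrt(n).
j, k = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
F = np.exp(-2j * np.pi * j * k / n) / np.sqrt(n)

# Unitarity: the inverse coincides with the conjugate transpose.
unitary_ok = np.allclose(F.conj().T @ F, np.eye(n))

# Index restriction to the first m coordinates, and its adjoint (zero-padding).
m = 5
P = np.eye(m, n)                 # restriction operator C^n -> C^m
u = np.arange(m) + 1.0
v = np.arange(n) + 1.0
# <P v, u> == <v, P^T u>: the adjoint pads u with trailing zeros.
adjoint_ok = np.isclose(np.dot(P @ v, u), np.dot(v, P.T @ u))
```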
Estimators.
We now summarize all the estimators that are of interest in this paper:

Constrained Uniform-Fit estimator, given by
(ConUF) 
Constrained Least-Squares estimator:
(ConLS) 
Penalized Uniform-Fit estimator, given by
(PenUF) 
Penalized Least-Squares estimator:
(PenLS)
2 Tools from convex optimization
In this section, we recall the tools from first-order optimization to be used later. We describe two general types of optimization problems, composite minimization and composite saddle-point problems, together with efficient first-order algorithms for their solution. Following [NN13], we begin by introducing the concept of a proximal setup which underlies these algorithms.
2.1 Proximal setup
Let the domain be a closed convex set in a Euclidean space. A proximal setup for the domain is given by a norm, not necessarily the Euclidean one, and a distance-generating function (d.g.f.) that is continuous and convex on the domain, admits a continuous selection of subgradients on the subset where it is subdifferentiable, and is strongly convex w.r.t. the norm.
The concept of a proximal setup gives rise to several notions (see [NN13] for a detailed exposition): the center, the Bregman divergence, the radius, and the prox-mapping, defined as
Blockwise proximal setups.
We now describe a specific family of proximal setups which proves to be useful in our situation. Let with ; note that we can identify this space with via (Hermitian) vectorization map ,
(5) 
Now, supposing that for some nonnegative integers , let us split into blocks of size , and equip with the group norm:
(6) 
We also define the balls .
Theorem 2.1 ([NN13]).
Given as above, defined by
(7) 
is a d.g.f. for any ball of the norm (6). Moreover, for some constant and any admissible choice of the parameters, the radius of the ball is bounded as
(8) 
We will use two particular cases of the above construction.

Case , corresponds to the norm on , and specifies the complex setup.

Case , corresponds to the norm on , and specifies the setup .
It is convenient to introduce the following norms on :
(9) 
Note that gives the norm in the complex setup, while coincides with the standard norm on .
2.2 Composite minimization problems
In the general problem of composite minimization, the goal is to solve the convex program
(10) 
Here, the domain is equipped with a norm, the smooth term is convex and continuously differentiable, and the penalty term is convex, lower semicontinuous, finite on the relative interior of the domain, and possibly nonsmooth. Assuming the domain is equipped with a proximal setup, let us define the composite prox-mapping [BT09]:
(11) 
Fast Gradient Method.
Fast Gradient Method (FGM), summarized as Algorithm 1, was introduced in [N07] as an extension of the celebrated Nesterov algorithm for smooth minimization [Nes83] to the case of constrained problems with non-Euclidean proximal setups. It is guaranteed to find an approximate solution of (10) whose accuracy decreases as the inverse square of the iteration count. We defer the rigorous statement of this accuracy bound to Sec. 4.
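Since Algorithm 1 itself is only referenced here, the following is a schematic FISTA-type variant in the Euclidean setup with an $\ell_1$ composite term; the prox step and momentum sequence are the standard ones, and the step-size and setup choices of the actual Algorithm 1 may differ.

```python
import numpy as np

def soft_threshold(z, tau):
    """Prox of tau*||.||_1 (complex case): shrink magnitudes, keep phases."""
    mag = np.abs(z)
    return z * np.maximum(mag - tau, 0.0) / np.maximum(mag, 1e-300)

def fgm_l1(grad_f, L, lam, x0, iters=200):
    """FISTA-type fast gradient method for min_x f(x) + lam*||x||_1:
    gradient step of size 1/L, composite prox step, Nesterov momentum."""
    x, z, t = x0.copy(), x0.copy(), 1.0
    for _ in range(iters):
        x_new = soft_threshold(z - grad_f(z) / L, lam / L)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x

# Toy composite problem: f(x) = 0.5*||x - b||^2, grad_f(x) = x - b, L = 1.
b = np.array([3.0, -0.2, 0.05])
x_star = fgm_l1(lambda x: x - b, L=1.0, lam=0.1, x0=np.zeros(3))
# The exact solution is the soft-thresholding of b at level 0.1.
```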
2.3 Composite saddlepoint problems
We also consider another class of optimization problems:
(12) 
Here, the primal and dual domains are convex subsets of the corresponding Euclidean spaces, the dual domain being compact; the coupling function is convex in the primal variable, concave in the dual variable, and differentiable on the product domain; the penalty term is convex, lower semicontinuous, possibly nonsmooth, and such that the corresponding prox-mapping is easily computable. We can associate with (12) a smooth vector field,
Saddle-point problem (12) specifies two convex optimization problems: minimization of the primal objective (the primal problem), and maximization of the dual objective (the dual problem). Under some general conditions which hold in the described setting [Sio58], (12) possesses an optimal solution, called a saddle point, whose components are optimal solutions to the primal and dual problems, and whose objective value is the value of (12). The quality of a candidate solution can be evaluated via the duality gap, the sum of the primal and dual accuracies:
Constructing the joint setup.
When dealing with a saddle-point problem, one usually begins with “partial” proximal setups for the primal and the dual variables, and has to construct a joint proximal setup on the product domain. Let us consider the segment between a candidate primal point and the primal component of the center. Following [NN13], let us assume that both the dual radius and the “effective primal radius”
are known (note that the primal radius can be infinite but cannot). One can then construct a proximal setup
(13)  
Note that the corresponding “joint” prox-mapping reduces to the prox-mappings for the primal and dual setups.
Composite Mirror Prox.
3 Algorithmic implementation
We now describe efficient implementations of the proximal first-order algorithms discussed in Sec. 2, as applied to the recovery procedures introduced in Sec. 1.
Change of variables.
Our first step is to pass to the Fourier domain, so that, first, convolution is represented as an efficiently computable linear operator, and second, the feasible set and the penalization term become quasi-separable in the new variables. Noting that the adjoint of the vectorization map, cf. (5), is given by
consider the transformation
(14) 
Note that , and hence
where is defined by
(15) 
Problem reformulation.
After the change of variables (14), problems (ConLS) and (PenLS) take form (10):
(16) 
where the norm is defined in (9). In particular, (ConLS) is obtained from (16) by taking the penalty term to be the indicator of the $\ell_1$-ball constraint, and (PenLS) by taking it to be the scaled $\ell_1$-penalty. Note that
On the other hand, problems (ConUF), (PenUF), and (ConLS) (after taking the square root of the objective in the last case) can be recast as saddle-point problems (12), using that the dual norm of the $\ell_p$-norm is the $\ell_q$-norm with $1/p + 1/q = 1$, whence
As a result, (ConUF), (PenUF) and (ConLS) are reduced to
(17) 
where the dual domain is the $\ell_1$-ball for (ConUF) and (PenUF), and the $\ell_2$-ball for (ConLS) after taking the square root. Note that the coupling function is bilinear, and
We are now in a position to apply the algorithms described in Sec. 2. One iteration of either of them reduces to a constant number of computations of the gradient (which, in turn, reduces to evaluating the operator and its adjoint) and prox-mappings. We now show how to evaluate the convolution-type operator and its adjoint in $O(n \log n)$ operations.
Evaluation of the operator and its adjoint.
The operator defined in (15) can be evaluated in $O(n \log n)$ time via FFT. The key fact is that the convolution is contained in the first coordinates of the circular convolution of the observations with a zero-padded filter
Via the DFT diagonalization property, this is expressed as
where the diagonal operator can be constructed in $O(n \log n)$ by FFT, and evaluated in $O(n)$. Let the projection operator extract the first coordinates; note that its adjoint is the zero-padding operator which appends trailing zeros. Then,
(18) 
where all operators in the right-hand side can be evaluated in $O(n \log n)$. Moreover, we arrive at a similar representation for the adjoint by formally taking the adjoint of (18).
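The representation (18) can be sketched directly with numpy's FFT; the sketch below embeds a one-sided convolution into a circular one of sufficient length, diagonalizes it by the DFT, and projects onto the first coordinates, checking the result against a direct $O(n^2)$ convolution. The dimensions are arbitrary test values.

```python
import numpy as np

def conv_via_fft(phi, y, n_out):
    """Evaluate the (one-sided) convolution phi * y on its first n_out
    coordinates, by embedding it into a circular convolution of zero-padded
    sequences diagonalized by the DFT -- cf. (18)."""
    N = len(y) + len(phi) - 1          # circular length avoiding wrap-around
    full = np.fft.ifft(np.fft.fft(y, N) * np.fft.fft(phi, N))
    return full[:n_out]                # projection onto the first n_out coords

# Consistency check against direct O(n^2) convolution.
rng = np.random.default_rng(1)
y = rng.standard_normal(32) + 1j * rng.standard_normal(32)
phi = rng.standard_normal(8) + 1j * rng.standard_normal(8)
fast = conv_via_fft(phi, y, 32)
direct = np.convolve(y, phi)[:32]
```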
3.1 Computation of proxmappings
The composite prox-mappings corresponding to both basic proximal setups considered in Sec. 2 can be computed in time quasilinear in the dimension. In the penalized case this computation can be done explicitly, and in the constrained case the prox-mappings can be computed with a root-finding algorithm. Below we describe these calculations, since they are interesting in their own right in the context of proximal algorithms applied to signal processing problems.
It suffices to consider the partial proximal setups separately; the case of the joint setup in saddle-point problems can be treated using that the joint prox-mapping is separable in the primal and dual variables, cf. Sec. 2.3. Recall that the possible partial setups comprise the Euclidean setup and the complex one, cf. Sec. 2.1; in both cases, the d.g.f. is given by (7). Computing the composite prox-mapping, cf. (11), amounts to solving
(19) 
in the constrained case, and
(20) 
in the penalized case.^{1} In the constrained case with the Euclidean setup, the task reduces to the Euclidean projection onto an $\ell_2$-ball or an $\ell_1$-ball, depending on the norm defining the constraint; the latter projection can be computed (exactly) in $O(n \log n)$ via the algorithm from [DSSSC08] -- for that, one first solves (19) for the complex phases corresponding to the pairs of components. The constrained case with the complex setup is reduced to the penalized case by passing to the Lagrangian dual problem. Evaluation of the dual function amounts to solving a problem equivalent to (20), and (19) can be solved by a simple root-finding procedure if one is able to solve (20). As for (20), below we show how to solve it explicitly in the case of a plain norm penalty, and reduce it to a one-dimensional root search (so that it can be solved up to numerical tolerance) in the case of a squared norm penalty. Indeed, (20) can be recast in terms of a complex variable:

^{1} For the purpose of future reference, we also consider the case of a squared norm penalty.
(21) 
where and , cf. (7), whence
(22) 
Now, (21) can be minimized first with respect to the complex arguments (phases), and then with respect to the absolute values of the components. Denoting a (unique) optimal solution of (21), the first minimization aligns the phases of the solution with those of the input, and it remains to compute the absolute values
Case of a plain norm penalty.
The first-order optimality condition implies
(23) 
Denoting the corresponding threshold, and using the soft-thresholding operator
we obtain the explicit solution:
In the case of the Euclidean setup this simplifies further.
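The explicit solution for the plain $\ell_1$ penalty with Euclidean distance can be sketched as follows: each component keeps its phase and has its magnitude soft-thresholded. The numerical optimality check at the end is only an illustration on random data.

```python
import numpy as np

def prox_l1_complex(w, lam):
    """Minimizer of 0.5*||u - w||_2^2 + lam*||u||_1 over complex vectors u:
    keep each phase, soft-threshold each magnitude."""
    mag = np.abs(w)
    return w * np.maximum(mag - lam, 0.0) / np.maximum(mag, 1e-300)

# Sanity check: no small random perturbation of the prox output improves
# the (convex) objective.
rng = np.random.default_rng(2)
w = rng.standard_normal(16) + 1j * rng.standard_normal(16)
lam = 0.7
u = prox_l1_complex(w, lam)
obj = lambda v: 0.5 * np.sum(np.abs(v - w) ** 2) + lam * np.sum(np.abs(v))
best = obj(u)
trials = [obj(u + 1e-3 * (rng.standard_normal(16) + 1j * rng.standard_normal(16)))
          for _ in range(50)]
```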
Case of a squared norm penalty.
Instead of (23), we arrive at
(24) 
which we cannot solve explicitly. However, note that a counterpart of (24), in which the norm term is replaced with a scalar parameter, can be solved explicitly, similarly to (23). The corresponding solution for a fixed value of the parameter can be obtained in linear time, and the quantity to be matched is a nondecreasing function of the parameter. Hence, (24) can be solved, up to numerical tolerance, by any one-dimensional root search procedure, with one evaluation of the explicit solution per step.
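The same root-search pattern — an explicitly solvable subproblem parameterized by a scalar, bisected until a monotone condition is met — also handles the constrained prox discussed earlier in this section. As a concrete instance, the sketch below projects a complex vector onto an $\ell_1$-ball by bisection on the Lagrange multiplier; this is an assumed illustration, not the exact procedure of the paper (which may instead use the exact algorithm of [DSSSC08]).

```python
import numpy as np

def project_l1_complex(w, r, tol=1e-10):
    """Euclidean projection of a complex vector onto the l1-ball of radius r,
    via bisection on the Lagrange multiplier: lam -> ||shrink(lam)||_1 is
    non-increasing, so bisection finds the multiplier matching the radius."""
    def shrink(lam):
        mag = np.abs(w)
        return w * np.maximum(mag - lam, 0.0) / np.maximum(mag, 1e-300)
    if np.sum(np.abs(w)) <= r:
        return w.copy()                     # already feasible
    lo, hi = 0.0, np.max(np.abs(w))         # invariant: ||shrink(hi)||_1 <= r
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if np.sum(np.abs(shrink(mid))) > r:
            lo = mid
        else:
            hi = mid
    return shrink(hi)

rng = np.random.default_rng(3)
w = rng.standard_normal(20) + 1j * rng.standard_normal(20)
p = project_l1_complex(w, r=2.0)
```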
4 Theoretical analysis
We begin this section by recalling the worst-case bounds on the absolute accuracy of the objective,
(For saddle-point problems, the absolute accuracy must be replaced with an appropriate counterpart, which is upper-bounded by the duality gap.) These bounds are applicable when solving an arbitrary optimization problem in one of the two general classes described in Sec. 2 with the corresponding first-order algorithm, cf. Theorems 4.1-4.2; they are expressed in terms of “optimization” parameters such as the regularity of the objective and the radius of the feasible set. Our first theoretical contribution, cf. Proposition 4.1, is to express these “optimization” parameters, and hence the bounds on the accuracy, in terms of purely statistical quantities: the norm of the exact estimator and the norm of the observations in the Fourier domain.
4.1 Bounding the absolute accuracy
The following results are taken from [NN13], modulo notation adaptations.
Theorem 4.1.
Suppose that has Lipschitz gradient:
where is the dual norm to , and let be generated by iterations of Algorithm 1 with stepsize . Then,
Theorem 4.2.
First, note that when the partial domain (for the primal or the dual variable) is a ball of the first norm, we use the corresponding setup in that variable (then the domain coincides with the ball in (8)), while if the domain is a ball of the other norm, we can choose between the two setups (in the latter case, the domain is contained in the ball in (8)). The radii can then be bounded as follows, cf. (8):
(25) 
where
(26) 
is the scaled norm of an optimal solution.
Another observation concerns the Lipschitz constants. Let
(27) 
depending on whether one uses the Euclidean setup () or the complex setup () in each variable ; besides, let . Introducing the complex counterpart of , operator given by
we can express the Lipschitz constants in terms of the subordinate norms:
(28)  
The first norm corresponds to the partial Euclidean setups in both variables, and the second to the complex setups in both variables. Now, the operator norm itself can be bounded as follows:
Lemma 4.1.
One has
The proof of this lemma appears in the supplementary material. Together with (25), Lemma 4.1 results in the following
Proposition 4.1.
Note that Proposition 4.1 gives the same upper bound on the accuracy irrespective of the proximal setup chosen among those described in the premise of (25). This is because we used the operator norm as an upper bound for the relevant subordinate norms, while these quantities can be smaller when one uses the “geometry-adapted” proximal setup, in which the complex setups are used for the variables measured in the $\ell_1$-norm. For a general linear operator, the gaps between the different subordinate norms can grow with the dimension, and hence one might expect Proposition 4.1 to be suboptimal for the “geometry-adapted” setup. However, as can be seen from the representation (45), in our case the operator has a special “almost diagonal” structure, and it is unlikely that its different subordinate norms scale differently with the dimension. This intuition can be made precise:
Proposition 4.2.
Assume, in addition, that the signal is periodic. Then, one has
4.2 Statistical accuracy and algorithmic complexity
In this section, we characterize the statistical accuracy of adaptive recovery procedures. It can be informally defined as the accuracy of minimizing the residual that is sufficient for the corresponding approximate estimator to admit the same theoretical risk bound, up to a constant factor, as the exact estimator. We obtain the following two results.
Theorem 4.3.
While Theorem 4.3 controls the pointwise loss of uniform-fit estimators, the next theorem controls the loss for least-squares estimators. To state it, we recall that a linear subspace is called shift-invariant if it is an invariant subspace of the lag operator.