Splines are universal solutions of linear inverse problems with generalizedTV regularization
Abstract
Splines come in a variety of flavors that can be characterized in terms of some differential operator . The simplest piecewiseconstant model corresponds to the derivative operator. Likewise, one can extend the traditional notion of total variation by considering more general operators than the derivative. This results in the definition of a generalized total variation seminorm and of its corresponding native space, which is further identified as the direct sum of two Banach spaces. We then prove that the minimization of the generalized total variation (gTV), subject to some arbitrary (convex) consistency constraints on the linear measurements of the signal, admits nonuniform spline solutions with fewer knots than the number of measurements. This shows that nonuniform splines are universal solutions of continuousdomain linear inverse problems with LASSO, , or totalvariationlike regularization constraints. Remarkably, the type of spline is fully determined by the choice of and does not depend on the actual nature of the measurements.
Keywords. Sparsity, total variation, splines, inverse problems, compressed sensing
AMS subject classifications. 41A15, 47A52, 94A20, 46E27, 46N20, 47F05, 34A08, 26A33
1 Introduction
Imposing sparsity constraints is a powerful paradigm for solving illposed inverse problems and/or for reconstructing signals at superresolution [6]. This is usually achieved by formulating the task as an optimization problem that includes some form of regularization [49]. The concept is central to the theory of compressed sensing (CS) [9, 19] and is currently driving the development of a new generation of algorithms for the reconstruction of biomedical images [36]. The primary factors that have contributed to making sparsity a remarkably popular research topic during the past decade are as follows:

the increasing evidence of the superiority of the sparsitypromoting schemes over the classical linear reconstruction (including the Tikhonov regularization) in a variety of imaging modalities [36].
The approach developed in this paper is also driven by the idea of sparsity. However, it deviates from the standard paradigm because the recovery problem is formulated in the continuous domain under the practical constraint of a finite number of linear measurements. The illposedness of the problem is then dealt with by searching for a solution that is consistent with the measurements and that minimizes a generalized version of the totalvariation (TV) seminorm—the continuousdomain counterpart of regularization. Our major finding (Theorem 1) is that the extremal points of this kind of recovery problem are nonuniform splines whose type is matched to the regularization operator . The powerful aspect is that the result holds in full generality, as long as the problem remains convex. The only constraint is that the linear inverse problem should be wellposed over the (very small) null space of the regularization operator, which is the minimal requirement for any valid regularization scheme. In particular, Theorem 1 gives a theoretical explanation of the welldocumented observation that total variation regularization—the simplest case of the present theory with (derivative operator)—tends to produce piecewiseconstant solutions [11, 41]. Recognizing the intimate connection between linear inverse problems and splines is also helpful for discretization purposes because it provides us with a parametric representation of the solution that is controlled by the regularization operator . In that respect, our representer theorems extend some older results on spline interpolation with minimum norms, including the adaptive regression splines of Mammen and van de Geer [38] and the functional analytic characterization of Fisher and Jerome [27]. There is a connection as well with the work of Steidl et al. on splines and higherorder TV [48], although their formulation is strictly discrete and restricted to the denoising problem.
2 Linear Inverse Problems: Current Status and Motivation
Our notational convention is to use bold letters to denote ordinary vectors and matrices to distinguish them from their infinitedimensional counterparts; that is, functions (such as ) and linear operators (such as ). Simply stated, the inverse problem is to recover a signal from a finite set of linear measurements where is a disturbance term that is usually assumed to be small and independent of . In most realworld problem the unknown signal lives in the continuum so that it is appropriate to view it as an element of some Banach space . Then, by the assumption of linearity, there exists a set of functionals (the continuous dual of ) with such that the noisefree measurements are given by . The measurement functionals are governed by the underlying physics (forward model) and assumed to be known. Since the signal is an infinitedimensional entity and the number of measurements is finite, the inverse problem is obviously illposed, not to mention the fact that the true measurements are typically only approximate versions of since they are corrupted by noise.
2.1 FiniteDimensional Formulation
The standard approach for the resolution of such inverse problems is to select some finitedimensional reconstruction space . Based on the (simplifying) assumption that , one then converts the original noisefree forward model into the discretized version , where represents the expansion coefficients of in the basis . Here, is the socalled sensing matrix of size whose entries are given by .
The basic assumption made by the theory of compressed sensing is that there exists a finitedimensional basis (or dictionary) that “sparsifies” the class of desired signals with the property that for some fixed which is (much) smaller than ; in other words, it should be possible to synthesize the signal exactly by restricting the expansion to no more than atoms in the basis [20, 23, 40]. The signal recovery is then recast as the constrained optimization problem
(1) 
where the minimization of the norm promotes sparse solutions [49]. The role of the righthandside inequality is to encourage consistency between the noisy measurements and their noisefree restitution . The popularity of (1) stems from the fact that the theory of CS guarantees a faithful signal recovery from measurements under strict conditions on (i.e., restricted isometry) [7, 10, 19].
Instead of basing the recovery on the synthesis formula , one can adopt an alternative analysis or regularization point of view. To that end, one typically assumes that is discretized in some implicit “pixel” basis with expansion coefficients , where the are the samples of the underlying signal. The corresponding system matrix (forward model) is denoted by . Given some appropriate regularization operator , the idea then is to exploit the property that the transformed version of the signal, , is sparse. This translates into the optimization problem
(2) 
which is slightly more involved than (1). The two forms are equivalent only when and is invertible, the connection being . For computational purposes, (2) is often converted into the equivalent unconstrained version of the problem
(3) 
where is an adjustable regularization parameter that needs to tuned such that . One of the preferred choices for is the finitedifference operator—or the discrete version of the gradient in dimensions higher than one. This corresponds to the “totalvariation” reconstruction method, which is widely used in applications [3, 11, 30, 41].
The sparsitypromoting effect of these discrete formulations and the conditions under which the expansion coefficients of the signal can be recovered are fairly well understood [28, 54]. What is less satisfactory is the intrinsic interdependence between the sparsity constraints and the choice of the appropriate reconstruction space, which makes it difficult to deduce rates of convergence and error estimates relating to the underlying continuousdomain recovery problem.
2.2 InfiniteDimensional Formulation
Recently, Adcock and Hansen have addressed the above limitation by formulating an infinitedimensional theory of CS [1]. The measurements are the same as before, but the unknown signal is now a function . For the purpose of illustration, we take and with the property that such a function admits the (unique) expansion in the (properly normalized) Haar wavelet basis . It is known that the condition implies the inclusion of in weak—a space that is slightly larger than [12]. Conversely, one can force the inclusion in by imposing a bound on the norm of these coefficients. If one further assumes that the signal is sparse in the Haar basis, one can recast the reconstruction problem as
(4) 
which is the infinitedimensional counterpart of the synthesis formulation (1). The key question is to derive conditions on how to choose the to guarantee recovery of wavelet coefficients up to a certain scale. This has been done in [1, 2], which means that the issue of convergence is now reasonably well understood for the synthesis form of the recovery problem.
In our framework, is the space of functions on of bounded (total) variation, which is slightly larger than because it also includes constant signals. This allows us to close the circle by enforcing a regularization on the “true” total variation of the solution, which is associated with the derivative operator . This results in the functional optimization problem
(5) 
which is the continuousdomain counterpart of (2). Now, the motivation for our present theory is that (5) corresponds to a special case of Theorem 1 with and the closed compact convex set being specified by the inequality on the righthandside of (5); that is, . The key is that the differentiation operator is splineadmissible in the sense of Definition 1: Its causal Green’s function is the Heaviside (or unitstep) function whose rate of growth is , while its null space with is composed of all constantvalued signals. This implies that the extreme points of (5) necessarily take the form
(6) 
with . This corresponds to a piecewiseconstant signal with jumps of size at the , as illustrated in Figure 1. The solution also happens to be a polynomial spline of degree with knots at the with the property that , which is a weighted sum of shifted Dirac impulses (the innovation of the spline), as shown on the bottom of Figure 1.
In view of Definition 2, the solution (6) can also be described as a nonuniform spline with . The remarkable aspect of this result is that the parametric form (6) is universal, in the sense that it does not dependent on the measurement functionals . To the best of our knowledge, this is the first mathematical explanation of the wellknown observation that TV regularization tends to enforce piecewiseconstant solutions. The other interesting point is that one can interpret the solution as the best term representation of the signal within an infinitedimensional dictionary that consists of a constant signal plus a continuum of shifted Green’s functions (i.e., ), making the connection with the synthesis views (1) and (4) of the problem. Also, note that the described sparsifying effect is much more dramatic than that of the finitedimensional setting since one is collapsing a continuum (integral representation) into a discrete and finite sum.
We shall now show that the mechanism at play is very general and transposable to a much broader class of regularization operators and datafidelity terms, as well as for the multidimensional setting.
2.3 Road Map of the Paper
The remainder of this paper is organized as follows: After setting the notation, we present and discuss of our main representer theorem (Theorem 1) in Section 3. We also provide a refined version for the simpler interpolation scenario (Theorem 2). We then proceed with the review of primary applications in Section 4.
The mathematical tools for proving our results are developed in the second half of the paper. The first enabling component is the tight connection between splines and operators, which is the topic of Section 5. In particular, we present an operatorbased method to synthesize a spline from its innovation, which requires the construction of an appropriate rightinverse operator (Theorem 4). The existence of such inverse operators is fundamental to the characterization of the native spaces associated with our generalized totalvariation criterion (gTV) (Theorem 5), as we show in Section 6. The actual proof of Theorems 1 and 2 is given in Section 7. It relies on a preparatory result (generalized FisherJerome theorem) that establishes the impulsive form of the solutions of some abstract minimization problem over the space of bounded Borel measures.
We conclude the paper in Section 8 with a brief discussion of open issues.
3 Representer Theorems for Generalized Total Variation
Although we are considering a finite number of measurements, we are formulating the reconstruction problem in the continuous domain. This calls for a precise specification of the underlying functional setting.
3.1 Notation
The space of tempered distribution is denoted by where gives the number of dimensions. This space is made of continuous linear functionals acting on the Schwartz’ space of smooth and rapidly decaying test functions on [29, 33].
We shall primarily work with the space of regular, realvalued, countably additive Borel measures on , which is also known (by the RieszMarkov theorem) to be the continuous dual of : the Banach space of continuous functions on that vanish at infinity equipped with the supremum norm [42, Chap. 6]. Since is dense in , this allows us to define as
(7) 
and also to extend the space of test functions to . The action of will be denoted
by
where the righthand side stands for the Lebesgue integral of with respect to the underlying measure
Two key observations in relation to our goal are:

the compatibility of the and totalvariation norms with the former being stronger than the latter. Indeed, for all ;

the inclusion of Dirac impulses in , but not in . Specifically, for any fixed offset with for all .
We shall monitor the algebraic rate of growth/decay of (ordinary) functions of the variable by verifying their inclusion in the Banach space
(8) 
where
with . For instance, for .
A linear operator whose output is a function is represented with a roman capital letter (e.g., ). The action of on the signal is denoted by , or for short. Such an operator is said to be shiftinvariant if it commutes with the shift operator ; that is, if for any admissible signal and .
3.2 Main Result on the Optimality of Splines
Since the solution is regularized, the constrained minimization is performed over some native space that is tied to some admissible differential operator , such as , (second derivative), or (Laplacian) for .
Definition 1 (Splineadmissible operator).
A linear operator , where is an appropriate subspace of , is called splineadmissible if

it is shiftinvariant;

there exists a function of slow growth (the Green’s function of ) such that , where is the Dirac impulse. The rate of polynomial growth of is .

the (growthrestricted) null space of ,
has the finite dimension .
The native space of , , is then specified as
(9) 
It is largest function space for which the generalized total variation
is welldefined under the finitedimensional nullspace constraint
This also means that is only a seminorm on . However, it can be turned into a proper norm by factoring out the null space of . We rely on this property and the finite dimensionality of to prove that is a bona fide Banach space (see Theorem 5).
Having set the functional context, we now state our primary representer theorem.
Theorem 1 (gTV optimality of splines for linear inverse problems).
Let us assume that the following conditions are met:

The regularization operator is splineadmissible in the sense of Definition 1.

The linear measurement operator maps and is weak*continuous on .

The recovery problem is wellposed over the null space of : , for any .
Then, the extremal points of the general constrained minimization problem
(10) 
where is any (feasible) convex compact subset of , are necessarily nonuniform splines of the form
(11) 
with parameters , with , , and . Here, is a basis of and so that . The full solution set of (10) is the convex hull of those extremal points.
Theorem 1 is a powerful existence result that points towards the universality of nonuniform spline solutions. The key property here is , which follows from Conditions 13 in Definition 1 and is consistent with the more detailed characterization of splines presented in Section 5. For the time being, it suffices to remark that these splines are smooth (i.e., infinitely differentiable) everywhere, except at their knot locations .
Although the extremal problem is defined over a continuum, the remarkable outcome is that the problem admits solutions that are intrinsically sparse, with the level of sparsity being measured by the minimum number of required spline knots. In particular, this explains why the solution of a problem with a TV/type constraint on the derivative (resp., the second derivative) is piecewiseconstant (resp., piecewise linear when ) with breakpoints at . The other pleasing aspect is the direct connection between the functional concept of generalized TV and the norm of the expansion coefficients .
We observe that the solution is made up of two components: an adaptive one that is specified by and , and a linear regression term (with expansion coefficients ) that describes the component in the null space of the operator. Since does not contribute to , the optimization tends to maximize the contribution of the nullspace component. The main difficulty in finding the optimal solution is that and are problemdependent and unknown a priori.
We have mentioned in Section 2.2 that the seminorm yields the classical total variation of a function in 1D. Unfortunately, there is no such direct connection for , the reason being that the multidimensional gradient is not splineadmissible because it is a vector operator. Instead, as a proxy for the popular total variation of Rudin and Osher [41], we suggest using the (fractional) Laplacian seminorm with , which is endowed with the same invariance and nullspace properties. According to Theorem 1, such a thorder regularization results in extremal points that are nonuniform polyharmonic splines [21, 37].
3.3 Connection with Unconstrained Problem
The statement in Theorem 1 is remarkably general. In particular, it covers the generic regularized leastsquares problem
(12) 
which is commonly used to formulate linear inverse/compressedsensing problems [6, 9, 19, 23, 26]. The connection is obtained by taking , which is a ball of diameter centered on the measurement vector . Indeed, since the datafidelity term is (strictly) convex, the extreme points of (10) saturate the inequality such that and gTV is minimized with . In the unconstrained form (12), the selection of a fixed results in a particular value of the data error with the optimal solution having the same total variation as if we were looking at the primary problem (10) with .
To get further insights on the optimization problem (12), we can look at two limit cases. When , the solution must take the form so that . It then follows that . On the contrary, when , the minimization will force the data term to vanish. Theorem 1 then ensures the existence of a nonuniform “interpolating” spline with and minimum gTV seminorm.
3.4 Generalized Interpolation
In the latter interpolation scenario, the convex set reduces to the single point . This configuration is of special theoretical relevance because it enables us to refine our upper bound on the number of spline knots.
4 Application Areas
We first briefly comment on the admissibility conditions in Theorem 1 and indicate that the restrictions are minimal. To the best of our knowledge, the continuity of the measurement operator is a necessary requirement for the mathematical analysis of any inverse problem. The difficulty here is that our native space is nonreflexive, which forces us to rely on the weak*topology. The continuity requirement in Theorem 1 is therefore equivalent to for where the Banach structure of the predual space is laid out in Theorem 6. In particular, we refer to the norm inequality (25), which suggests that Condition 2 in Theorem 1 is met by picking where the latter is the Banach space associated with the weighted norm
(14) 
In fact, is the predual of the space defined by (8), which implies that . The condition is a mild algebraic decay requirement that turns out to be satisfied by the impulse response of most physical devices. As for the requirement that the inverse problem is well defined over the null space of (Condition 3), it a prerequisite to the success of any regularization scheme. Otherwise, there is simply no hope of turning an illposed problem into a wellposed one. For instance, in the introductory example with classical totalvariation regularization, the constraint is that should have at least one component such that which, again, is very mild requirement.
Next, we discuss examples of signal recovery that are covered by Theorems 1 and 2. The standard setting is that one is given a set of noisy measurements of an unknown signal and that one is trying to recover from based on the solution of (12), or some variant of the problem involving some other (convex) data term—the most favorable choice being the log likelihood of the measurement noise. We shall then close the discussion section by briefly making the connection with a class of inverse problems in measure space; that is, the case .
4.1 Interpolation
The task here is to reconstruct a continuousdomain signal from its (possibly noisy) nonuniform samples , which is achieved by searching for the function that fits the samples while minimizing .
This corresponds to the problem setting in Theorem 1 with and , where denotes the measurement vector. Hence, the admissibility condition is equivalent to , where the boundedness is ensured by the stability condition in Theorem 4. The more technical continuity requirement is achieved when is continuous (Hölder exponent ). This happens when the order of the differential operator is greater than one, which seems to exclude
4.2 Generalized Sampling
The setting is analogous to the previous one, except that the samples are now observed through a sampling aperture
so that [24, 50]. The function may, for example, correspond to the pointspread function of a microscope. Then, the recovery problem is equivalent to a deconvolution [18]. Since the measurements are obtained by integration of against an ordinary function , there is no
requirement for the continuity of because of the implicit smoothing effect of .
This means that essentially no restrictions apply.
4.3 Compressed Sensing
The result of Theorem 1 is highly relevant to compressed sensing, especially since the underlying /TV signalrecovery problem is formulated in the continuous domain. We like to view (11) as the prototypical form of a piecewisesmooth signal that is intrinsically sparse with sparsity . The model also conforms with the notion of a finite rate of innovation [56]. If we know that the unknown signal has such a form, then Theorem 1 suggests that we can attempt to recover it from an dimensional linear measurement by solving the optimization problem (10) with , which is in agreement with the predominant paradigm in the field. While the theorem states that , common sense dictates that we should take , where is the number of degrees of freedom of the underlying model. The difficulty, of course, is that a subset of those parameters (the spline knots ) induce a model dependency that is highly nonlinear.
4.4 Inverse Problems in the Space of Measures
Some of the theoretical results of this paper are also of direct relevance for inverse problems that are formulated in the space of measures [5]. The prototypical example is the recovery of the location (with superresolution precision) of a series of Dirac impulses from noisy measurements, which may be achieved through the continuousdomain minimization of the total variation of the underlying measure [8, 16, 22, 25]. The FisherJerome theorem [27, Theorem 1] as well as our extension for the unbounded domain and arbitrary convex sets (Theorem 7) support this kind of algorithm, as they guarantee the existence of sparse solutions—understood as a sum of Dirac spikes—for this family of problems.
5 Splines and Operators
We now switch to the explanatory part of the presentation. The first important concept that is implicit in the statement of Theorems 1 and 2 is the powerful association between splines and operators, the idea being that the selection of an admissible operator specifies a corresponding type of splines [46, 47][55, Chapter 6].
Sorted by increasing complexity, the three types of operators that are of relevance to us are: (i) ordinary differential operators, which are polynomials of the derivative operator [13, 46, 52]; (ii) partial differential operators such as the Laplacian (or some polynomial thereof); and (iii) fractional derivatives such as or with whose Fourier symbols are and , respectively [21, 51, 53]. It can be shown that all linearshiftinvariant operators of Type (i) and all elliptic operators of Type (ii) are splineadmissible in the sense of Definition 1. This is also known to be true for fractional derivatives and fractional Laplacians with [21, 51].
Let us mention that the issue of making sure that the null space of the operator is finitedimensional is often nontrivial for . It is a fundamental aspect that is addressed in the theory of radial basis functions and polyharmonic splines with the definition of the appropriate native spaces [58, Chapter 10]. Here, we have chosen to bypass some of these technicalities by including a growth restriction (i.e., the condition that ) in the definition of . A fundamental property in that respect is that the finitedimensional null space of a LSI operator can only include exponential polynomial components of the form , which correspond to a zero of multiplicity at least of the frequency response at (see [55, Proposition 6.1 p. 118] and [32, Section 6]).
Once it is established that is splineadmissible, one can rely on the following unifying distributional definition of a spline.
Definition 2 (Nonuniform spline).
A function of slow growth (i.e., with ) is said to be a nonuniform spline if
(15) 
where is a sequence of weights and the Dirac impulses are located at the spline knots .
The generalized function is called the innovation of the spline because it contains the crucial information for its description: the positions of the knots and the amplitudes of the corresponding discontinuities.
The onedimensional brands of greatest practical interest are the polynomial splines with [15, 45] and the exponential splines [13, 46, 52] with , where denotes the identity operator. Their multidimensional counterparts are the polyharmonic splines with [21, 37] and the Sobolev splines with for [57]. The connection with the theory of Sobolev spaces is that the Green’s functions of (resp., are the kernels of the Riesz (resp., Bessel) potentials [31].
For a constructive use of Definition 2, we also need to be able to resynthesize the spline from its innovation. In the case of our introductory example with (see Figure 1), one simply integrates , which yields (6) (up to the integration constant ) owing to the property that . In principle, the same inversion procedure is applicable for the generic operator and amounts to substituting the distribution in (15) by the Green’s function . The only delicate part is the proper handling of the “integration constants” (the part of the solution that lies in the null space of the operator), which is achieved through the specification of linear boundary conditions of the form .
We now show that the underlying functionals can be incorporated in the specification of an appropriate rightinverse operator . Our construction requires that first be matched to a basis of such as to form a biorthogonal system. We note that this is always feasible as long as the are linearly independent with respect to . (An explicit construction is given in the proof of Theorem 2.)
Definition 3.
The pair is called a biorthogonal system for if is a basis of and the vector of “boundary” functionals with satisfy the biorthogonality condition where is the th element of the canonical basis.
The interest of such a system is that any has a unique representation as with associated norm .
The fundamental requirement for our formulation is the stability/continuity of the inverse operator . Since by construction, we can control stability by relying on Theorem 3 whose proof is given in Appendix A.
Theorem 3.
The generic linear operator continuously maps with if and only if its kernel is measurable and
(16) 
This allows us to characterize the desired operator in term of its Schwartz’ kernel (or generalized impulse response) .
Theorem 4 (Stable rightinverse of ).
Since the choice of the linear boundary functionals is essentially arbitrary, there is flexibility in defining admissible inverse operators. The important ingredient for our formulation is the existence of such inverses with the unconditional guarantee of their stability (see Theorem 5 below).
To put this result into context, we now provide some illustrative examples. For , we have that , , and for , where the polynomial basis is biorthogononal to with . This (canonical) choice of boundary functionals then translates into the construction of an inverse operator that imposes the vanishing of the function and its derivatives at the origin. By applying (19) and recognizing the binomial expansion of , we simplify the expression of the kernel of this operator as
The crucial observation here is that the function with fixed is compactly supported and bounded. Moreover, so that obviously satisfies the stability bound (16) with . By contrast, the condition fails for the conventional shiftinvariant inverse (fold integrator), which stresses out the nontrivial stabilizing effect of the second correction term in (19). The other important consequence of the correction is the vanishing of as for any fixed , contrary to its leading term which does not decay (and even grows) as .
The primary usage of the inverse operators of Theorem 4 is the resolution of differential equations of the form
(20) 
for some . Indeed, by invoking the properties of and the biorthogonality of , we readily show that (20) admits a unique solution in , which is given by
For the particular case of the spline innovation in Definition 2, we find that
which, upon substitution of the kernel given by (19), results in a form that is the same as (11) in Theorem 1 modulo some adjustment of the constants .
6 Native or Generalized BeppoLevi Spaces
The search for the solution of our optimization problem is performed over the native space defined by
(9), which is the largest space over which our gTG regularization functional is well defined.
The delicate aspect is that
is specified in terms of a seminorm, in analogy with the definition of the classical BeppoLevi spaces of order and exponent , written as
[17, 34]. Hence, in 1D, the proposed definition of is a slight extension of . In higher dimensions, it can be shown
The crucial point for our formulation is that also happens to be a complete normed (or Banach) space when equipped with the proper directsum topology. We shall now make this structure explicit with the help of the inverse operators defined in Theorem 4.
Since the principle is similar
to the characterization of the BeppoLevi spaces, we shall also refer to as a generalized BeppoLevi space.
Theorem 5 (Banachspace structure of native space).
Let be a splineadmissible operator, its native space defined by (9), and some biorthogonal system for its null space . Then, the following equivalent conditions hold:

The rightinverse operator specified by Theorem 4 isometrically maps , while its kernel necessarily fullfills the stability condition
(21) 
Every admits a unique representation as
where and .

is a Banach space equipped with the norm
(22)
Proof.
As preparation, we define a subset of as
(23) 
Since the boundary conditions are linear, is clearly a vector space. We now show that it is a Banach space when equipped with the norm