Sparsity of solutions for variational inverse problems with finite-dimensional data
Abstract
In this paper we characterize sparse solutions for variational problems of the form $\min_{u \in X} R(u)$ subject to $Au = y$, where $X$ is a locally convex space, $A$ is a linear continuous operator that maps $X$ into a finite-dimensional Hilbert space and $R$ is a seminorm. More precisely, we prove that there exists a minimizer that is "sparse" in the sense that it is represented as a linear combination of the extremal points of the unit ball associated with the regularizer $R$ (possibly translated by an element in the null space of $R$). We apply this result to relevant regularizers such as the total variation seminorm and the Radon norm of the image under a scalar linear differential operator. In the first example, we provide a theoretical justification of the so-called staircase effect and in the second one, we recover the result in [unsersplines] under weaker hypotheses.
1 Introduction
One of the fundamental tasks of inverse problems is to reconstruct data from a small number of usually noisy observations. This is of capital importance in a huge variety of fields in science and engineering, where typically one has access only to a fixed and small number of measurements of the sought unknown. However, in general, this type of problem is underdetermined and therefore the recovery of the true data is practically impossible. One common way to obtain a well-posed problem is to make a priori assumptions on the unknown; more precisely, to require that the latter is sparse in a certain sense. In this case, the initial data can often be recovered by solving a minimization problem with a suitable regularizer, of the form
(1)  $\min_{u \in X} R(u) \quad \text{subject to} \quad Au = y_0,$
where $R$ is the regularizer, $A : X \to H$ is a linear continuous operator into a finite-dimensional Hilbert space $H$ that models the finite number of observations (which is small compared to the dimension of $X$), and $y_0 \in H$ is the noise-free data.
When the domain $X$ is finite-dimensional and the regularizer is the $\ell^1$ norm, the problem falls into the established theory of compressed sensing [compressedsensing, donohocompressed], which has seen a huge development in recent years. In this case, sparsity is understood as a high number of zero coefficients with respect to a certain basis of $X$.
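As a toy numerical illustration of this effect (our own sketch, not taken from the cited works; the operator, data and the ISTA solver below are invented for the example), one can compare an $\ell^1$-regularized solution of an underdetermined system with the minimum-$\ell^2$-norm one:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 20, 80                                # far fewer measurements than unknowns
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, 5, replace=False)] = rng.standard_normal(5)  # 5-sparse signal
y = A @ x_true

# ISTA for min_x 0.5*||A x - y||^2 + lam*||x||_1:
# a gradient step on the quadratic part followed by soft-thresholding.
lam = 0.01
L = np.linalg.norm(A, 2) ** 2                # Lipschitz constant of the gradient
x = np.zeros(n)
for _ in range(50000):
    g = x - (A.T @ (A @ x - y)) / L
    x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)

x_l2 = np.linalg.pinv(A) @ y                 # minimum-l2-norm solution, for comparison
print(np.sum(np.abs(x) > 1e-4), np.sum(np.abs(x_l2) > 1e-4))
```

The $\ell^1$ solution concentrates on a few coefficients while the $\ell^2$ solution is fully populated; this is the finite-dimensional prototype of the extremal-point representations studied below.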
In an infinite-dimensional setting, where the domain is usually a Banach space, there is clear evidence that regularizers promote different notions of sparsity, but so far there has been no comprehensive theory explaining this effect.
Nevertheless, the effect of sparsity plays a crucial role in the fields of image processing and computer vision: in many cases, the recovered image in a variational model can be interpreted as sparse with respect to a notion of sparsity that depends on the regularizer. For example, for classical total variation (TV) denoising [tvdenoising]
(2)  $\min_{u} \frac{1}{2} \|u - f\|_{L^2(\Omega)}^2 + \lambda \, TV(u),$
it has been observed that minimizers are characterized by the so-called staircase effect (see for example [staircase, esedoglusparse, Nikolova]), which corresponds to the gradient of the considered image having small support. Another classical example of sparsity-promoting regularizers is $\ell^p$ penalization. In [sparsetik] the authors study $\ell^p$ regularizers with $1 \le p \le 2$ in Hilbert spaces and they note that the case $p = 1$ promotes sparsity with respect to a given basis of the Hilbert space, which means that only a finite number of coefficients in the respective basis representation are nonzero. In [resmerita], $\ell^1$ regularization is used in the framework of the least error method to recover a sparse solution with a fixed bound on the number of nonzero coefficients. Finally, it has been noted that suitable $\ell^1$-type regularizers enforce sparsity when data are represented in a wavelet basis (see for example [antoniadis, donoho]).
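The staircase effect is easy to reproduce numerically. The following self-contained sketch (our own illustration, not taken from the cited works; signal, parameters and the dual projected-gradient solver are invented for the example) denoises a smooth one-dimensional signal with discrete TV regularization:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64
t = np.linspace(0.0, 1.0, n)
f = np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal(n)  # smooth signal plus noise

lam = 5.0  # regularization strength

def D(u):                     # forward differences
    return u[1:] - u[:-1]

def DT(p):                    # adjoint of D
    return np.concatenate(([-p[0]], p[:-1] - p[1:], [p[-1]]))

# min_u 0.5*||u - f||^2 + lam*||D u||_1 via its dual:
# u = f - D^T p with the box constraint |p_i| <= lam (Chambolle-style iteration).
p = np.zeros(n - 1)
tau = 0.25                    # valid step size since ||D||^2 <= 4
for _ in range(30000):
    p = np.clip(p + tau * D(f - DT(p)), -lam, lam)
u = f - DT(p)

print(np.sum(np.abs(np.diff(u)) > 1e-2))  # few jumps: the minimizer is "staircased"
```

The denoised signal is piecewise constant with a handful of jumps, while the noisy input has a nonzero increment almost everywhere: the gradient of the minimizer has small support.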
The intrinsic sparsity of infinite-dimensional variational models with finite-dimensional data has been investigated by various authors in specific cases and in different contexts. One of the most important instances can be found in [chandra]: there, the authors notice that the regularizer is linked to the convex hull of the set of sparse vectors that one aims to recover. This was also noticed in optimal control theory (see, for example, [casaskunisch]) and used in practice to develop efficient algorithms for optimization problems that are based on the sparsity of the minimizers [bredieslorenz, pikka, walter].
More recently, several authors have investigated in depth the connection between regularizers and sparsity. In 2016, Unser, Fageot and Ward [unsersplines] studied the case where the regularizer is $u \mapsto \|Lu\|_{\mathcal{M}}$, with $L$ a scalar linear differential operator and $\|\cdot\|_{\mathcal{M}}$ the Radon norm. They showed the existence of a sparse solution, namely a linear combination of preimages of Dirac deltas, which can be expressed using a fundamental solution of $L$. Also the work of Flinth and Weiss [exactsolutionsflinth] is worth mentioning, where they give an alternative proof of the result in [unsersplines] under less restrictive hypotheses. In both works, however, the case of a vector-valued differential operator was not treated and therefore problems involving the total variation regularizer were not covered. After this manuscript was finalized, we discovered a recent preprint [chambollerepresenter] where the authors study a similar abstract problem and apply it, in particular, to the TV regularizer in order to justify the staircase effect. We remark that [chambollerepresenter] and the present paper were developed independently and differ in terms of the proofs as well as the applications.
In this paper, we provide a theory that characterizes sparsity for minimizers of general linear inverse problems with finite-dimensional data constraints. More precisely, we choose to work with locally convex spaces in order to deal, in particular, with weak* topologies. The latter are necessary in order to treat variational problems with TV regularization or Radon-norm regularization. We consider the following problem:
(3)  $\min_{u \in X} F(Au) + R(u),$
where $X$ is a locally convex space, $R : X \to [0, +\infty]$ is a lower semicontinuous seminorm, $A : X \to H$ is a linear continuous map with values in a finite-dimensional Hilbert space $H$ and $F : H \to (-\infty, +\infty]$ is a proper, convex, lower semicontinuous functional. (Notice that this generality allows problems of the type (1) for noise-free data as well as soft constraints in case of noisy data.) Additionally, we require Assumption [H0] below and that $R$ is coercive when restricted to the quotient space of $X$ by the null space of $R$, which we denote by $X/\mathcal{N}$ (see Assumption [H1] below). Under these hypotheses we prove that there exists a sparse minimizer of (3), namely a minimizer that can be written as a linear combination of extremal points of the unit ball associated with $R$ (in the quotient space $X/\mathcal{N}$). More precisely, we obtain the following result:
Theorem (Theorem LABEL:maint).
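In the notation introduced above, the statement can be paraphrased as follows (this is an informal reconstruction from the surrounding description, not a verbatim quotation of the theorem):

```latex
% Informal paraphrase of the main theorem; \mathcal{N} denotes the null space
% of R and \pi the canonical projection onto the quotient X/\mathcal{N}.
\text{There exists a minimizer } u \text{ of (3) of the form}\quad
u \;=\; \sum_{i=1}^{p} \gamma_i\, u_i,
\qquad p \le \dim H,\quad \gamma_i > 0,
```

where each $\pi(u_i)$ is an extremal point of the unit ball $\{\pi(u) : R(u) \le 1\}$ in $X/\mathcal{N}$.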
Notice that our result completely characterizes the sparse solutions of (3) and relates the notion of sparsity to structural properties of the regularizer $R$. Moreover, our hypotheses are minimal for having a well-posed variational problem (3).
The strategy to prove the previous theorem relies on the application of Krein–Milman's theorem and Carathéodory's theorem in the quotient space $X/\mathcal{N}$, which allows us to represent any element of the image under $A$ of the unit ball of the regularizer as a convex combination of extremal points (see Theorem LABEL:maint). In order to prove minimality for the element having the desired representation, we derive optimality conditions for Problem (3) (Proposition LABEL:opt). For this purpose, we need to prove a no-gap property between the primal and the dual problem in the quotient space. In locally convex vector spaces this is not straightforward and requires the notion of the Mackey topology [Schaefer].
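The finite-dimensional mechanism behind this argument can be sketched numerically (a toy example of ours, not from the paper): a point in the convex hull of finitely many extreme points of a planar polytope can be written as a convex combination of at most $2 + 1 = 3$ of them, and a simplex-type LP solver finds such a combination because its basic (vertex) solutions have at most as many nonzero weights as equality constraints.

```python
import numpy as np
from scipy.optimize import linprog

# Caratheodory in the plane: express a target point as a convex combination
# of at most 3 of the 5 extreme points below.
pts = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 3.0], [0.0, 3.0], [2.0, 5.0]])
target = np.array([2.0, 2.0])

k = len(pts)
A_eq = np.vstack([pts.T, np.ones(k)])        # sum_i w_i pts_i = target, sum_i w_i = 1
b_eq = np.concatenate([target, [1.0]])
res = linprog(c=np.zeros(k), A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * k, method="highs-ds")  # dual simplex: vertex solution
w = res.x
print(np.sum(w > 1e-9))                      # at most 3 extreme points are used
```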
In the second part of our paper we apply the main result to specific examples of popular regularizers. First of all we recover the well-known result (see for example [shapiro]) that, minimizing the Radon norm of a measure under finite-dimensional data constraints, one recovers a minimizer that is made of delta peaks. Indeed, according to our theory, which applies when the space of Radon measures is equipped with the weak* topology, Dirac deltas are extremal points of the unit ball associated with the Radon norm of a measure and our result applies straightforwardly (see Section LABEL:radonnormmeas).
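A discretized sketch of this phenomenon (grid, operator and data are invented for the illustration): minimizing the discrete Radon norm, i.e. the $\ell^1$ norm of the weights on a grid, under finitely many linear measurements returns at most as many spikes as there are measurements.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
n, m = 120, 12                    # grid points, number of measurements

A = rng.standard_normal((m, n))   # m linear measurements of the discretized measure
x_true = np.zeros(n)
x_true[[15, 60, 95]] = [1.0, -0.5, 2.0]   # three "Dirac" spikes
y = A @ x_true

# Basis pursuit  min ||x||_1  s.t.  A x = y,  as an LP with the split x = xp - xm:
c = np.ones(2 * n)
A_eq = np.hstack([A, -A])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * n),
              method="highs-ds")  # simplex: returns a vertex (basic) solution
x = res.x[:n] - res.x[n:]

print(np.sum(np.abs(x) > 1e-8))   # number of recovered spikes: at most m
```

Since a vertex solution of an LP with $m$ equality constraints has at most $m$ nonzero variables, the recovered measure consists of at most $m$ spikes, mirroring the abstract representation result.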
Then, we consider the TV regularizer for BV functions on bounded domains. Also in this case, our result applies when $BV(\Omega)$ is equipped with the weak* topology. This justifies the usage of locally convex spaces in the general theory. In order to confirm the heuristic observation that sparse minimizers show a peculiar staircase effect, we characterize the extremal points of the unit ball associated with the TV seminorm (in the quotient space of $BV(\Omega)$ modulo constants). In particular, we extend a result of [Ambrosiocasellesconnected] and [Flemingext] to the case where $\Omega$ is a bounded domain. In order to achieve that, we need an alternative notion of simple sets of finite perimeter (see Definition LABEL:simpleset). We prove the following theorem:
Theorem (Theorem LABEL:sparsity_tv).
Under suitable assumptions, there exists a minimizer $u$ of (3) such that
(5)  $u = \sum_{i=1}^{p} c_i \, \chi_{E_i},$
where $c_i \in \mathbb{R}$, $p \le \dim H$, with $\chi_{E_i}$ denoting characteristic functions, and the $E_i$ are simple sets with finite perimeter in $\Omega$.
Finally, we apply our main result to the setting considered in [unsersplines] and [exactsolutionsflinth], i.e., where the regularizer is given by $u \mapsto \|Lu\|_{\mathcal{M}}$ for a scalar linear differential operator $L$. We remove the hypotheses concerning the structure of the null space of $L$ and we work in the space of finite-order distributions equipped with the weak* topology. This allows us to have a general framework for these inverse problems that does not require additional assumptions on the Banach structure of the minimization domain (see [chambollerepresenter] and [exactsolutionsflinth] for comparison). It also justifies once more the usage of locally convex spaces in the abstract theory. In this setting, as an application of our main theorem, we are able to recover the same result as in [unsersplines] and [exactsolutionsflinth].
Theorem (Theorem LABEL:sparsity_diffop).
Under suitable assumptions on $F$ and on the operator $L$, there exists a minimizer $u$ of (3) such that
(6)  $u = \sum_{i=1}^{p} c_i \, G_{x_i} + \psi,$
where $c_i \in \mathbb{R}$, $x_i \in \mathbb{R}^d$, $p \le \dim H$, and $\psi$ lies in the null space of $L$ (we denote by $G_{x_i}$ the fundamental solution of $L$ obtained by the Malgrange–Ehrenpreis theorem translated by $x_i$).
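As a sketch of how such minimizers look in practice (a discretization we set up purely for illustration, with $L = \mathrm{d}^2/\mathrm{d}t^2$ on a uniform grid; the sample points and values are invented), minimizing the $\ell^1$ norm of second differences under interpolation constraints produces a piecewise linear function: its kinks play the role of the translated fundamental solutions of $L$ and its affine part spans the null space of $L$.

```python
import numpy as np
from scipy.optimize import linprog

n = 60
idx = np.array([0, 10, 25, 40, 59])           # sample locations t_i
vals = np.array([0.0, 1.0, -0.5, 2.0, 0.0])   # sample values y_i

# Second-difference operator D2 (discrete d^2/dt^2) on a unit-spaced grid
D2 = np.zeros((n - 2, n))
for i in range(n - 2):
    D2[i, i:i + 3] = [1.0, -2.0, 1.0]

# LP in (u, t): minimize sum(t) subject to -t <= D2 u <= t and u[idx] = vals,
# i.e. min ||D2 u||_1 under the interpolation constraints.
c = np.concatenate([np.zeros(n), np.ones(n - 2)])
A_ub = np.block([[D2, -np.eye(n - 2)], [-D2, -np.eye(n - 2)]])
b_ub = np.zeros(2 * (n - 2))
A_eq = np.zeros((len(idx), 2 * n - 2))
A_eq[np.arange(len(idx)), idx] = 1.0
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=vals,
              bounds=[(None, None)] * n + [(0, None)] * (n - 2),
              method="highs-ds")
u = res.x[:n]

# The minimizer interpolates the data and its second difference is sparse
# (the minimizer need not be unique, but the optimal value is).
print(np.sum(np.abs(D2 @ u) > 1e-6))
```

The optimal value equals the total variation of the sequence of secant slopes of the data, which is attained by the piecewise linear interpolant.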
2 Setting and preliminary results
2.1 Basic assumptions on the functionals
Let $X$ be a real locally convex space, i.e., a vector space whose topology is generated by a separating family of seminorms, and let $X'$ be its topological dual equipped with the weak* topology. Further, let $H$ be a finite-dimensional real Hilbert space and $A : X \to H$ a linear continuous operator, and denote by $A^* : H \to X'$ its continuous adjoint, defined thanks to Riesz's theorem as
$$\langle A^* w, u \rangle = \langle w, A u \rangle$$
for every $u \in X$ and $w \in H$. Notice that we have denoted by $\langle \cdot, \cdot \rangle$ both the scalar product in the Hilbert space $H$ and the duality product between $X'$ and $X$.
As anticipated in the introduction we deal with a variational problem of the type
(7)  $\min_{u \in X} F(Au) + R(u).$
In the remaining part of this section we describe the assumptions on $F$ and $R$ separately.
 Assumptions on $F$:
We consider
$F : H \to (-\infty, +\infty]$ a proper convex function that is coercive and lower semicontinuous with respect to the topology of $H$, which is the standard topology on finite-dimensional spaces.
 Assumptions on $R$:
We consider
$R : X \to [0, +\infty]$ a seminorm that is lower semicontinuous with respect to the topology of $X$. We make the following additional assumption:

[H0] $\operatorname{dom}(F \circ A) \cap \operatorname{dom} R \neq \emptyset$,
where $\operatorname{dom} R$ denotes the domain of $R$, i.e., $\operatorname{dom} R = \{u \in X : R(u) < +\infty\}$, and analogously $\operatorname{dom}(F \circ A) = \{u \in X : F(Au) < +\infty\}$.
Defining the null space of $R$ as $\mathcal{N} := \{u \in X : R(u) = 0\}$, which is a closed subspace of $X$, we consider the following quotient space:
(8)  $X_{\mathcal{N}} := X / \mathcal{N},$
endowed with the quotient topology. It is well-known that $X_{\mathcal{N}}$ is a locally convex space [Rudin]. We call $\pi : X \to X_{\mathcal{N}}$ the canonical projection onto the quotient space and, to simplify the notation, given $u \in X$ we denote by $\pi(u)$ the image of $u$ in the quotient space. Likewise, for subsets of $X$, we tacitly identify the Minkowski sum with $\mathcal{N}$ with its image under $\pi$ in $X_{\mathcal{N}}$.
Define then $\widetilde{R} : X_{\mathcal{N}} \to [0, +\infty]$ as
(9)  $\widetilde{R}(\pi(u)) := R(u).$
It is easy to see that $\widetilde{R}$ is well-defined and is a seminorm on $X_{\mathcal{N}}$.
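The well-definedness can be checked in one line: for every $\psi$ with $R(\psi) = 0$, the triangle inequality for the seminorm $R$ gives

```latex
R(u + \psi) \;\le\; R(u) + R(\psi) \;=\; R(u)
\;\le\; R(u + \psi) + R(-\psi) \;=\; R(u + \psi),
```

so $R$ is constant on each equivalence class $u + \mathcal{N}$ and (9) does not depend on the chosen representative.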
We assume that

[H1] $\widetilde{R}$ is coercive, i.e., the sublevel sets
$$\{ v \in X_{\mathcal{N}} : \widetilde{R}(v) \le c \}$$
are compact for every $c > 0$.
Remark 2.1.
Note that $\widetilde{R}$ is lower semicontinuous in $X_{\mathcal{N}}$: Indeed, as $R$ is lower semicontinuous, the superlevel sets $U_c := \{u \in X : R(u) > c\}$ are open in $X$ for each $c > 0$. Now, as $R(u) > c$ if and only if $\widetilde{R}(\pi(u)) > c$, we have $\{v \in X_{\mathcal{N}} : \widetilde{R}(v) > c\} = \pi(U_c)$. Since $\pi$ is an open map ($X$ is a topological group with respect to addition), each $\pi(U_c)$ is open in $X_{\mathcal{N}}$, meaning that $\widetilde{R}$ is lower semicontinuous.
As a consequence, in order to obtain [H1], it suffices that each sublevel set of $\widetilde{R}$ is contained in a compact set.
From now on we assume that $F$, $R$ and $A$ satisfy the properties described above.
2.2 Existence of minimizers
We state the following minimization problem:
Problem 2.2 (Minimization problem in ).
Given $A$, $F$ and $R$ with the assumptions given in the previous section, define for $u \in X$ the following functional:
(10)  $J(u) := F(Au) + R(u).$
We aim at solving
$$\min_{u \in X} J(u).$$
In order to prove the existence of minimizers for Problem 2.2 we state an auxiliary minimization problem in the quotient space .
Problem 2.3 (Minimization problem in ).
Given $A$, $F$ and $R$ with the assumptions given in the previous section, we define
(11)  $\widetilde{J}(\pi(u)) := \inf_{\psi \in \mathcal{N}} F(A(u + \psi)) + \widetilde{R}(\pi(u)).$
We want to solve
$$\min_{v \in X_{\mathcal{N}}} \widetilde{J}(v).$$
It is easy to check that the functional $\widetilde{J}$ is well-defined on $X_{\mathcal{N}}$. We aim at proving existence of minimizers for Problem 2.3. For this reason, we first prove a lemma about the coercivity of functionals defined on quotient spaces.
Lemma 2.4.
Let $X$ be a locally convex space and $G : X \to (-\infty, +\infty]$ be coercive. Given a closed subspace $\mathcal{V}$ of $X$, we define $\widetilde{G}$ on the space $X / \mathcal{V}$ as
$$\widetilde{G}(\pi(u)) := \inf_{w \in \mathcal{V}} G(u + w).$$
Then, $\widetilde{G}$ is coercive with respect to the quotient topology of $X / \mathcal{V}$.
Proof.
By coercivity, the sublevel sets $S(c) := \{u \in X : G(u) \le c\}$ are compact for each $c$. Since the projection $\pi$ is continuous, each $\pi(S(c))$ is compact in $X / \mathcal{V}$. Now,
$$\{v \in X / \mathcal{V} : \widetilde{G}(v) \le c\} = \bigcap_{\varepsilon > 0} \pi(S(c + \varepsilon)).$$
Since, by definition, $\widetilde{G}(\pi(u)) \le c$ if and only if for each $\varepsilon > 0$ there exists $w \in \mathcal{V}$ such that $G(u + w) \le c + \varepsilon$, the identity follows. The right-hand side is compact as an intersection of compact sets, hence each sublevel set of $\widetilde{G}$ is compact, showing the coercivity of $\widetilde{G}$. ∎
Proposition 2.5.
There exists a minimizer for Problem 2.3.
Proof.
As $F$ is proper, using Hypothesis [H0] we infer that the infimum of Problem 2.3 is not $+\infty$. Likewise, since $F$ is convex, lower semicontinuous and coercive, it is bounded from below, such that the infimum of Problem 2.3 is also not $-\infty$. Let us show that the proper and convex functional $\widetilde{J}$ is lower semicontinuous in $X_{\mathcal{N}}$. For that purpose, observe that $A(\mathcal{N})$ is a subspace of the finite-dimensional space $H$ and hence closed. Denote by $H_{\mathcal{N}} := H / A(\mathcal{N})$ the quotient space on which we define $\widetilde{F}$ according to
$$\widetilde{F}(q(w)) := \inf_{z \in A(\mathcal{N})} F(w + z),$$
where $w \in H$ and $q : H \to H_{\mathcal{N}}$ is the canonical projection. Note that this functional is proper and convex. As $F$ is assumed to be coercive, applying Lemma 2.4 yields that $\widetilde{F}$ is also coercive and lower semicontinuous in particular. Now,
$$\inf_{\psi \in \mathcal{N}} F(A(u + \psi)) = \widetilde{F}(q(Au)),$$
where the right-hand side is a composition of continuous linear maps and a lower semicontinuous functional and hence lower semicontinuous. Obviously, replacing $u$ by $u + \psi$ for $\psi \in \mathcal{N}$ does not change the value of this functional, so by the same argument as in Remark 2.1, we deduce that $\pi(u) \mapsto \widetilde{F}(q(Au))$, and consequently $\widetilde{J}$, is lower semicontinuous.
Notice now that $F$ is bounded from below by some $m \in \mathbb{R}$, so that
$$\{v \in X_{\mathcal{N}} : \widetilde{J}(v) \le c\} \subset \{v \in X_{\mathcal{N}} : \widetilde{R}(v) \le c - m\}.$$
Therefore, as $\widetilde{J}$ is lower semicontinuous and $\widetilde{R}$ is coercive due to Hypothesis [H1], we infer that $\{v \in X_{\mathcal{N}} : \widetilde{J}(v) \le c\}$ is compact for every $c \in \mathbb{R}$.
We want to prove that $\widetilde{J}$ admits a minimizer in $X_{\mathcal{N}}$. Notice that the collection $\{\{\widetilde{J} \le c\} : c > \inf \widetilde{J}\}$ has the finite intersection property. As the set $\{\widetilde{J} \le c\}$ is compact for every $c$ and each member of the collection is closed, we infer that
$$\bigcap_{c > \inf \widetilde{J}} \{v \in X_{\mathcal{N}} : \widetilde{J}(v) \le c\} \neq \emptyset.$$
Choosing $\bar{v}$ in this intersection, we notice that it is a minimizer of $\widetilde{J}$, as $\widetilde{J}(\bar{v}) \le c$ for every $c > \inf \widetilde{J}$. ∎
We are now in position to prove the existence of minimizers for Problem 2.2.
Theorem 2.6.
There exists a minimizer for Problem 2.2.
Proof.
Notice that for every $u \in X$ we have
(12)  $\widetilde{J}(\pi(u)) \le J(u).$
Hence, taking the infimum with respect to $u$ on both sides, we obtain that Problem 2.2 and Problem 2.3 have the same infimum. Let $\bar{v}$ be a minimizer for Problem 2.3 and fix $u_0 \in X$ with $\pi(u_0) = \bar{v}$. Then consider the following minimization problem:
$$\min_{\psi \in \mathcal{N}} F(A(u_0 + \psi)).$$
As $F$ is proper, convex, lower semicontinuous and coercive as well as $A(\mathcal{N})$ is finite-dimensional and hence closed in $H$, the infimum is realized and finite. Denoting by $\bar{\psi}$ a minimizer, we choose $\bar{u} := u_0 + \bar{\psi}$ such that $\pi(\bar{u}) = \bar{v}$. Then, $\bar{u}$ is a minimizer for Problem 2.2. Indeed,
$$J(\bar{u}) = F(A(u_0 + \bar{\psi})) + R(\bar{u}) = \inf_{\psi \in \mathcal{N}} F(A(u_0 + \psi)) + \widetilde{R}(\bar{v}) = \widetilde{J}(\bar{v}).$$
Then, as the two minimization problems have equal infimum, we conclude. ∎
2.3 Optimality conditions
In this section we want to obtain optimality conditions for Problem 2.3 by deriving a dual formulation and showing that, under our hypotheses, there is no gap between the primal and the dual problem.
In order to perform this analysis we need to endow the space $X$ with the Mackey topology. For the reader's convenience we recall the definition of the Mackey topology and we refer to [Schaefer] for a comprehensive treatment. Given a real locally convex space $E$ with dual $E'$, define the following family of seminorms on $E$:
(13)  $p_K(x) := \sup_{w \in K} |\langle w, x \rangle|$
for every $K \subset E'$ absolutely convex and $\sigma(E', E)$-compact. This family of seminorms generates a locally convex topology on $E$ that is called the Mackey topology and is denoted by $\tau(E, E')$. It is the strongest topology on $E$ such that $E'$ is still the dual of $E$.
Further, we need the notion of Fenchel conjugate functionals, which are defined as follows. Given a real locally convex space $E$ and a proper function $G : E \to (-\infty, +\infty]$, we denote by $G^* : E' \to [-\infty, +\infty]$ the conjugate of $G$, defined as
$$G^*(w) := \sup_{x \in E} \, \langle w, x \rangle - G(x).$$
In order to obtain the optimality conditions we will use the following well-known proposition (see for example [Aubin]).
Proposition 2.7.
Let $E$ be a real locally convex space. Given a proper, lower semicontinuous, convex function $G : E \to (-\infty, +\infty]$, the following statements are equivalent:

(i) $G$ is continuous in zero for the Mackey topology $\tau(E, E')$.

(ii) For every $c \in \mathbb{R}$, the sublevel set
$$\{w \in E' : G^*(w) \le c\}$$
is compact with respect to the weak topology.
Remark 2.8.
In the next proposition, we will apply this result to $G = R$, a proper and lower semicontinuous seminorm. In this case, the proof of Proposition 2.7 is straightforward. Indeed, $R^* = I_K$, where $I_K$ is the indicator function of $K := \{w \in E' : \langle w, x \rangle \le R(x) \text{ for all } x \in E\}$. Hence, if $K$ is weakly compact, then, thanks to the definition of the Mackey topology, $R$ is continuous in zero.
Conversely, if $R$ is continuous in zero, then there exist absolutely convex, weakly compact sets $K_1, \ldots, K_m \subset E'$ and $\varepsilon > 0$ such that $p_{K_i}(x) \le \varepsilon$ for $i = 1, \ldots, m$ implies $R(x) \le 1$. This, however, means that $K \subset \widetilde{K} := \varepsilon^{-1}\, \overline{\operatorname{conv}}(K_1 \cup \ldots \cup K_m)$. Indeed, if this were not the case, one could separate a $w_0 \in K \setminus \widetilde{K}$ from the absolutely convex and weakly compact set $\widetilde{K}$ by an $x \in E$ such that $\langle w_0, x \rangle > 1$ as well as $\langle w, x \rangle \le 1$ for $w \in \widetilde{K}$. In particular, $p_{K_i}(x) \le \varepsilon$ for each $i$, leading to the contradiction $1 < \langle w_0, x \rangle \le R(x) \le 1$. Due to lower semicontinuity of $R$, $K$ is a closed subset of a weakly compact set and hence compact. By positive homogeneity of $R$, $R^* = I_K$, so the sublevel sets $\{w \in E' : R^*(w) \le c\}$ are compact for all $c$.
For the following, it is convenient to define the linear operator $\widetilde{A} : X_{\mathcal{N}} \to H_{\mathcal{N}}$ as
(14)  $\widetilde{A}(\pi(u)) := q(Au),$
where $q : H \to H_{\mathcal{N}} = H / A(\mathcal{N})$ denotes the canonical projection.
Remark 2.9.
Notice that $\widetilde{A}$ is continuous in the quotient topology of $X_{\mathcal{N}}$. Indeed, the following diagram commutes: