Abstract.
In this work, we introduce a function space setting for a wide class of structural/weighted total variation (TV) regularization methods motivated by their applications in inverse problems. In particular, we consider a regularizer that is the appropriate lower semicontinuous envelope (relaxation) of a suitable total variation type functional initially defined for sufficiently smooth functions. We study examples where this relaxation can be expressed explicitly, and we also provide refinements for weighted total variation for a wide range of weights. Since an integral characterization of the relaxation in function space is, in general, not always available, we show that, for a rather general linear inverse problems setting, instead of the classical Tikhonov regularization problem, one can equivalently solve a saddlepoint problem where no a priori knowledge of an explicit formulation of the structural TV functional is needed. In particular, motivated by concrete applications, we deduce corresponding results for linear inverse problems with norm and Poisson loglikelihood data discrepancy terms. Finally, we provide proofofconcept numerical examples where we solve the saddlepoint problem for weighted TV denoising as well as for MR guided PET image reconstruction.
A function space framework for structural total variation regularization with applications in inverse problems
Michael Hintermüller^{†‡}, Martin Holler ^{\mathsection} and Kostas Papafitsoros ^{†} ^{†}^{†} ^{†}Weierstrass Institute for Applied Analysis and Stochastics (WIAS), Mohrenstrasse 39, 10117, Berlin, Germany ^{†}^{†} ^{‡}Institute for Mathematics, Humboldt University of Berlin, Unter den Linden 6, 10099, Berlin, Germany ^{†}^{†} ^{\mathsection}Institute for Mathematics and Scientific Computing, University of Graz, Heinrichstrasse 36, A8010, Graz, Austria. The institute is a member of NAWI Graz (www.nawigraz.at). M. H. is a member of BioTechMed Graz (www.biotechmed.at).
^{†}^{†} Emails: Michael.Hintermueller@wiasberlin.de^{†}^{†} Martin.Holler@unigraz.at ^{†}^{†} Kostas.Papafitsoros@wiasberlin.de1. Introduction
In many classical applications of inverse problems, rather than measuring data from a single channel, recently the simultaneous acquisition from multiple channels has gained importance. Besides having different sources of information available, a main advantage of such multiple measurements is due to the possibility of exploiting correlations between different data channels in the inversion process, often leading to a significant improvement for each individual channel. In particular, when the underlying quantities of interest can be visualized as image data, these correlations typically correspond to joint structures in images. In view of this background, multimodality and multicontrast imaging explore such joint structures for improved reconstruction. Successful applications of these techniques can be found for instance in biomedical imaging [9, 20, 21, 22, 40, 43, 45, 49, 52], geosciences [51], electron microscopy [27] and several more.
In this context, one distinguishes two different approaches for exploiting correlations: (i) Joint reconstruction techniques that treat all available channels equally, such as [40], and (ii) structuralpriorbased regularization techniques that assume some ground truth structural information to be available. Here we focus on a particular class of type (ii), namely structural totalvariationtype regularization functionals, i.e., functionals which integrate a spatiallydependent pointwise function of the image gradient for regularization. In this vein, given some prior information , we consider a regularization functional of the type
which is used in a Tikhonovtype regularization problem where is reconstructed by solving
(1.1) 
Here, denotes a data discrepancy term and a bounded linear operator which reflects the forward model. A very basic, formal example of such a regularization is given when denotes the underlying ground truth image and we incorporate information on its gradient by defining , where is some norm. This yields
which corresponds to a weighted total variation (TV) functional.
Even though problems of the form (1.1) limit the exchange of information to the gradient level, they cover a large set of existing works in particular in the context of medical image processing. This can be explained on the one hand by the popularity of TVtype regularization for imaging in general, which is due to its capability of recovering jump discontinuities. On the other hand, the gradient level seems very well suited for exploiting correlations between different contrasts or modalities as it primarily encodes structural information, i.e., to some extent it is independent of the absolute magnitude of the signal. With the goal of enforcing parallel level sets, in [19], for instance, the regularizer was chosen for a given image as
(1.2) 
where are appropriate increasing functions, denote smoothed norms, and is the scalar (“dot”) product of vectors . This definition is motivated by the fact that for two vectors the quantity is zero if and only if they are parallel to each other. Particular instances of (1.2) were then also used in [22] for joint MRPET reconstruction. A similar functional, but in the spirit of [39] is
(1.3) 
Here denotes some a priori vector field that contains gradient information. In the context of multicontrast MR, the authors in [20] choose
(1.4) 
where again a smoothed version of the absolute value was used in the denominator. Here, denotes the angle between and . Observe that the latter functional can also be written as
bearing similarity to the functional (1.3).
We note that the regularization approaches above have been considered only in discrete settings, despite the original continuous formulations. Concerning the latter and as indicated above in connection with (1.1), it is natural to consider , i.e., is Lebesgue integrable and is of bounded total variation, still allowing for jump discontinuities, i.e., sharp edges. As a consequence, the (generalized) gradient of is in general a finite Radon measure, only. This fact, however, challenges the proper defininition of regularization functionals of the above types in an associated function space setting. The first part of our work aims at addressing precisely this issue. In fact, resorting to the concept of functions of a measure [7] we propose to use the convex biconjugate as a regularizer. Indeed, starting from a general function , with minimal assumptions, e.g., convexity, linear growth, homogeneity in the second variable, we define the functional as
(1.5) 
where we omit the dependence of on the prior for the sake of unburdening notation. Here, and denote the usual Lebesgue and Sobolev spaces; see [1]. We note that in (1.5) is not suitable for variational regularization as it is finite only for a class of rather regular functions (i.e., in ) and it is not lower semicontinuous in an appropriate topology. As a remedy, we propose to resort to the convex biconjugate of , which we call structural TV functional. We recall that coincides with the lower semicontinuous envelope (relaxation) of with respect to convergence. In general, may not have an explicit representation, but we show that it can always be expressed in a dual form as
(1.6) 
where
(1.7) 
see (3.2) below for the precise definition of the support function , and we refer to [25] for details on . Nevertheless, based on a recent result by Amar, De Cicco and Fusco [2], by linking to the relaxed functional of with respect to convergence we are able to provide an integral representation, under additional assumptions. For instance, in the case where , , with , is equal to the weighted TVfunctional
with the approximate lower limit of ; see (2.3) below for its definition. The set then reads
Interestingly, as a consequence of this formulation, we get certain density results of convex intersections in the spirit of [31, 32, 34] as a byproduct; compare Section 4.
Taking advantage of duality theory, in the second part of the paper we use the structural TV functional for the regularization of linear inverse problems. In particular we study the general minimization problem
(1.8) 
As emphasized above, an explicit representation of the functional is available only under some additional, perhaps restrictive assumptions. In order to solve (1.8) without invoking such assumptions, we employ FenchelRockafellar duality and show, in the continuous setting, equivalence of (1.8) to a saddlepoint problem of the form
(1.9) 
where denotes an appropriate pairing. This is achieved under assumptions which are tight in the sense that we can provide a counterexample where, without these assumptions, even existence of a solution for (1.8) fails. The major advantage of the above saddlepoint reformulation is that it allows to obtain a solution of the original problem without requiring an explicit form of neither nor . Note that the latter would be required for solving the predual problem. Furthermore, it has a format directly amenable to dualitybased numerical optimization algorithms.
The equivalence to a saddlepoint reformulation is obtained under rather general assumptions on the data discrepancy term , which, as corollary, allows us to cover the case of any norm discrepancy term as well as the case of a loglikelihood term for Poisson noise, which is relevant for instance for PET image reconstruction and electron tomography. The latter leads to the minimization problem
(1.10) 
where denotes a Radontransformtype operator, some given data, and is the indicator function for a set , i.e., if and otherwise.
Finally, we show the versatility of our approach with proofofconcept numerical examples in weighted TV denoising with vanishing weight function and MR guided PET reconstruction.
We note here that there is previous work on the analysis of weighted and/or structural TV regularization in an infinite dimension setting. In [26], another instance of structuralTV type functionals is employed, but the work only considers the case of image denoising. Further, the authors simultaneously optimize over the image data and an anisotropy in the TVterm, which leads to a nonconvex problem. Regarding properties of solutions of weighted TV denoising we refer to the work by Jalalzai [38] as well as to [30]. Finally, we mention that in [5] the authors analyze a weighted TV regularization model for vortex density models.
Structure of the paper
The paper is organized as follows: In Section 2, we fix our notation and we remind the reader of basic facts concerning functions of bounded variation and spaces.
In Section 3 we describe the relaxation framework for the structural TV functional. Under some conditions, we provide an integral representation of the relaxation based on a result of [2] and we also provide a characterization of its subdifferential.
Some refinements for weighted TV functional are given in Section 4 in the case of continuous and lower semicontinuous weight functions. We show that the functional can be defined in a dual fashion, using smooth test functions and as a byproduct we obtain certain density results of convex intersections in the spirit of [32, 34].
Section 5 contains the main duality result of the paper. In particular, we show that under certain mild assumptions, the variational regularization problem with the relaxed structural TV functional as regularizer can be equivalently formulated as a saddlepoint problem which requires knowledge neither of the explicit form of the relaxation–as it is the case for the primal problem –nor of the convex conjugate of the discrepancy term–as it is the case for the predual problem. As particular application, we elaborate on the case of Poisson loglikelihood discrepancy terms and show how the result can be transferred to this situation.
Finally, in Section 6 we provide proofofconcept numerical examples, where we solve our saddle point problem for weighted TV denoising with vanishing weight function and for MRguided PET image reconstruction.
2. Notation and Preliminaries
2.1. Functions of bounded variation
Throughout the paper, , with , will be a bounded, open set with Lipschitz boundary, and we denote . By we always denote two real numbers such that and if , if , and if .
The space of functions of bounded variation on is denoted by . We have that if and only if it is in and its distributional derivative is a bounded Radon measure, denoted by . The total variation of is defined to be the total variation of that measure, i.e., and it is equal to
(2.1) 
The measure can be decomposed into
where is the absolutely continuous with respect to Lebesgue measure , with density function denoted by , denotes the jump part which is the restriction to the jump set of , and is the Cantor part of . We recall that is defined over the set of points in for which that where
(2.2)  
(2.3) 
are the approximate upper and lower limits of , respectively. With these definitions, the total variation of the measure can be written as
where denotes the dimensional Hausdorff measure.
The density functions of the measures and with respect to and are denoted by and , respectively.
For further details about the space , we refer the reader to [3, 24].
2.2. The space
As the Banach space , with , plays a major role in our work, we recall here some basic facts.
Definition 2.1 ().
Let and . We have if there exists such that for all
Furthermore we define
with the norm
Remark 2.2.
By density of in we have as is unique. By completeness of and it follows that is a Banach space when equipped with .
We now state some general properties of . As is just a straightforward generalization of the wellknown space , these results can be proven readily by generalizing from ; see [25, Chapter 1] for details on the latter.
Proposition 2.3 (Density).
Let . Then is dense in with respect to .
Proposition 2.4 (Normal trace).
Let and denote by the outer normal vector to at . Then the mapping
with for , can be extended to a linear, continuous mapping, also denoted by . Further we have a GaussGreen formula for functions:
Definition 2.5.
For , we define
The next proposition also uses ideas from [25] but since its proof is slightly more involved, we include it in Appendix A.
Proposition 2.6.
For we have , with the trace operator as in Proposition 2.4 and is understood in the sense of .
From this fact, another equivalent characterization of can be obtained readily.
Corollary 2.7.
For and we have if and only if
(2.4) 
3. Structural TV as a lower semicontinuous envelope
The goal of this section is to obtain a predual representation of a general TVbased functional that includes the case of weighted TV for a general choice of weights. To this aim, we will define the corresponding functional as the lower semicontinuous envelope of a restriction to functions, with . The approach is motivated by the paper of Bouchitte and Dal Maso [12]. We start with a few definitions.
By we always denote a function satisfying the following conditions:

For a.e. , is convex and positively 1homogeneous on .

There exists such that

For every , , i.e., is an even function in the second variable.
Furthermore, for we define as
(3.1) 
Remark 3.1.
We note that with any bounded above satisfies the above assumptions (J1)–(J3).
In what follows, convex conjugation of will always be understood with respect to the second argument, and convex conjugation of is performed in . Due to positive 1homogeneity, we get the following well known representation of . For its formulation, we define the support function
(3.2) 
and denote by the convex conjugate of ; see [23] for more information on the latter concept.
Proposition 3.2.
Let . Then, for any we have , with the convex indicator function of the set .
Proof.
In case the assertion holds true trivially. Hence we assume that there exists with . For such a point , we can write
with and for such that . Hence we get
where the nonnegativity follows from the fact that is even. We further have
which completes the proof.
∎
Remark 3.3.
From the proposition above if follows for that
and hence .
We note that by definition and density we have for with that
Here we write for and . The next proposition follows the lines of [12] and provides a characterization , which also holds without the positive 1homogeneity assumption on .
Proposition 3.4.
Let . Then we have for all that
where , and we set .
Proof.
We define the map with . Due to (J2), is well defined. Then, by [23, Theorem X.2.1], we get for that
Further, we define the unbounded operator by and . Then is densely defined and closed. We note that can be rewritten as
Due to by (J2), where denotes the Lebesgue measure of a set, is bounded in a neighborhood of any with . Hence, it follows from [46, Theorem 19] that
To complete the proof, it is left to show that we have for any that
We first show the inclusion “”: In case the set on the righthand side above is empty, then the inclusion holds trivially; otherwise take with and . By density [24, Theorem 4.3] there exists a sequence in converging to in for which we can also assume that in since . Then, by the GaussGreen theorem for , see (2.4), we get
Hence with some , and thus and . To show the reverse inclusion “”, again assuming the set on the lefthand side to be nonempty, we take , for which we want to show that and . By definition of the mapping is a continuous linear functional on , hence there exists such that
for all . Hence, . Further, we also get that , with being the normal trace operator of Proposition 2.4, and hence , which completes the proof. ∎
Using positive 1homogeneity, Proposition 3.2 immediately implies the following refinement.
Corollary 3.5.
3.1. Explicit representation of
Now we study and aim at providing an explicit representation. Note that since is convex, is equal to the lower semicontinuous envelope of with respect to both the strong and weak convergence. In other words, for every we have
(3.3) 
where “” refers to convergence with respect to the strong topology and “” with respect to the weak topology.
In order to obtain an explicit representation of , we will employ the representation of the lower semicontinuous envelope of , denoted here by , as derived in [2]. Note that for
(3.4)  
(3.5) 
with the last equality again being true due to convexity of .
We show that, under suitable coercivity assumptions on , the functionals and coincide.
Lemma 3.6.
Assume and that for some we have for every and for almost every . Then, for all .
Proof.
Since we assume to be bounded, convergence implies convergence. Also, for and, consequently, . Hence we are left to show for all . To this aim, we first note that, due to the coercivity assumption on , for all .
Now take with , in . Without loss of generality, we can assume . From the continuous embedding of into , we get for a generic constant and for some
Hence, also converges weakly (up to subsequences) in , and thus also weakly in . By uniqueness of the weak limit we get with in and the proof is complete. ∎
Remark 3.7.
Note that if the coercivity assumption of Lemma 3.6 holds, then due to the lower semicontinuity of total variation with respect to weak convergence we get if and only if .
We then get the following result, which is a direct consequence of [2, Theorem 1.1].
Proposition 3.8 (Integral representation of ).
Assume that and one of the following two assertions holds true:

with and being a convex function such that (J1)–(J3) hold for .

There exists a constant such that for every and for almost every , and also .
Then, for any we have
(3.6) 
Proof.
Denote by the righthand side of equation (3.6). If one of the two assumptions is satisfied, then it follows from [2, Theorem 1.1] that for every . It hence remains to show that for any . In case is satisfied, this is the assertion of Lemma 3.6. Assume now that holds true. In that case we will show directly that for every . It follows from [2, Theorem 3.1] that is lower semicontinuous with respect to convergence. Consequently it is also lower semicontinuous with respect to weak convergence and hence, for all and in with in we get
This means that is a lower bound for the set
and since, by definition of the lower semicontinuous relaxation, is the infimum of this set, we get
Furthermore, in the proof of [2, Theorem 4.1] it is shown, with being a standard mollification of and any with , that
where, for any open, if and otherwise. Denoting, for fixed , the lower semicontinuous relaxation of in , we get from convergence of to in as (which holds, since by embedding ) that
Now by [12, Theorem 4.1], the function can be extended to a nonnegative Borel measure. We define the uncountable family of open sets by . Then and . By sigma additivity and since we get that, for any , the set
is finite. Hence, the set is at most countable and we can extract a sequence such that , is monotonically decreasing and . Hence we conclude (by for the leftmost inequality below)
and the proof is complete. ∎
Remark 3.9.
In the case with and , the assumptions of the above proposition are fulfilled. Using that, in this case , we get that