###### Abstract.

In this work, we introduce a function space setting for a wide class of structural/weighted total variation (TV) regularization methods motivated by their applications in inverse problems. In particular, we consider a regularizer that is the appropriate lower semi-continuous envelope (relaxation) of a suitable total variation type functional initially defined for sufficiently smooth functions. We study examples where this relaxation can be expressed explicitly, and we also provide refinements for weighted total variation for a wide range of weights. Since an integral characterization of the relaxation in function space is, in general, not always available, we show that, for a rather general linear inverse problems setting, instead of the classical Tikhonov regularization problem, one can equivalently solve a saddle-point problem where no a priori knowledge of an explicit formulation of the structural TV functional is needed. In particular, motivated by concrete applications, we deduce corresponding results for linear inverse problems with norm and Poisson log-likelihood data discrepancy terms. Finally, we provide proof-of-concept numerical examples where we solve the saddle-point problem for weighted TV denoising as well as for MR guided PET image reconstruction.

A function space framework for structural total variation regularization with applications in inverse problems

Michael Hintermüller, Martin Holler and Kostas Papafitsoros Weierstrass Institute for Applied Analysis and Stochastics (WIAS), Mohrenstrasse 39, 10117, Berlin, Germany Institute for Mathematics, Humboldt University of Berlin, Unter den Linden 6, 10099, Berlin, Germany Institute for Mathematics and Scientific Computing, University of Graz, Heinrichstrasse 36, A-8010, Graz, Austria. The institute is a member of NAWI Graz (www.nawigraz.at). M. H. is a member of BioTechMed Graz (www.biotechmed.at).

## 1. Introduction

In many classical applications of inverse problems, the simultaneous acquisition of data from multiple channels, rather than measurements from a single channel, has recently gained importance. Besides making different sources of information available, a main advantage of such multiple measurements is the possibility of exploiting correlations between different data channels in the inversion process, often leading to a significant improvement for each individual channel. In particular, when the underlying quantities of interest can be visualized as image data, these correlations typically correspond to joint structures in the images. In view of this background, multimodality and multicontrast imaging explore such joint structures for improved reconstruction. Successful applications of these techniques can be found, for instance, in biomedical imaging [9, 20, 21, 22, 40, 43, 45, 49, 52], geosciences [51], electron microscopy [27] and several more.

In this context, one distinguishes two different approaches for exploiting correlations: (i) Joint reconstruction techniques that treat all available channels equally, such as [40], and (ii) structural-prior-based regularization techniques that assume some ground truth structural information to be available. Here we focus on a particular class of type (ii), namely structural total-variation-type regularization functionals, i.e., functionals which integrate a spatially-dependent pointwise function of the image gradient for regularization. In this vein, given some prior information v, we consider a regularization functional of the type

 J(u)=∫Ωjv(x,∇u(x))dx,

which is used in a Tikhonov-type regularization problem where the unknown u is reconstructed by solving

 (1.1) minu∫Ωjv(x,∇u(x))dx+(λD∘K)(u).

Here, D denotes a data discrepancy term and K a bounded linear operator which reflects the forward model. A very basic, formal example of such a regularization is given when v denotes the underlying ground truth image and we incorporate information on its gradient by defining jv(x,z) = |z|/|∇v(x)|, where |⋅| is some norm. This yields

 J(u)=∫Ω1|∇v(x)||∇u(x)|dx,

which corresponds to a weighted total variation (TV) functional.
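As a simple illustration of such a weighted TV functional, the following is a finite-difference sketch (our own illustrative discretization and naming, not the continuous functional studied in the paper): the weight multiplies the pointwise Euclidean norm of a forward-difference gradient.

```python
import numpy as np

def weighted_tv(u, w):
    """Discrete weighted TV: sum over pixels of w(x) * |gradient of u at x|.

    u : 2D image array, w : nonnegative weight array of the same shape.
    Forward differences with a replicated (Neumann-type) last row/column.
    Illustrative sketch only.
    """
    gx = np.diff(u, axis=0, append=u[-1:, :])
    gy = np.diff(u, axis=1, append=u[:, -1:])
    return float(np.sum(w * np.sqrt(gx**2 + gy**2)))

u = np.zeros((4, 4)); u[:, 2:] = 1.0   # vertical unit edge
w = np.ones_like(u)
print(weighted_tv(u, w))               # unit jump crossed by 4 rows -> 4.0
```

Scaling the weight scales the functional linearly, so a small weight near an edge of the prior makes jumps there cheap, which is exactly the structural coupling exploited later.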

Even though problems of the form (1.1) limit the exchange of information to the gradient level, they cover a large set of existing works in particular in the context of medical image processing. This can be explained on the one hand by the popularity of TV-type regularization for imaging in general, which is due to its capability of recovering jump discontinuities. On the other hand, the gradient level seems very well suited for exploiting correlations between different contrasts or modalities as it primarily encodes structural information, i.e., to some extent it is independent of the absolute magnitude of the signal. With the goal of enforcing parallel level sets, in [19], for instance, the regularizer was chosen for a given image v as

 (1.2) J(u) = ∫_Ω ϕ( ψ(|∇u|_β |∇v|_β) − ψ(|∇u⋅∇v|_{β²}) ) dx,

where ϕ and ψ are appropriate increasing functions, |⋅|_β and |⋅|_{β²} denote smoothed norms, and ∇u⋅∇v is the scalar (“dot”) product of the vectors ∇u and ∇v. This definition is motivated by the fact that for two vectors a, b the quantity |a||b| − |a⋅b| is zero if and only if they are parallel to each other. Particular instances of (1.2) were then also used in [22] for joint MR-PET reconstruction. A similar functional, but in the spirit of [39], is

 (1.3) J(u) = ∫_Ω ( |∇u|² − (∇u⋅w)² )^{1/2} dx.

Here w denotes some a priori vector field that contains gradient information. In the context of multicontrast MR, the authors in [20] choose

 (1.4) J(u) = ∫_Ω | ( I − (∇v⊗∇v)/|∇v|² ) ∇u | dx = ∫_Ω |∇u| sin θ dx,

where again a smoothed version of the absolute value is used in the denominator. Here, θ denotes the angle between ∇u and ∇v. Observe that the latter functional can also be written as

 ∫_Ω ( |∇u|² − (∇u ⋅ ∇v/|∇v|)² )^{1/2} dx,

bearing similarity to the functional (1.3).
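To illustrate the pointwise mechanism behind (1.3) and (1.4), the following sketch computes the integrand |(I − dd^T)∇u| with d the smoothed, normalized prior gradient; the function name, array layout and smoothing parameter β are our own illustrative choices.

```python
import numpy as np

def directional_tv_integrand(grad_u, grad_v, beta=1e-3):
    """Pointwise integrand |(I - d d^T) grad_u|, with d = grad_v / |grad_v|_beta
    and |.|_beta a smoothed norm; approaches |grad_u| * sin(theta) as beta -> 0.
    grad_u, grad_v: arrays of shape (..., 2). Illustrative sketch only."""
    norm_v = np.sqrt(np.sum(grad_v**2, axis=-1, keepdims=True) + beta**2)
    d = grad_v / norm_v
    # remove the component of grad_u parallel to the prior gradient direction
    proj = grad_u - d * np.sum(grad_u * d, axis=-1, keepdims=True)
    return np.sqrt(np.sum(proj**2, axis=-1))

# parallel gradients give (near) zero cost; orthogonal ones keep full magnitude
gu = np.array([[1.0, 0.0], [0.0, 1.0]])
gv = np.array([[2.0, 0.0], [2.0, 0.0]])
print(directional_tv_integrand(gu, gv))   # approximately [0, 1]
```

This makes the structural coupling concrete: gradients aligned with the prior are not penalized, while misaligned gradients pay their full norm.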

We note that the regularization approaches above have been considered only in discrete settings, despite the original continuous formulations. Concerning the latter, and as indicated above in connection with (1.1), it is natural to consider u ∈ BV(Ω), i.e., u is Lebesgue integrable and its distributional gradient is of bounded total variation, still allowing for jump discontinuities, i.e., sharp edges. As a consequence, the (generalized) gradient of u is, in general, only a finite Radon measure. This fact, however, challenges the proper definition of regularization functionals of the above types in an associated function space setting. The first part of our work aims at addressing precisely this issue. In fact, resorting to the concept of functions of a measure [7], we propose to use the convex biconjugate J∗∗ as a regularizer. Indeed, starting from a general function j, with minimal assumptions, e.g., convexity, linear growth and positive 1-homogeneity in the second variable, we define the functional J as

 (1.5) J(u)={∫Ωj(x,∇u(x))dxif u∈W1,1(Ω),∞else,

where we omit the dependence of j on the prior v for the sake of unburdening notation. Here, Lp(Ω) and W1,1(Ω) denote the usual Lebesgue and Sobolev spaces; see [1]. We note that J in (1.5) is not suitable for variational regularization as it is finite only for a class of rather regular functions (i.e., in W1,1(Ω)) and it is not lower semi-continuous in an appropriate topology. As a remedy, we propose to resort to the convex biconjugate J∗∗ of J, which we call the structural TV functional. We recall that J∗∗ coincides with the lower semi-continuous envelope (relaxation) of J with respect to weak Lp-convergence. In general, J∗∗ may not have an explicit representation, but we show that it can always be expressed in a dual form as

 (1.6) J∗∗(u) = sup_{g∈Q} ∫_Ω u div g dx, for u ∈ Lp(Ω),

where

 (1.7) Q:={g∈Wq0(div;Ω)∩L∞(Ω,Rd):j∘(x,g(x))≤1 for almost every (a.e.) x∈Ω},

see (3.2) below for the precise definition of the support function j∘, and we refer to [25] for details on the space Wq0(div;Ω). Nevertheless, based on a recent result by Amar, De Cicco and Fusco [2], by linking J∗∗ to the relaxed functional of J with respect to L1-convergence we are able to provide an integral representation, under additional assumptions. For instance, in the case of a weighted TV integrand j(x,z) = α(x)|z| with a suitable weight α, J∗∗ is equal to the weighted TV functional

 J∗∗(u)=∫Ωα−d|Du|,u∈BV(Ω),

with α− the approximate lower limit of α; see (2.3) below for its definition. The set Q then reads

 Q={g∈Wq0(div;Ω)∩L∞(Ω,Rd):|g(x)|≤α(x) for a.e. x∈Ω}.

Interestingly, as a consequence of this formulation, we get certain density results of convex intersections in the spirit of [31, 32, 34] as a byproduct; compare Section 4.

Taking advantage of duality theory, in the second part of the paper we use the structural TV functional for the regularization of linear inverse problems. In particular we study the general minimization problem

 (1.8) infu∈Lp(Ω)J∗∗(u)+(λD∘K)(u).

As emphasized above, an explicit representation of the functional J∗∗ is available only under some additional, perhaps restrictive assumptions. In order to solve (1.8) without invoking such assumptions, we employ Fenchel-Rockafellar duality and show, in the continuous setting, equivalence of (1.8) to a saddle-point problem of the form

 (1.9) inf_{p∈Q} sup_{u∈Lp(Ω)} (div p, u) − (λD∘K)(u),

where (⋅,⋅) denotes an appropriate pairing. This is achieved under assumptions which are tight in the sense that we can provide a counterexample where, without these assumptions, even existence of a solution for (1.8) fails. The major advantage of the above saddle-point reformulation is that it allows one to obtain a solution of the original problem without requiring an explicit form of either J∗∗ or the convex conjugate of the discrepancy term; note that the latter would be required for solving the predual problem. Furthermore, it has a format directly amenable to duality-based numerical optimization algorithms.
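For intuition, the saddle-point structure can be sketched in a discrete setting for the simplest instance, weighted TV denoising (K the identity, a squared-norm discrepancy), using a standard primal-dual iteration with pointwise projection of the dual variable onto the constraint set. This is an illustrative sketch under our own discretization, parameter and step-size choices, not the algorithm of Section 6.

```python
import numpy as np

def grad(u):                                    # forward differences, Neumann boundary
    gx = np.diff(u, axis=0, append=u[-1:, :])
    gy = np.diff(u, axis=1, append=u[:, -1:])
    return np.stack((gx, gy))

def div(p):                                     # negative adjoint of grad
    px, py = p
    dx = px - np.roll(px, 1, axis=0); dx[0] = px[0]; dx[-1] = -px[-2]
    dy = py - np.roll(py, 1, axis=1); dy[:, 0] = py[:, 0]; dy[:, -1] = -py[:, -2]
    return dx + dy

def weighted_tv_denoise(f, alpha, lam=10.0, iters=200):
    """Primal-dual iteration for min_u  sum alpha|grad u| + (lam/2)||u - f||^2,
    i.e. the saddle-point form with the dual variable p constrained pointwise
    to |p(x)| <= alpha(x). Step sizes use ||grad||^2 <= 8 in 2D. Sketch only."""
    u = f.copy(); ubar = u.copy(); p = np.zeros((2,) + f.shape)
    tau = sigma = 1.0 / np.sqrt(8.0)
    for _ in range(iters):
        p += sigma * grad(ubar)
        norm = np.sqrt(p[0]**2 + p[1]**2)
        p *= np.minimum(1.0, alpha / np.maximum(norm, 1e-12))   # project onto Q
        u_new = (u + tau * div(p) + tau * lam * f) / (1.0 + tau * lam)
        ubar = 2.0 * u_new - u
        u = u_new
    return u

rng = np.random.default_rng(0)
f = np.zeros((32, 32)); f[:, 16:] = 1.0
f += 0.1 * rng.standard_normal(f.shape)
alpha = np.full_like(f, 0.3)          # a spatially varying weight is equally possible
u = weighted_tv_denoise(f, alpha)
```

Note that only the pointwise projection onto Q is needed; no closed form of the regularizer itself enters the iteration, which is exactly the advantage of the saddle-point formulation.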

The equivalence to a saddle-point reformulation is obtained under rather general assumptions on the data discrepancy term D, which, as a corollary, allows us to cover the case of any norm discrepancy term as well as the case of a log-likelihood term for Poisson noise, which is relevant, for instance, for PET image reconstruction and electron tomography. The latter leads to the minimization problem

 (1.10) minu∈Lp(Ω)J∗∗(u)+λ∫ΣKu−flog(Ku+c0)dσ+I[0,∞)(u).

where K denotes a Radon-transform-type operator, f some given data, c0 a positive background constant, and I[0,∞) is the indicator function of the set [0,∞), i.e., IA(u) = 0 if u ∈ A and IA(u) = ∞ otherwise.
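The Poisson data term in (1.10) is straightforward to evaluate in a discrete setting; the following sketch (our own notation, with c0 a small positive background keeping the logarithm finite) illustrates that the fit deteriorates as Ku moves away from the data f.

```python
import numpy as np

def poisson_discrepancy(Ku, f, c0=1e-4):
    """Poisson log-likelihood data term  sum( Ku - f * log(Ku + c0) ),
    a discrete analogue of the integral over Sigma in (1.10).
    Stationarity in each component occurs at Ku = f - c0, i.e. essentially
    at a perfect data fit. Illustrative sketch only."""
    Ku = np.asarray(Ku, dtype=float)
    return float(np.sum(Ku - f * np.log(Ku + c0)))

f = np.array([3.0, 0.0, 5.0])
print(poisson_discrepancy(f, f) < poisson_discrepancy(f + 1.0, f))  # True
```

Unlike the squared-norm case, this term is not quadratic, which is why the duality result of Section 5 is formulated for general discrepancies rather than norms only.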

Finally, we show the versatility of our approach with proof-of-concept numerical examples in weighted TV denoising with vanishing weight function and MR guided PET reconstruction.

We note here that there is previous work on the analysis of weighted and/or structural TV regularization in an infinite-dimensional setting. In [26], another instance of structural TV-type functionals is employed, but the work only considers the case of image denoising. Further, the authors simultaneously optimize over the image data and an anisotropy in the TV term, which leads to a non-convex problem. Regarding properties of solutions of weighted TV denoising we refer to the work by Jalalzai [38] as well as to [30]. Finally, we mention that in [5] the authors analyze a weighted TV regularization model for vortex density models.

### Structure of the paper

The paper is organized as follows: In Section 2, we fix our notation and we remind the reader of basic facts concerning functions of bounded variation and the spaces Wq(div;Ω).

In Section 3 we describe the relaxation framework for the structural TV functional. Under some conditions, we provide an integral representation of the relaxation based on a result of [2] and we also provide a characterization of its subdifferential.

Some refinements for the weighted TV functional are given in Section 4 in the case of continuous and lower semi-continuous weight functions. We show that the functional can be defined in a dual fashion, using smooth test functions, and as a byproduct we obtain certain density results of convex intersections in the spirit of [32, 34].

Section 5 contains the main duality result of the paper. In particular, we show that, under certain mild assumptions, the variational regularization problem with the relaxed structural TV functional as regularizer can be equivalently formulated as a saddle-point problem which requires knowledge neither of the explicit form of the relaxation, as is the case for the primal problem, nor of the convex conjugate of the discrepancy term, as is the case for the predual problem. As a particular application, we elaborate on the case of Poisson log-likelihood discrepancy terms and show how the result can be transferred to this situation.

Finally, in Section 6 we provide proof-of-concept numerical examples, where we solve our saddle point problem for weighted TV denoising with vanishing weight function and for MR-guided PET image reconstruction.

## 2. Notation and Preliminaries

### 2.1. Functions of bounded variation

Throughout the paper, Ω ⊂ Rd, with d ≥ 1, will be a bounded, open set with Lipschitz boundary. By p and q we always denote two real numbers such that 1/p + 1/q = 1, where p is chosen so that, in particular, BV(Ω) embeds continuously into Lp(Ω), i.e., p ≤ d/(d−1) if d ≥ 2 and p < ∞ if d = 1.

The space of functions of bounded variation on Ω is denoted by BV(Ω). We have u ∈ BV(Ω) if and only if u ∈ L1(Ω) and its distributional derivative is a bounded Radon measure, denoted by Du. The total variation of u is defined to be the total variation of that measure, i.e., |Du|(Ω), and it is equal to

 (2.1) |Du|(Ω)=sup{∫Ωudivϕdx:ϕ∈C∞c(Ω,Rd),|ϕ(x)|≤1,∀x∈Ω}.

The measure Du can be decomposed into

 Du=Dau+Dju+Dcu,

where Dau is the part of Du that is absolutely continuous with respect to the Lebesgue measure Ld, with density function denoted by ∇u, Dju denotes the jump part, which is the restriction of Du to the jump set Ju of u, and Dcu is the Cantor part of Du. We recall that Ju is the set of points x ∈ Ω for which u+(x) > u−(x), where

 (2.2) u+(x) = inf{ t ∈ [−∞,∞] : lim_{r→0} Ld({u > t} ∩ B(x,r)) / r^d = 0 },
 (2.3) u−(x) = sup{ t ∈ [−∞,∞] : lim_{r→0} Ld({u < t} ∩ B(x,r)) / r^d = 0 },

are the approximate upper and lower limits of u, respectively. With these definitions, the total variation of the measure Dju can be written as

 |Dju|(Ω)=∫Ju|u+(x)−u−(x)|dHd−1,

where Hd−1 denotes the (d−1)-dimensional Hausdorff measure. The density functions of the measures Dju and Dcu with respect to |Dju| and |Dcu| are denoted by σDju and σDcu, respectively. For further details about the space BV(Ω), we refer the reader to [3, 24].
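For a piecewise constant function, Du reduces to its jump part and the formula above gives the sum of the jump heights |u+ − u−|; a one-line discrete check (our own illustrative example):

```python
import numpy as np

# A 1D piecewise constant u: Du consists only of jumps, and |D^j u|(Omega)
# is the sum of the jump heights, reproduced by the discrete total variation.
u = np.array([0.0, 0.0, 2.0, 2.0, -1.0, -1.0])
tv = np.sum(np.abs(np.diff(u)))
print(tv)   # |2 - 0| + |-1 - 2| = 5.0
```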

### 2.2. The space Wq(div;Ω)

As the Banach space Wq(div;Ω) plays a major role in our work, we recall here some basic facts.

###### Definition 2.1 (Wq(div;Ω)).

Let 1 ≤ q ≤ ∞ and g ∈ Lq(Ω,Rd). We write div g = w if there exists w ∈ Lq(Ω) such that for all ϕ ∈ C∞c(Ω)

 ∫Ω∇ϕ⋅gdx=−∫Ωϕwdx.

Furthermore we define

 Wq(div;Ω):={g∈Lq(Ω,Rd):divg∈Lq(Ω)},

with the norm ∥g∥_{Wq(div;Ω)} := ∥g∥_{Lq(Ω,Rd)} + ∥div g∥_{Lq(Ω)}.

###### Remark 2.2.

By density of C∞c(Ω) in Lp(Ω) we have that div g, if it exists, is unique. By completeness of Lq(Ω,Rd) and Lq(Ω) it follows that Wq(div;Ω) is a Banach space when equipped with ∥⋅∥_{Wq(div;Ω)}.

We now state some general properties of Wq(div;Ω). As Wq(div;Ω) is just a straightforward generalization of the well-known space H(div;Ω) = W2(div;Ω), these results can be proven readily by generalizing from the case q = 2; see [25, Chapter 1] for details on the latter.

###### Proposition 2.3 (Density).

Let q be as above. Then C∞(¯Ω,Rd) is dense in Wq(div;Ω) with respect to ∥⋅∥_{Wq(div;Ω)}.

###### Proposition 2.4 (Normal trace).

Let q be as above and denote by ν the outer normal vector to ∂Ω. Then the mapping

 τ:C∞(¯¯¯¯Ω,Rd)→(W1−1p,p(∂Ω))∗,g↦τ(g),

with τ(g)(v) = ∫_{∂Ω} (g⋅ν) v dHd−1 for g ∈ C∞(¯Ω,Rd) and v ∈ W1−1/p,p(∂Ω), can be extended to a linear, continuous mapping on Wq(div;Ω), also denoted by τ. Further we have a Gauss-Green formula for Wq(div;Ω) functions:

 ∫Ω∇v⋅gdx+∫Ωvdivgdx=τ(g)(v)for all v∈W1,p(Ω),g∈Wq(div;Ω).
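The Gauss-Green formula has an exact discrete analogue: with forward differences and the matching discrete divergence, the boundary pairing τ(g)(v) vanishes when g vanishes near the boundary, in the spirit of g ∈ Wq0(div;Ω) below. A small check under our own 1D discretization:

```python
import numpy as np

# Discrete Gauss-Green / summation by parts: for forward differences D and the
# discrete divergence (div g)_i = g_i - g_{i-1} (with g_{-1} = 0), one has
#   sum(Dv * g) + sum(v * divg) = (boundary term),
# and the boundary term vanishes when g is zero at the boundary. Sketch only.
n = 7
rng = np.random.default_rng(1)
v = rng.standard_normal(n)
g = rng.standard_normal(n); g[0] = g[-1] = 0.0   # "compactly supported" g

Dv = np.append(np.diff(v), 0.0)                  # forward differences, last entry 0
divg = g - np.append(0.0, g[:-1])                # discrete divergence

print(abs(np.sum(Dv * g) + np.sum(v * divg)))    # ~0 up to rounding
```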
###### Definition 2.5.

For q as above, we define

 Wq0(div;Ω) := \overline{C∞c(Ω,Rd)}^{∥⋅∥_{Wq(div;Ω)}}.

The next proposition also uses ideas from [25] but since its proof is slightly more involved, we include it in Appendix A.

###### Proposition 2.6.

For q as above we have Wq0(div;Ω) = {g ∈ Wq(div;Ω) : τ(g) = 0}, with τ the trace operator as in Proposition 2.4, where τ(g) = 0 is understood in the sense of (W1−1/p,p(∂Ω))∗.

From this fact, another equivalent characterization of Wq0(div;Ω) can be obtained readily.

###### Corollary 2.7.

For q as above and g ∈ Wq(div;Ω) we have g ∈ Wq0(div;Ω) if and only if

 (2.4) ∫_Ω ∇v⋅g dx = −∫_Ω v div g dx, for all v ∈ W1,p(Ω).

## 3. Structural TV as a lower semi-continuous envelope

The goal of this section is to obtain a predual representation of a general TV-based functional that includes the case of weighted TV for a general choice of weights. To this aim, we will define the corresponding functional as the Lp-lower semi-continuous envelope of a restriction to W1,1(Ω) functions. The approach is motivated by the paper of Bouchitté and Dal Maso [12]. We start with a few definitions.

By j : Ω × Rd → [0,∞) we always denote a function satisfying the following conditions:

• (J1) For a.e. x ∈ Ω, j(x,⋅) is convex and positively 1-homogeneous on Rd.

• (J2) There exists γ > 0 such that

 0 ≤ j(x,z) ≤ γ(1+|z|) for a.e. x ∈ Ω and every z ∈ Rd.

• (J3) For every z ∈ Rd, j(x,z) = j(x,−z), i.e., j is an even function in the second variable.

Furthermore, for u ∈ Lp(Ω) we define J as

 (3.1) J(u):={∫Ωj(x,∇u(x))dxif u∈W1,1(Ω),∞else.
###### Remark 3.1.

We note that j(x,z) = α(x)|z| with any nonnegative α bounded above satisfies the above assumptions (J1)–(J3).

In what follows, convex conjugation of j will always be understood with respect to the second argument, and convex conjugation of J is performed in Lp(Ω). Due to positive 1-homogeneity, we get the following well-known representation of j∗. For its formulation, we define the support function

 (3.2) j∘(x,z∗):=supz:j(x,z)≤1z∗⋅z,

and denote by j∗ the convex conjugate of j; see [23] for more information on the latter concept.

###### Proposition 3.2.

Let j satisfy (J1)–(J3). Then, for a.e. x ∈ Ω and any z∗ ∈ Rd we have j∗(x,z∗) = I_{Ax}(z∗), with I_{Ax} the convex indicator function of the set Ax := {z∗ ∈ Rd : j∘(x,z∗) ≤ 1}.

###### Proof.

In case j(x,⋅) ≡ 0 the assertion holds true trivially. Hence we assume that there exists z with j(x,z) > 0. For such a point x, we can write

 j(x,z) = j(x, j(x,z) ⋅ z/j(x,z)) = λ j(x, z̃)

with λ := j(x,z) and z̃ := z/j(x,z), so that j(x,z̃) = 1; note that λ ∈ [0,1] whenever j(x,z) ≤ 1. Hence we get

 j∘(x,z∗)=supλ∈[0,1]supj(x,z)=1λ(z∗⋅z)=supj(x,z)=1z∗⋅z≥0,

where the non-negativity follows from the fact that j is even. We further have

 j∗(x,z∗) =supz{z∗⋅z−j(x,z)}=supλ≥0supj(x,z)=1{z∗⋅(λz)−λj(x,z)} =supλ≥0supj(x,z)=1λ(z∗⋅z−1)=supλ≥0λ(j∘(x,z∗)−1) ={∞if j∘(x,z∗)>1,0if j∘(x,z∗)≤1}=IAx(z∗),

which completes the proof.

###### Remark 3.3.

From the proposition above it follows for j(x,z) = α(x)|z| that

 j∘(x,z∗) = |z∗|/α(x) if α(x) ≠ 0, and j∘(x,z∗) = I_{{0}}(z∗) else,

and hence Ax = {z∗ ∈ Rd : |z∗| ≤ α(x)}.

We note that by definition and density we have for u∗ ∈ Lq(Ω) that

 J∗(u∗) =supu∈Lp(Ω){(u∗,u)−J(u)}=supu∈BV(Ω){(u∗,u)−J(u)}=supu∈W1,1(Ω){(u∗,u)−J(u)}.

Here we write (u∗,u) for ∫_Ω u∗u dx with u∗ ∈ Lq(Ω) and u ∈ Lp(Ω). The next proposition follows the lines of [12] and provides a characterization of J∗, which also holds without the positive 1-homogeneity assumption on j.

###### Proposition 3.4.

Let j be as above. Then we have for all u∗ ∈ Lq(Ω) that

 J∗(u∗)=ming∈K(u∗)∫Ωj∗(x,g(x))dx,

where K(u∗) := {g ∈ Wq0(div;Ω) ∩ L∞(Ω,Rd) : −div g = u∗}, and we set min ∅ := ∞.

###### Proof.

We define the map F : L1(Ω,Rd) → R with F(p) := ∫_Ω j(x,p(x)) dx. Due to (J2), F is well defined. Then, by [23, Theorem X.2.1], we get for p∗ ∈ L∞(Ω,Rd) that

 F∗(p∗)=∫Ωj∗(x,p∗(x))dx.

Further, we define the unbounded operator Λ : Lp(Ω) → L1(Ω,Rd) by Λu := ∇u and dom(Λ) := W1,1(Ω) ∩ Lp(Ω). Then Λ is densely defined and closed. We note that J can be re-written as

 J(u)={F(Λu)if u∈dom(Λ),+∞else.

Due to F(p) ≤ γ(|Ω| + ∥p∥_{L1}) by (J2), where |A| denotes the Lebesgue measure of a set A, F is bounded in a neighborhood of any p ∈ L1(Ω,Rd). Hence, it follows from [46, Theorem 19] that

 J∗(u∗) =min{F∗(y∗):y∗∈dom(Λ∗),Λ∗y∗=u∗} =min{∫Ωj∗(x,y∗(x))dx:y∗∈dom(Λ∗),Λ∗y∗=u∗}.

To complete the proof, it is left to show that we have for any u∗ ∈ Lq(Ω) that

 {y∗∈dom(Λ∗):Λ∗y∗=u∗}={g∈Wq0(div;Ω)∩L∞(Ω,Rd):−divg=u∗}.

We first show the inclusion “⊃”: In case the set on the right-hand side above is empty, then the inclusion holds trivially; otherwise take g ∈ Wq0(div;Ω) ∩ L∞(Ω,Rd) with −div g = u∗ and v ∈ dom(Λ). By density [24, Theorem 4.3] there exists a sequence (vn)n of smooth functions converging to v in W1,1(Ω), for which we can also assume that vn → v in Lp(Ω). Then, by the Gauss-Green theorem for Wq0(div;Ω), see (2.4), we get

 ∫Ωdivgvdx←∫Ωdivgvndx=−∫Ωg⋅Λvndx→−∫Ωg⋅Λvdx.

Hence ∫_Ω g⋅Λv dx = (−div g, v) = (u∗,v), and thus g ∈ dom(Λ∗) and Λ∗g = u∗. To show the reverse inclusion “⊂”, again assuming the set on the left-hand side to be non-empty, we take y∗ ∈ dom(Λ∗) with Λ∗y∗ = u∗, for which we want to show that y∗ ∈ Wq0(div;Ω) and −div y∗ = u∗. By definition of Λ∗ the mapping v ↦ ∫_Ω y∗⋅Λv dx is a continuous linear functional on dom(Λ) ⊂ Lp(Ω), hence there exists w ∈ Lq(Ω) such that

 ∫Ωwvdx=∫ΩΛ∗y∗vdx=∫Ωy∗⋅∇vdx

for all v ∈ dom(Λ). Hence, −div y∗ = w = Λ∗y∗ = u∗. Further, we also get that τ(y∗) = 0, with τ being the normal trace operator of Proposition 2.4, and hence y∗ ∈ Wq0(div;Ω) by Proposition 2.6, which completes the proof. ∎

Using positive 1-homogeneity, Proposition 3.2 immediately implies the following refinement.

###### Corollary 3.5.

Under the assumption of Proposition 3.4 we get that

 J∗(u∗)=Idiv(Q)(u∗)

with

 Q={g∈Wq0(div;Ω)∩L∞(Ω,Rd):j∘(x,g(x))≤1 for a.e. x∈Ω}.

In particular, if j(x,z) = α(x)|z| then we get

 Q={g∈Wq0(div;Ω)∩L∞(Ω,Rd):|g(x)|≤α(x) for a.e. x∈Ω}.

### 3.1. Explicit representation of J∗∗

Now we study J∗∗ and aim at providing an explicit representation. Note that since J is convex, J∗∗ is equal to the lower semi-continuous envelope of J with respect to both the strong and weak Lp-convergence. In other words, for every u ∈ Lp(Ω) we have

 J∗∗(u) = inf{ lim inf_{n→∞} J(un) : un ∈ Lp(Ω), un → u in Lp(Ω) }
 (3.3)  = inf{ lim inf_{n→∞} J(un) : un ∈ Lp(Ω), un ⇀ u in Lp(Ω) },

where “→” refers to convergence with respect to the strong topology and “⇀” with respect to the weak topology.

In order to obtain an explicit representation of J∗∗, we will employ the representation of the L1-lower semi-continuous envelope of J, denoted here by J̄, as derived in [2]. Note that for u ∈ L1(Ω)

 (3.4) J̄(u) := inf{ lim inf_{n→∞} J(un) : un ∈ W1,1(Ω), un → u in L1(Ω) }
 (3.5)      = inf{ lim inf_{n→∞} J(un) : un ∈ W1,1(Ω), un ⇀ u in L1(Ω) },

with the last equality again being true due to convexity of J.

We show that, under suitable coercivity assumptions on j, the functionals J∗∗ and J̄ coincide on Lp(Ω).

###### Lemma 3.6.

Assume (J1)–(J3) and that for some c > 0 we have j(x,z) ≥ c|z| for every z ∈ Rd and for almost every x ∈ Ω. Then, J∗∗(u) = J̄(u) for all u ∈ Lp(Ω).

###### Proof.

Since we assume Ω to be bounded, Lp-convergence implies L1-convergence and, consequently, J̄ ≤ J∗∗ on Lp(Ω). Hence we are left to show J∗∗(u) ≤ J̄(u) for all u ∈ Lp(Ω). To this aim, we first note that, due to the coercivity assumption on j, J(u) ≥ c|Du|(Ω) for all u ∈ W1,1(Ω).

Now take u ∈ Lp(Ω) and (un)n in W1,1(Ω) with un → u in L1(Ω). Without loss of generality, we can assume supn J(un) < ∞. From the continuous embedding of BV(Ω) into Lp(Ω), we get for a generic constant C > 0

 ∥un∥Lp(Ω)≤C(∥un∥L1(Ω)+|Dun|(Ω))≤C(∥un∥L1(Ω)+J(un))

Hence, (un)n also converges weakly (up to subsequences) in Lp(Ω), and thus also weakly in L1(Ω). By uniqueness of the weak limit we get un ⇀ u in Lp(Ω), so J∗∗(u) ≤ lim infn J(un) by (3.3), and the proof is complete. ∎

###### Remark 3.7.

Note that if the coercivity assumption of Lemma 3.6 holds, then due to the lower semi-continuity of the total variation with respect to weak L1-convergence we get J∗∗(u) < ∞ if and only if u ∈ BV(Ω) ∩ Lp(Ω).

We then get the following result, which is a direct consequence of [2, Theorem 1.1].

###### Proposition 3.8 (Integral representation of J∗∗).

Assume that j satisfies (J1)–(J3) and that one of the following two assertions holds true:

1. with and being a convex function such that (J1)–(J3) hold for .

2. There exists a constant such that for every and for almost every , and also .

Then, for any we have

 (3.6) J∗∗(u) = ∫_Ω j(x,∇u(x)) dx + ∫_Ω j−(x, σ_{Dcu}) d|Dcu| + ∫_{Ju∩Ω} (u+(x) − u−(x)) j−(x, σ_{Dju}) dHd−1.
###### Proof.

Denote by J̃(u) the right-hand side of equation (3.6). If one of the two assumptions is satisfied, then it follows from [2, Theorem 1.1] that J̄(u) = J̃(u) for every u ∈ BV(Ω). It hence remains to show that J∗∗(u) = J̄(u) for any u ∈ Lp(Ω). In case assumption 2 is satisfied, this is the assertion of Lemma 3.6. Assume now that assumption 1 holds true. In that case we will show directly that J∗∗(u) = J̄(u) for every u ∈ Lp(Ω). It follows from [2, Theorem 3.1] that J̄ is lower semi-continuous with respect to L1-convergence. Consequently it is also lower semi-continuous with respect to weak Lp-convergence and hence, for all u ∈ Lp(Ω) and (un)n in Lp(Ω) with un ⇀ u in Lp(Ω) we get

 J̄(u) ≤ lim inf_{n→∞} J(un).

This means that J̄(u) is a lower bound for the set

 {liminfn→∞J(un):un∈W1,1(Ω),un⇀u in Lp(Ω)}

and since, by definition of the lower semi-continuous relaxation, J∗∗(u) is the infimum of this set, we get

 J̄(u) ≤ J∗∗(u).

Furthermore, in the proof of [2, Theorem 4.1] it is shown, with (uh)h being a standard mollification of u and A any suitable open set compactly contained in Ω, that

 lim inf_{h→∞} J(uh, A) ≤ J̄(u),

where, for any open A ⊂ Ω, J(u,A) := ∫_A j(x,∇u(x)) dx if u ∈ W1,1(A) and J(u,A) := ∞ otherwise. Denoting, for fixed A, by J̄d∗(⋅,A) the lower semi-continuous relaxation of J(⋅,A) in Ld∗(A), with d∗ := d/(d−1), we get from convergence of uh to u in Ld∗(A) as h → ∞ (which holds, since u ∈ BV(Ω) ↪ Ld∗(Ω) by embedding) that

 J̄d∗(u,A) ≤ lim inf_{h→∞} J(uh,A) ≤ J̄(u).

Now by [12, Theorem 4.1], the function A ↦ J̄d∗(u,A) can be extended to a non-negative Borel measure. We define the uncountable family of open sets (Ωϵ)ϵ>0 by Ωϵ := {x ∈ Ω : dist(x,∂Ω) > ϵ}. Then Ωϵ ⊂⊂ Ω and ∪ϵ>0 Ωϵ = Ω. By sigma additivity and since J̄d∗(u,Ω) < ∞ we get that, for any n ∈ N, the set

 {ϵ > 0 : J̄d∗(u, ∂Ωϵ) > 1/n}

is finite. Hence, the set {ϵ > 0 : J̄d∗(u, ∂Ωϵ) > 0} is at most countable and we can extract a sequence (ϵi)i such that (ϵi)i is monotonically decreasing, ϵi → 0 and J̄d∗(u, ∂Ωϵi) = 0 for all i. Hence we conclude (using the definition of the relaxations for the left-most inequality below)

 J∗∗(u) ≤ J̄d∗(u,Ω) = lim_{i→∞} J̄d∗(u,Ωϵi) ≤ J̄(u)

and the proof is complete. ∎

###### Remark 3.9.

In the case j(x,z) = α(x)|z| with a suitable weight α, the assumptions of the above proposition are fulfilled. Using that, in this case, j−(x,z) = α−(x)|z|, we get that

 J∗∗(u) = ∫_Ω α−|∇u(x)| dx + ∫_Ω α−|σ_{Dcu}| d|Dcu| + ∫_{Ju∩Ω} α−(u+ − u−) dHd−1.