A Convergent Overlapping Domain Decomposition Method for Total Variation Minimization

A Convergent Overlapping Domain Decomposition Method for Total Variation Minimization

Massimo Fornasier Johann Radon Institute for Computational and Applied Mathematics (RICAM), Austrian Academy of Sciences, Altenbergerstrasse 69, A-4040, Linz, Austria Email: massimo.fornasier@oeaw.ac.at    Andreas Langer Johann Radon Institute for Computational and Applied Mathematics (RICAM), Austrian Academy of Sciences, Altenbergerstrasse 69, A-4040, Linz, Austria Email: andreas.langer@oeaw.ac.at    Carola-Bibiane Schönlieb Department of Applied Mathematics and Theoretical Physics (DAMTP), Centre for Mathematical Sciences, Wilberforce Road, Cambridge CB3 0WA, United Kingdom.Email: c.b.s.schonlieb@damtp.cam.ac.uk

This paper is concerned with the analysis of convergent sequential and parallel overlapping domain decomposition methods for the minimization of functionals formed by a discrepancy term with respect to data and a total variation constraint. To our knowledge, this is the first successful attempt of addressing such strategy for the nonlinear, nonadditive, and nonsmooth problem of total variation minimization. We provide several numerical experiments, showing the successful application of the algorithm for the restoration of 1D signals and 2D images in interpolation/inpainting problems respectively, and in a compressed sensing problem, for recovering piecewise constant medical-type images from partial Fourier ensembles.

Key words: Domain decomposition method, nonsmooth convex optimization, parallel computation, discontinuous solutions, total variation minimization, -minimization, image and signal processing

AMS subject classifications. 65K10, 65N55 65N21, 65Y05 90C25, 52A41, 49M30, 49M27, 68U10

1 Introduction

In concrete applications, e.g., for image processing, one might be interested to recover at best a digital image provided only partial linear or nonlinear measurements, possibly corrupted by noise. Given the observation that natural and man-made images can be characterized by a relatively small number of edges and extensive relatively uniform parts, one may want to help the reconstruction by imposing that the interesting solution is the one which matches the given data and has also a few discontinuities localized on sets of lower dimension.

In the context of compressed sensing [6, 7, 8, 21], it has been clarified that the minimization of -norms occupies a fundamental role for the promotion of sparse solutions. This understanding furnishes an important interpretation of total variation minimization, i.e., the minimization of the -norm of derivatives [34], as a regularization technique for image restoration. The problem can be modelled as follows; let , for be a bounded open set with Lipschitz boundary, and . For

is the variation of . Further, , the space of bounded variation functions [1, 24], if and only if . In this case, we denote . If (the Sobolev space of -functions with -distributional derivatives), then . We consider as in [12, 38] the minimization in of the functional


where is a bounded linear operator, is a datum, and is a fixed regularization parameter [23]. Several numerical strategies to perform efficiently total variation minimization have been proposed in the literature. Without claiming of being exhaustive, we list a few of the relevant methods, ordered by their chronological appearance:

(i) the linearization approach of Vogel et al. [20] and of Chambolle and Lions [12] by iteratively re-weighted least squares, see also [18] for generalizations and refinements in the context of compressed sensing;

(ii) the primal-dual approach of Chan et al. [13];

(iii) variational approximation via locally quadratic functionals as in the work of Vese et al. [2, 38];

(iv) iterative thresholding algorithms based on projections onto convex sets as in the work of Chambolle [10] as well as in the work of Combettes and Wajs [15] and Daubechies et al. [19];

(v) iterative minimization of the Bregman distance as in the work of Osher et al. [33] (also notice the very recent Bregman split approach [27]);

(vi) graph cuts [11, 16] for the minimization of (1) with and an anisotropic total variation;

(vii) the approach proposed by Nesterov [31] and its modifications by Weiss et al. [39].

These approaches differ significantly, and they provide a convincing view of the interest this problem has been able to generate and of his applicative impact. However, because of their iterative-sequential formulation, none of the mentioned methods is able to address in real-time, or at least in an acceptable computational time, extremely large problems, such as 4D imaging (spatial plus temporal dimensions) from functional magnetic-resonance in nuclear medical imaging, astronomical imaging or global terrestrial seismic tomography. For such large scale simulations we need to address methods which allow us to reduce the problem to a finite sequence of sub-problems of a more manageable size, perhaps computable by one of the methods listed above. With this aim we introduced subspace correction and domain decomposition methods both for -norm and total variation minimizations [25, 26, 35]. We address the interested reader to the broad literature included in [26] for an introduction to domain decompositions methods both for PDEs and convex minimization.

1.1 Difficulty of the problem

Due to the nonsmoothness and nonadditivity of the total variation with respect to a nonoverlapping domain decomposition (note that the total variation of a function on the whole domain equals the sum of the total variations on the subdomains plus the size of the jumps at the interfaces [26, formula (3.4)]), one encounters additional difficulties in showing convergence of such decomposition strategies to global minimizers. In particular, we stress very clearly that well-known approaches as in [9, 14, 36, 37] are not directly applicable to this problem, because either they do address additive problems or smooth convex minimizations, which is not the case of total variation minimization. Moreover the interesting solutions may be discontinuous, e.g., along curves in 2D. These discontinuities may cross the interfaces of the domain decomposition patches. Hence, the crucial difficulty is the correct numerical treatment of interfaces, with the preservation of crossing discontinuities and the correct matching where the solution is continuous instead, see [26, Section 7.1.1].

The work [26] was particularly addressed to nonoverlapping domain decompositions and . Associated to the decomposition define , for ; note that . With this splitting we wanted to minimize by suitable instances of the following alternating algorithm: Pick an initial , for example , and iterate

In [26] we proposed an implementation of this algorithm which is guaranteed to converge and to decrease the objective energy monotonically. We could prove its convergence to minimizers of only under technical conditions on the interfaces of the subdomains. However, in our numerical experiments, the algorithm seems always converging robustly to the expected minimizer. This discrepancy between theoretical analysis and numerical evidences motivated our investigation on overlapping domain decompositions. The hope was that the redundancy given by overlapping patches and the avoidance of boundary interfaces could allow for a technically easier theoretical analysis.

1.2 Our approach, results, and technical issues

In this paper we show how to adapt our previous algorithm [26] to the case of an overlapping domain decomposition. The setting of an overlapping domain decomposition eventually provides us with a framework in which we successfully prove its convergence to minimizers of , both in its sequential and parallel forms. Let us stress that to our knowledge this is the first method which addresses a domain decomposition strategy for total variation minimization with a formal theoretical justification of convergence. It is important to mention that there are other very recent attempts of addressing domain decomposition methods for total variation minimization with successful numerical results [30].

Our analysis is performed for a discrete approximation of the continuous functional (1), for ease again denoted in (3). Essentially we approximate functions by their sampling on a regular grid and their gradient by finite differences . It is well-known that such a discrete approximation -converges to the continuous functional (see [4]). In particular, discrete minimizers of (3), interpolated by piecewise linear functions, converge in weak--topology of to minimizers of the functional (1) in the continuous setting. Of course, when dealing with numerical solutions, only the discrete approach matters together with its approximation properties to the continuous problem. However, the need of working in the discrete setting is not only practical, it is also topological. In fact bounded sets in are (only) weakly--compact, and this property is fundamental for showing that certain sequences have converging subsequences. Unfortunately, the weak--topology of is “too weak” for our purpose of proving convergence of the domain decomposition algorithm; for instance, the trace on boundary sets is not a continuous operator with respect to this topology. This difficulty can be avoided, for instance, by -approximating the functional (1) by means of quadratic functionals (as in [2, 12, 38]) and working with the topology of , the Sobolev space of -functions with -distributional first derivatives. However, this strategy changes the singular nature of the problem which makes it both interesting and difficult. Hence, the discrete approach has the virtues of being practical for numerical implementations, of correctly approximating the continuous setting, and of retaining the major features which makes the problem interesting. Note further that in the discrete setting where topological issues are not a concern anymore, also the dimension can be arbitrary, contrary to the continuous setting where the dimension has to be linked to boundedness properties of the operator , see [38, property H2, pag. 134]. For ease of presentation, and in order to avoid unnecessary technicalities, we limit our analysis to splitting the problem into two subdomains and . This is by no means a restriction. The generalization to multiple domains comes quite natural in our specific setting, see also [26, Remark 5.3]. When dealing with discrete subdomains , for technical reasons, we will require a certain splitting property for the total variation, i.e.,


where and are suitable functions which depend only on the restrictions and respectively, see (9) (symbols and notations are clarified once for all in the following section). Note that this formula is the discrete analogous of [26, formula (3.4)] in the continuous setting. The simplest examples of discrete domains with such a property are discrete -dimensional rectangles (-orthotopes). Hence, for ease of presentation, we will assume to work with -orthotope domains, also noting that such decompositions are already sufficient for any practical use in image processing, and stressing that the results can be generalized also to subdomains with different shapes as long as (2) is satisfied.

1.3 Organization of the work

The paper is organized as follows. In Section 2 we collect the relevant notations and symbols for the paper. Section 3 introduces the problem and the overlapping domain decomposition algorithm which we want to analyze. In Section 4 we address the approximate solution of the local problems defined on the subdomains and we show how we interface them, by means of a suitable Lagrange multiplier. Section 5 and Section 6 are concerned with the convergence of the sequential and parallel forms of the algorithm. In particular, in Section 5 we provide a characterization of minimizers by a discrete representation of the subdifferential of . This characterization is used in the convergence proofs in order to check the reached optimality. The final Section 7 provides a collection of applications and numerical examples.

2 Notations

Let us fix the main notations. Since we are interested in a discrete setting we define the discrete -orthotope , and the considered function spaces are , where for . For we write with


where and . Then we endow with the norm

We define the scalar product of as

and the scalar product of as

with for every and . We will consider also other norms, in particular


We denote the discrete gradient by


for all and for all .
Let , we define for

where . In particular we define the total variation of by setting and , i.e.,

For an operator we denote its adjoint. Further we introduce the discrete divergence defined, in analogy with the continuous setting, by ( is the adjoint of the gradient ). The discrete divergence operator is explicitly given by

for every and for all . (Note that if we considered discrete domains which are not discrete -orthotopes, then the definitions of gradient and divergence operators should be adjusted accordingly.) With these notations, we define the closed convex set

where , and denote the orthogonal projection onto . We will often use the symbol to indicate the constant vector with entry values and to indicate the characteristic function of the domain .

3 The Overlapping Domain Decomposition Algorithm

We are interested in the minimization of the functional


where is a linear operator, is a datum, and is a fixed constant. In order to guarantee the existence of minimizers for (3) we assume that:

  • is coercive in , i.e., there exists a constant such that is bounded in .

It is well known that if then condition (C) is satisfied, see [38, Proposition 3.1].

Now, instead of minimizing (3) on the whole domain we decompose into two overlapping subdomains and such that , , and (2) is fulfilled. For consistency of the definitions of gradient and divergence, we assume that also the subdomains are discrete -orthotopes as well as , stressing that this is by no means a restriction, but only for ease of presentation. Due to this domain decomposition is split into two closed subspaces , for . Note that is not a direct sum of and , but just a linear sum of subspaces. Thus any has a nonunique representation


We denote by the interface between and and by the interface between and (the interfaces are naturally defined in the discrete setting).

We introduce the trace operator of the restriction to a boundary

with , the restriction of on . Note that is as usual the set of maps from to . The trace operator is clearly a linear and continuous operator. We additionally fix a bounded uniform partition of unity (BUPU) such that

  • for ,

  • ,

  • for ,

We would like to solve

by picking an initial , e.g., , and iterate


Note that we are minimizing over functions for which vanish on the interior boundaries, i.e., . Moreover is the sum of the local minimizers and , which are not uniquely determined on the overlapping part. Therefore we introduced a suitable correction by and in order to force the subminimizing sequences and to keep uniformly bounded. This issue will be explained in detail below, see Lemma 5.5. From the definition of , , it is clear that

Note that in general and . In (5) we use ”” (the approximation symbol) because in practice we never perform the exact minimization. In the following section we discuss how to realize the approximation to the individual subspace minimizations.

4 Local Minimization by Lagrange Multipliers

Let us consider, for example, the subspace minimization on


First of all, observe that , hence the former set is also bounded by assumption (C) and the minimization problem (6) has solutions.

It is useful to us to introduce an auxiliary functional of , called the surrogate functional of (cf. [26]): Assume , , and define


A straightforward computation shows that

where is a function of only. Note that now the variable is not anymore effected by the action of . Consequently, we want to realize an approximate solution to (6) by using the following algorithm: For ,


Additionally in (8) we can restrict the total variation on only, since we have


where we used (2) and the assumption that vanishes on the interior boundary . Hence (8) is equivalent to

where . Similarly the same arguments work for the second subproblem.

Before proving the convergence of this algorithm, we need to clarify first how to practically compute for given. To this end we need to introduce further notions and to recall some useful results.

4.1 Generalized Lagrange multipliers for nonsmooth objective functions

Let us begin this subsection with the notion of a subdifferential in finite dimensions.

Definition 4.1.

For a finite locally convex space and for a convex function , we define the subdifferential of at , as the set valued function

It is obvious from this definition that if and only if is a minimizer of . Since we deal with several spaces, namely, , it will turn out to be useful to sometimes distinguish in which space the subdifferential is defined by imposing a subscript for the subdifferential considered on the space .

We consider the following problem


where is a linear operator on . We have the following useful result.

Theorem 4.2.

[28, Theorem 2.1.4, p. 305] Let . Then, solves the constrained minimization problem (10) if and only if

4.2 Oblique thresholding

We want to exploit Theorem 4.2 in order to produce an algorithmic solution to each iteration step (8), which practically stems from the solution of a problem of this type

It is well-known how to solve this problem if in and the trace condition is not imposed. For the general case we propose the following solution strategy. In what follows all the involved quantities are restricted to , e.g., .

Theorem 4.3 (Oblique thresholding).

For and for the following statements are equivalent:

  • ;

  • there exists such that ;

  • there exists with such that and ;

  • there exists with such that or equivalently .

We call the solution operation provided by this theorem an oblique thresholding, in analogy to the terminology in [17], because it performs a thresholding of the derivatives, i.e., it sets to zero most of the derivatives of on , provided which is a fixed vector in .


Let us show the equivalence between (i) and (ii). The problem in (i) can be reformulated as


Recall that is a surjective map with closed range. This means that is injective and that is closed. Using Theorem 4.2 the optimality of is equivalent to the existence of such that


Due to the continuity of in , we have, by [22, Proposition 5.6], that


Thus, the optimality of is equivalent to


This concludes the equivalence of (i) and (ii). Let us show now that (iii) is equivalent to (ii). The condition in (iii) can be rewritten as

Since is 1-homogeneous and lower-semicontinuous, by [26, Example 4.2.2], the latter is equivalent to

and equivalent to (ii). Note that in particular we have , which is easily shown by a direct computation from the definition of subdifferential. We prove now the equivalence between (iii) and (iv). We have

By applying to both sides of the latter equality we get

By observing that , we obtain the fixed point equation


Conversely, since all the considered quantities in

are in , the whole expression is an element in and hence as defined in (iii) is an element in and . This shows the equivalence between (iii) and (iv) and therewith finishes the proof. ∎

We wonder now whether any of the conditions in Theorem 4.3 is indeed practically satisfied. In particular, we want to show that as in (iii) or (iv) of the previous theorem is provided as the limit of the following iterative algorithm:

Proposition 4.4.

The following statements are equivalent:

  • there exists such that (which is in turn the condition (iv) of Theorem 4.3)

  • the iteration (16) converges to any that satisfies (15).

For the proof of this Proposition we need to recall some well-known notions and results.

Definition 4.5.

A nonexpansive map is strongly nonexpansive if for bounded and we have

Proposition 4.6 (Corollaries 1.3, 1.4, and 1.5 [5]).

Let be a strongly nonexpansive map. Then if and only if converges to a fixed point for any choice of .


(Proposition 4.4) Projections onto convex sets are strongly nonexpansive [3, Corollary 4.2.3]. Moreover, the composition of strongly nonexpansive maps is strongly nonexpansive [5, Lemma 2.1]. By an application of Proposition 4.6 we immediately have the result, since any map of the type is strongly nonexpansive whenever is (this is a simple observation from the definition of strongly nonexpansive maps). Indeed, we are looking for fixed points of or, equivalently, of , where . ∎

4.3 Convergence of the subspace minimization

From the results of the previous section it follows that the iteration (8) can be explicitly computed by


where and is any solution of the fixed point equation

The computation of can be implemented by the algorithm (16).

Proposition 4.7.

Assume and . Then the iteration (17) converges to a solution of (6) for any initial choice of .

The proof of this proposition is standard, see [15, 17, 26].

Let us conclude this section mentioning that all the results presented here hold symmetrically for the minimization on , and that the notations should be just adjusted accordingly.

5 Convergence of the Sequential Alternating Subspace Minimization

In this section we want to prove the convergence of the algorithm (5) to minimizers of . In order to do that, we need a characterization of solutions of the minimization problem (3) as the one provided in [38, Proposition 4.1] for the continuous setting. We specify the arguments in [38, Proposition 4.1] for our discrete setting and we highlight the significant differences with respect to the continuous one.

5.1 Characterization of Solutions

We make the following assumptions:

  • is a convex function, nondecreasing in such that

    • .

    • There exist and such that for all .

The particular example we have in mind is simply , but we keep a more general notation for uniformity with respect to the continuous version in [38, Proposition 4.1]. In this section we are concerned with the following more general minimization problem


where is a datum, is a fixed constant (in particular for ).

To characterize the solution of the minimization problem (18) we use duality results from [22]. Therefore we recall the definition of the conjugate (or Legendre transform) of a function (for example see [22, Def. 4.1, pag. 17]):

Definition 5.1.

Let and be two vector spaces placed in the duality by a bilinear pairing denoted by and be a convex function. The conjugate function (or Legendre transform) is defined by

Proposition 5.2.

Let . If the assumption is fulfilled, then if and only if there exists