Subspace correction methods for total variation and $\ell_1$-minimization
Abstract
This paper is concerned with the numerical minimization of energy functionals in Hilbert spaces involving convex constraints coinciding with a seminorm for a subspace. The optimization is realized by alternating minimizations of the functional on a sequence of orthogonal subspaces. On each subspace an iterative proximity-map algorithm is implemented via oblique thresholding, which is the main new tool introduced in this work. We provide convergence conditions for the algorithm in order to compute minimizers of the target energy. Analogous results are derived for a parallel variant of the algorithm. Applications are presented in domain decomposition methods for singular elliptic PDE's arising in total variation minimization and in accelerated sparse recovery algorithms based on $\ell_1$-minimization. We include numerical examples which show efficient solutions to classical problems in signal and image processing.
Domain decomposition method, subspace corrections, convex optimization, parallel computation, discontinuous solutions, total variation minimization, singular elliptic PDE's, $\ell_1$-minimization, image and signal processing
65K10, 65N55, 65N21, 65Y05, 90C25, 52A41, 49M30, 49M27, 68U10
1 Introduction
Let $\mathcal{H}$ be a real separable Hilbert space. We are interested in the numerical minimization in $\mathcal{H}$ of functionals of the general form
$$\mathcal{J}(u) := \|Ku - g\|_{\mathcal{H}}^2 + 2\alpha\,\psi(u),$$
where $K : \mathcal{H} \to \mathcal{H}$ is a bounded linear operator, $g \in \mathcal{H}$ is a datum, and $\alpha > 0$ is a fixed constant. The function $\psi$ is a seminorm for a suitable subspace $\mathcal{B}$ of $\mathcal{H}$. In particular, we investigate splittings $\mathcal{H} = V_1 \oplus V_2$ into arbitrary orthogonal subspaces for which we may have
$$\psi(\pi_{V_1} u + \pi_{V_2} u) \neq \psi(\pi_{V_1} u) + \psi(\pi_{V_2} u),$$
where $\pi_{V_i}$ is the orthogonal projection onto $V_i$. With this splitting we want to minimize $\mathcal{J}$ by suitable instances of the following alternating algorithm: Pick an initial $u_1^{(0)} + u_2^{(0)} \in V_1 \oplus V_2$, for example $u_i^{(0)} = 0$, and iterate
$$\begin{cases} u_1^{(n+1)} \approx \arg\min_{v_1 \in V_1} \mathcal{J}(v_1 + u_2^{(n)}), \\ u_2^{(n+1)} \approx \arg\min_{v_2 \in V_2} \mathcal{J}(u_1^{(n+1)} + v_2). \end{cases}$$
This algorithm is implemented by solving the subspace minimizations via an oblique thresholding iteration. We provide a detailed analysis of the convergence properties of this sequential algorithm and of its modification for parallel computation. We motivate this rather general approach by two relevant applications in domain decomposition methods for total variation minimization and in accelerated sparse recovery algorithms based on $\ell_1$-minimization. Nevertheless, the applicability of our results reaches far beyond these particular examples.
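To fix ideas, the alternating structure above can be sketched for the quadratic part of the functional alone (the constraint term is omitted here, and the small finite-dimensional data are purely illustrative assumptions): on each of two complementary coordinate subspaces the restricted least-squares problem is solved with the other component frozen.

```python
import numpy as np

# Alternating minimization over two orthogonal coordinate subspaces for the
# quadratic part ||K u - g||^2 alone (no constraint term); K and g are
# illustrative. Each sweep solves the restricted least-squares problem on one
# block with the other block frozen (block Gauss-Seidel).
K = np.array([[2.0, 0.5, 0.0, 0.1],
              [0.3, 1.5, 0.2, 0.0],
              [0.0, 0.4, 1.8, 0.3],
              [0.1, 0.0, 0.2, 1.2]])
g = np.array([1.0, -2.0, 0.5, 3.0])

b1, b2 = [0, 1], [2, 3]              # complementary coordinate blocks
u = np.zeros(4)
for _ in range(500):
    # minimize over the first block, second block frozen
    u[b1] = np.linalg.lstsq(K[:, b1], g - K[:, b2] @ u[b2], rcond=None)[0]
    # minimize over the second block, first block frozen
    u[b2] = np.linalg.lstsq(K[:, b2], g - K[:, b1] @ u[b1], rcond=None)[0]

u_direct = np.linalg.solve(K, g)     # reference: K is invertible here
```

Since this purely quadratic functional is strictly convex, the alternating iteration reproduces the global minimizer; the genuinely delicate case treated in this paper is the one where the nonsmooth constraint couples the subspaces.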
1.1 Domain decomposition methods for singular elliptic PDE’s
Domain decomposition methods were introduced as techniques for solving partial differential equations based on a decomposition of the spatial domain of the problem into
several subdomains [33, 7, 47, 15, 36, 48, 32, 6, 34]. The initial equation restricted to the subdomains defines a sequence of new local problems. The main goal is to solve the initial equation via the solution of the local problems. This procedure induces a dimension reduction which is largely responsible for the success of such methods.
Indeed, one of the principal motivations is the formulation of solvers which can be easily parallelized.
We apply the theory and the algorithms developed in this paper to adapt domain decompositions to the minimization of functionals with total variation constraints.
In contrast to situations classically encountered in domain decomposition methods for nonsingular PDE's, where solutions are usually supposed to be at least continuous, in our case the interesting solutions may be discontinuous, e.g., along curves in 2D. These discontinuities may cross the interfaces of the domain decomposition patches. Hence, the crucial difficulty is the correct treatment of interfaces, with the preservation of crossing discontinuities and the correct matching where the solution is continuous instead. We consider the minimization of the functional $\mathcal{J}$ in the following setting: Let $\Omega \subset \mathbb{R}^d$, for $d = 1, 2$, be a bounded open set with Lipschitz boundary. We are interested in the case when $\mathcal{H} = L_2(\Omega)$, $\mathcal{B} = BV(\Omega)$, and $\psi(u) = V(u, \Omega)$, the variation of $u$. Then a decomposition $\Omega = \Omega_1 \cup \Omega_2$ into essentially disjoint subdomains induces the space splitting $\mathcal{H} = V_1 \oplus V_2$ into $V_i = \{u \in L_2(\Omega) : u = 0 \text{ a.e. in } \Omega \setminus \Omega_i\}$, $i = 1, 2$. Hence, by means of the proposed alternating algorithm, we want to minimize the functional
$$\mathcal{J}(u) = \|Ku - g\|_{L_2(\Omega)}^2 + 2\alpha\, V(u, \Omega).$$
The minimization of energies with total variation constraints traces back to the first uses of such a functional model in noise removal in digital images as proposed by Rudin, Osher, and Fatemi [39]. There the operator $K$ is just the identity. Extensions to more general operators and numerical methods for the minimization of the functional appeared later in several important contributions [14, 22, 3, 45, 13]. From these pioneering and very successful results, the scientific output related to total variation minimization and its applications in signal and image processing increased dramatically in the last decade. It is not worth mentioning here all the possible directions and contributions. We limit ourselves to mentioning that, to our knowledge, this paper is the first in presenting a successful domain decomposition approach to total variation minimization. The motivation is that several approaches are directed to the solution of the Euler-Lagrange equations associated to the functional $\mathcal{J}$, which determine a singular elliptic PDE involving the 1-Laplace operator. Due to the fact that the total variation is not differentiable, one has to discretize its subdifferential, and its characterization is indeed hard to implement numerically in a correct way. The lack of a simple characterization of the subdifferential of the total variation especially raises significant difficulties in dealing with discontinuous interfaces between patches of a domain decomposition. Our approach overcomes these difficulties by minimizing the functional via an iterative proximity-map algorithm, as proposed, e.g., in [13], instead of attempting the direct solution of the Euler-Lagrange equations. It is also worth mentioning that, due to the generality of our setting, our approach can be extended to more general subspace decompositions, not only those arising from a domain splitting. This can open the way to more sophisticated multiscale algorithms where the $V_i$ are multilevel spaces, e.g., from a wavelet decomposition.
1.2 Accelerated sparse recovery algorithms based on $\ell_1$-minimization
In this application, we are concerned with the use of the alternating algorithm in the case where $\Lambda$ is a countable index set, $\mathcal{H} = \ell_2(\Lambda)$, and $\psi(u) = \|u\|_{\ell_1(\Lambda)}$. The minimization of the functional
$$\mathcal{J}(u) = \|Ku - g\|_{\ell_2(\Lambda)}^2 + 2\alpha \|u\|_{\ell_1(\Lambda)}$$
proved to be an extremely efficient alternative to the well-known Tikhonov regularization [27], whenever
$$Ku = g$$
is an ill-posed problem and the solution $u$ is expected to be a vector with a moderate number of nonzero entries. Indeed, the imposition of the $\ell_1$-constraint does promote a sparse solution. The use of the $\ell_1$ norm as a sparsity-promoting functional can be found first in reflection seismology and in deconvolution of seismic traces [17, 40, 42]. In the last decade a deeper understanding of the reasons why $\ell_1$-minimization tends to promote sparse recovery was developed. Rigorous results began to appear in the late 1980s, with Donoho and Stark [25] and Donoho and Logan [24]. Applications of $\ell_1$-minimization in statistical estimation began in the mid-1990s with the introduction of the LASSO algorithm [43] (iterative thresholding). In the signal processing community, Basis Pursuit [16] was proposed in compression applications for extracting a sparse signal representation from highly overcomplete dictionaries. From these early steps the applications and understanding of $\ell_1$-minimization have continued to increase dramatically. It is now hard to trace all the relevant results and applications, and doing so is beyond the scope of this paper. We address the interested reader to the review papers [4, 10].^1 (^1 The reader can also find a sufficiently comprehensive collection of the ongoing recent developments at the website http://www.dsp.ece.rice.edu/cs/.) We may simply emphasize the importance of the study of $\ell_1$-minimization by saying that, due to its surprising effectiveness in several applications, it can be considered today the "modern successor" of least squares. From this lapidary statement follows the clear need for efficient algorithms for the minimization of $\mathcal{J}$. An iterative thresholding algorithm was proposed for this task [18, 19, 21, 41, 43]. We refer also to the recent developments [30, 31]. Unfortunately, despite its simplicity, which makes it very attractive to users, this algorithm does not perform very well.
For this reason, together with other acceleration methods, e.g., [20], a "domain decomposition" algorithm was proposed in [29], and we proved its effectiveness in accelerating the convergence and provided its parallelization. There the domain is the label set $\Lambda$, which is disjointly decomposed into $\Lambda = \Lambda_1 \cup \Lambda_2$. This decomposition induces an orthogonal splitting of $\ell_2(\Lambda)$ into the subspaces $V_i = \ell_2(\Lambda_i)$, $i = 1, 2$. In this paper we investigate the application of the alternating algorithm to more general orthogonal subspace decompositions and we discuss how this choice can influence convergence properties and speedup. Again the generality of our approach allows one to experiment with several possible decompositions, but we limit ourselves to presenting some key numerical examples in specific cases which help to highlight the properties, i.e., virtues and limitations, of the algorithm.
1.3 Content of the paper
In section 2 we illustrate the general assumptions on the convex constraint function $\psi$ and the subspace decompositions. In section 3 we formulate the minimization problem and motivate the use of the alternating subspace correction algorithm. In section 4 we start the construction of the algorithmic approach to the minimization, introducing the novel concept of oblique thresholding, computed via a generalized Lagrange multiplier. In section 5 we investigate convergence properties of the alternating algorithm, presenting sufficient conditions which allow it to converge to minimizers of the target functional $\mathcal{J}$. The same results are presented in section 6 for a parallel variant of the algorithm. Section 7 is dedicated to applications and numerical experiments in domain decomposition methods for total variation minimization in 1D and 2D problems, and in accelerations of convergence for $\ell_1$-minimization.
2 Preliminary Assumptions
We begin this section with a short description of the generic notations used in this paper.
In the following $\mathcal{H}$ is a real separable Hilbert space endowed with the norm $\|\cdot\|_{\mathcal{H}}$. For some countable index set $\Lambda$ we denote by $\ell_p(\Lambda)$, $1 \le p < \infty$, the space of real sequences $u = (u_\lambda)_{\lambda \in \Lambda}$ with norm
$$\|u\|_{\ell_p(\Lambda)} := \Big(\sum_{\lambda \in \Lambda} |u_\lambda|^p\Big)^{1/p},$$
and $\ell_\infty(\Lambda)$ as usual. If $w = (w_\lambda)_{\lambda \in \Lambda}$ is a sequence of positive weights then we define the weighted spaces $\ell_{p,w}(\Lambda)$ with norm
$$\|u\|_{\ell_{p,w}(\Lambda)} := \Big(\sum_{\lambda \in \Lambda} w_\lambda |u_\lambda|^p\Big)^{1/p}$$
(with the standard modification for $p = \infty$). The Euclidean space $\mathbb{R}^N$ is endowed with the Euclidean norm, but we will also use the $N$-dimensional space $\ell_p^N$, i.e., $\mathbb{R}^N$ endowed with the $\ell_p$ norm. By $\mathbb{R}_+$ we denote the nonnegative real numbers.
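As a quick illustration of these norms (the vector and the weights below are arbitrary example values):

```python
import numpy as np

# The sequence norms above, evaluated on an example vector and weights.
u = np.array([3.0, -4.0, 0.0])
w = np.array([1.0, 2.0, 5.0])            # strictly positive weights

l1_norm   = np.sum(np.abs(u))            # l1 norm
l2_norm   = np.sqrt(np.sum(u**2))        # l2 (Hilbert) norm
linf_norm = np.max(np.abs(u))            # l-infinity norm
w_l1_norm = np.sum(w * np.abs(u))        # weighted l1 norm
```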
Usually $\Omega$ will denote an open bounded set with Lipschitz boundary. The symbol $L_p(\Omega)$ denotes the usual Lebesgue space of $p$-summable functions, $C^k(\Omega)$ is the space of functions $k$ times continuously differentiable, and $BV(\Omega)$ the space of functions with bounded variation. For a topological vector space $V$ we denote by $V'$ its topological dual. Depending on the context, the symbol $\simeq$ may define an equivalence of norms or an isomorphism of spaces or sets. The symbol $\chi_A$ denotes the characteristic function of the set $A$.
More specific notations will be defined in the paper, where they turn out to be useful.
2.1 The convex constraint function
We are given a function $\psi : \mathcal{H} \to \mathbb{R}_+ \cup \{+\infty\}$ with the following properties:

1. $\psi(0) = 0$;

2. $\psi$ is sublinear, i.e., $\psi(u + v) \le \psi(u) + \psi(v)$ for all $u, v \in \mathcal{H}$;

3. $\psi$ is 1-homogeneous, i.e., $\psi(\lambda u) = |\lambda|\,\psi(u)$ for all $\lambda \in \mathbb{R}$;

4. $\psi$ is lower semicontinuous in $\mathcal{H}$, i.e., for any converging sequence $u_n \to u$ in $\mathcal{H}$,
$$\psi(u) \le \liminf_{n \to \infty} \psi(u_n).$$
Associated with $\psi$ we assume that there exists a dense subspace $\mathcal{B} \subset \mathcal{H}$ for which $\psi$ is a seminorm and which, endowed with the norm
$$\|u\|_{\mathcal{B}} := \|u\|_{\mathcal{H}} + \psi(u),$$
is a Banach space. We do not assume instead that $\mathcal{B}$ is reflexive in general; note that due to the dense embedding $\mathcal{B} \subset \mathcal{H}$ we have
$$\mathcal{B} \subset \mathcal{H} \simeq \mathcal{H}' \subset \mathcal{B}',$$
and the duality $\langle \cdot, \cdot \rangle_{\mathcal{B}' \times \mathcal{B}}$ extends the scalar product on $\mathcal{H}$. In particular, $\mathcal{H}$ is weakly-$*$-dense in $\mathcal{B}'$. In the following we require

(H1) bounded subsets in $\mathcal{B}$ are sequentially compact in another topology $\tau$ of $\mathcal{B}$;

(H2) $\psi$ is lower semicontinuous with respect to the topology $\tau$;
In practice, we will always require also that

.
We list in the following the specific examples we consider in this paper.
Examples \thetheorem
1. Let $\Omega \subset \mathbb{R}^d$, for $d = 1, 2$, be a bounded open set with Lipschitz boundary, and $\mathcal{H} = L_2(\Omega)$. We recall that for $u \in L_1(\Omega)$
$$V(u, \Omega) := \sup\Big\{\int_\Omega u \,\operatorname{div}\varphi \; dx : \varphi \in [C_c^1(\Omega)]^d,\ \|\varphi\|_\infty \le 1\Big\}$$
is the variation of $u$ and that $u \in BV(\Omega)$ (the space of bounded variation functions, [1, 28]) if and only if $V(u, \Omega) < \infty$, see [1, Proposition 3.6]. In such a case, $V(u, \Omega) = |Du|(\Omega)$, where $|Du|(\Omega)$ is the total variation of the finite Radon measure $Du$, the derivative of $u$ in the sense of distributions. Thus, we define $\psi(u) := V(u, \Omega)$ and it is immediate to see that $\mathcal{B}$ must coincide with $BV(\Omega)$. Due to the embedding $BV(\Omega) \subset L_2(\Omega)$ and the Sobolev embedding [1, Theorem 3.47] we have
$$\|u\|_{\mathcal{B}} = \|u\|_{L_2(\Omega)} + V(u, \Omega) \simeq \|u\|_{BV(\Omega)}.$$
Hence $\mathcal{B}$ is indeed a Banach space. It is known that $V(\cdot, \Omega)$ is lower semicontinuous with respect to $L_1(\Omega)$-convergence [1, Proposition 3.6]. We say that a sequence $(u_n)_n$ in $BV(\Omega)$ converges to $u$ with the weak-$*$ topology if $u_n$ converges to $u$ in $L_1(\Omega)$ and $Du_n$ converges to $Du$ with the weak-$*$ topology in the sense of the finite Radon measures. Bounded sets in $BV(\Omega)$ are sequentially weakly-$*$-compact ([1, Proposition 3.13]), and $V(\cdot, \Omega)$ is lower semicontinuous with respect to the weak-$*$ topology.
2. Let $\Lambda$ be a countable index set and $\mathcal{H} = \ell_2(\Lambda)$. For a strictly positive sequence $w = (w_\lambda)_{\lambda \in \Lambda}$, i.e., $w_\lambda \ge c > 0$ for all $\lambda \in \Lambda$, we define $\psi(u) := \|u\|_{\ell_{1,w}(\Lambda)}$. The space $\mathcal{B}$ simply coincides with $\ell_{1,w}(\Lambda)$. Observe that bounded sets in $\ell_{1,w}(\Lambda)$ are sequentially weakly compact in $\ell_2(\Lambda)$ and that, by Fatou's lemma, $\psi$ is lower semicontinuous with respect to both strong and weak topologies of $\ell_2(\Lambda)$.
3. Let $\mathcal{H} = \mathbb{R}^N$ endowed with the Euclidean norm, and let $D : \mathbb{R}^N \to \mathbb{R}^M$, for $M \in \mathbb{N}$, be a fixed linear operator. We define $\psi(u) := \|Du\|_{\ell_1^M}$. Clearly $\mathcal{B} = \mathbb{R}^N$ and all the requested properties are trivially fulfilled. One particular example of this finite dimensional situation is associated with the choice of $D$ given by $(Du)_i = u_{i+1} - u_i$, $i = 1, \dots, N-1$. In this case $\psi(u) = \sum_{i=1}^{N-1} |u_{i+1} - u_i|$ is the discrete variation of the vector $u$ and the model can be seen as a discrete approximation to the situation encountered in the first example, by discrete sampling and finite differences.
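For the finite-difference choice of $D$ in Example 3, the discrete variation is computed in one line; the sample signal below is an illustrative assumption.

```python
import numpy as np

# Discrete variation of a 1-D vector: psi(u) = sum_i |u_{i+1} - u_i|,
# i.e. the l1 norm of the finite-difference operator D applied to u.
def discrete_tv(u):
    return np.sum(np.abs(np.diff(u)))

step = np.array([0.0, 0.0, 1.0, 1.0, 0.0])   # a sampled "jump" signal
tv_step = discrete_tv(step)                  # one jump up, one jump down
```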
2.2 Bounded subspace decompositions
In the following we will consider orthogonal decompositions of $\mathcal{H}$ into closed subspaces. We will also require that such a splitting is bounded in $\mathcal{B}$.
Assume that $V_1, V_2$ are two closed, mutually orthogonal, and complementary subspaces of $\mathcal{H}$, i.e., $\mathcal{H} = V_1 \oplus V_2$, and $\pi_{V_1}, \pi_{V_2}$ are the corresponding orthogonal projections, for $i = 1, 2$. Moreover we require the mapping property
$$\pi_{V_i}(\mathcal{B}) \subset \mathcal{B},$$
continuously in the norm of $\mathcal{B}$, and $\pi_{V_i}(\mathcal{B})$ is closed. This implies that $\mathcal{B}$ splits into the direct sum $\mathcal{B} = \pi_{V_1}(\mathcal{B}) \oplus \pi_{V_2}(\mathcal{B})$.
3 A Convex Variational Problem and Subspace Splitting
We are interested in the minimization in $\mathcal{H}$ (actually in $\mathcal{B}$) of the functional
$$\mathcal{J}(u) := \|Ku - g\|_{\mathcal{H}}^2 + 2\alpha\,\psi(u),$$
where $K : \mathcal{H} \to \mathcal{H}$ is a bounded linear operator, $g \in \mathcal{H}$ is a datum, and $\alpha > 0$ is a fixed constant. In order to guarantee the existence of its minimizers we assume that:

(C) $\mathcal{J}$ is coercive in $\mathcal{B}$, i.e., the sublevel set $\{u \in \mathcal{B} : \mathcal{J}(u) \le C\}$ is bounded in $\mathcal{B}$ for every $C > 0$.
Examples \thetheorem
1. Assume $\mathcal{H} = L_2(\Omega)$, for $\Omega \subset \mathbb{R}^d$, $d = 1, 2$, a bounded open set with Lipschitz boundary, and $\psi(u) = V(u, \Omega)$ (compare Examples 2.1.1). In this case we deal with total variation minimization. It is well-known that if $K$ does not annihilate the constant functions, i.e., $K \chi_\Omega \neq 0$, then condition (C) is indeed satisfied, see [45, Proposition 3.1] and [14].
2. Let $\Lambda$ be a countable index set and $\mathcal{H} = \ell_2(\Lambda)$. For a strictly positive sequence $w = (w_\lambda)_{\lambda \in \Lambda}$, i.e., $w_\lambda \ge c > 0$ for all $\lambda \in \Lambda$, we define $\psi(u) = \|u\|_{\ell_{1,w}(\Lambda)}$ (compare with Examples 2.1.2). In this case condition (C) is trivially satisfied since $\|u\|_{\ell_2(\Lambda)} \le \frac{1}{c}\,\|u\|_{\ell_{1,w}(\Lambda)}$, for $u \in \ell_{1,w}(\Lambda)$.
Under the assumptions above, $\mathcal{J}$ has minimizers in $\mathcal{B}$. {proof} The proof is a standard application of the direct method of calculus of variations. Let $(u_n)_{n \in \mathbb{N}} \subset \mathcal{B}$ be a minimizing sequence. By assumption (C) we have $\|u_n\|_{\mathcal{B}} \le C$ for all $n$. Therefore by (H1) we can extract a subsequence converging in the topology $\tau$. Possibly passing to a further subsequence we can assume that it also converges weakly in $\mathcal{H}$. By lower semicontinuity of $\|K \cdot - g\|_{\mathcal{H}}^2$ with respect to the weak topology of $\mathcal{H}$ and the lower semicontinuity of $\psi$ with respect to the topology $\tau$, ensured by assumption (H2), we have the wanted existence of minimizers.
The minimization of $\mathcal{J}$ is a classical problem [26] which was recently reconsidered by several authors [13, 18, 19, 21, 41, 43], with emphasis on the computability of minimizers in particular cases. They studied essentially the same algorithm for the minimization.
For $\psi$ with the properties listed in Section 2.1, there exists a closed convex set $C \subset \mathcal{H}$ such that
$$\psi(u) = \sup_{v \in C} \langle u, v \rangle_{\mathcal{H}}.$$
See also Examples 4.1.2 below. In the following we assume furthermore that . For any closed convex set $C$ we denote by $P_C$ the orthogonal projection onto $C$. For $\mathbb{S}_\alpha := I - P_{\alpha C}$, called the generalized thresholding map in the signal processing literature, the iteration

(1) $u^{(n+1)} = \mathbb{S}_\alpha\big(u^{(n)} + K^*(g - K u^{(n)})\big)$

converges weakly to a minimizer of $\mathcal{J}$, for any initial choice $u^{(0)}$, provided $K$ and $g$ are suitably rescaled so that $\|K\| < 1$.
For particular situations, e.g., $\mathcal{H} = \ell_2(\Lambda)$ and $\psi = \|\cdot\|_{\ell_1(\Lambda)}$, one can prove the convergence in norm [19, 21].
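In the sequence space setting, iteration (1) with $\mathbb{S}_\alpha$ the soft-thresholding map takes only a few lines of code. The sketch below uses a small illustrative matrix with $\|K\| < 1$ and checks the optimality condition $0 \in K^*(Ku - g) + \alpha\,\partial\|u\|_{\ell_1}$ at the computed limit.

```python
import numpy as np

def soft(x, alpha):
    # componentwise soft-thresholding S_alpha
    return np.sign(x) * np.maximum(np.abs(x) - alpha, 0.0)

# Iteration (1) for psi = l1 norm: u <- S_alpha(u + K^T (g - K u)).
# K (symmetric, norm < 1), g and alpha are illustrative assumptions.
K = np.array([[0.6, 0.1, 0.0],
              [0.1, 0.5, 0.1],
              [0.0, 0.1, 0.4]])
g = np.array([1.0, -0.2, 0.05])
alpha = 0.1

u = np.zeros(3)
for _ in range(5000):
    u = soft(u + K.T @ (g - K @ u), alpha)

# optimality residual: 0 must lie in K^T(Ku - g) + alpha * subdiff of l1 norm
grad = K.T @ (K @ u - g)
```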
As is pointed out, for example, in [20, 29], this algorithm converges with a poor rate, unless $K$ is nonsingular or has special additional spectral properties. For this reason accelerations by means of projected steepest descent iterations [20] and domain decomposition methods [29] were proposed.
The particular situation considered in [29] is $\mathcal{H} = \ell_2(\Lambda)$ and $\psi = \|\cdot\|_{\ell_1(\Lambda)}$. In this case one takes advantage of the fact that for a disjoint partition $\Lambda = \Lambda_1 \cup \Lambda_2$ of the index set we have the splitting $\|u_1 + u_2\|_{\ell_1(\Lambda)} = \|u_1\|_{\ell_1(\Lambda)} + \|u_2\|_{\ell_1(\Lambda)}$ for any vectors $u_i$ supported on $\Lambda_i$, $i = 1, 2$. Thus, a decomposition into column subspaces (i.e., componentwise) of the operator $K$ (if identified with a suitable matrix) is realized, and alternating minimizations on these subspaces are performed by means of iterations of the type (1). This leads, e.g., to the following sequential algorithm: Pick an initial $u^{(0)} = u_1^{(0)} + u_2^{(0)}$, for example $u_i^{(0)} = 0$, and iterate

(2) $\begin{cases} u_1^{(n+1)} = \mathbb{S}_\alpha\big(u_1^{(n)} + \pi_{V_1} K^*(g - K u_1^{(n)} - K u_2^{(n)})\big), \\ u_2^{(n+1)} = \mathbb{S}_\alpha\big(u_2^{(n)} + \pi_{V_2} K^*(g - K u_1^{(n+1)} - K u_2^{(n)})\big), \end{cases}$

possibly performing several inner iterations of this type on each subspace before alternating. Here the operator $\mathbb{S}_\alpha$ is the soft-thresholding operator, which acts componentwise and is defined by

(3) $\mathbb{S}_\alpha(u)_\lambda = \begin{cases} u_\lambda - \alpha, & u_\lambda \ge \alpha, \\ 0, & |u_\lambda| < \alpha, \\ u_\lambda + \alpha, & u_\lambda \le -\alpha. \end{cases}$
The expected benefit from this approach is twofold:

Instead of solving one large problem with many iteration steps, we can solve approximately several smaller subproblems, which might lead to an acceleration of convergence and a reduction of the overall computational effort, due to possible conditioning improvements;

The subproblems do not need more sophisticated algorithms; they simply reproduce the original problem at a smaller dimension, and they can be solved in parallel.
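A minimal sketch of this sequential strategy, under the assumption of a small matrix with $\|K\| < 1$ and a two-block partition of the labels: a few inner thresholding iterations are performed on each block while the other block stays frozen, and at the end the optimality condition of the full problem is checked.

```python
import numpy as np

def soft(x, alpha):
    # componentwise soft-thresholding, as in (3)
    return np.sign(x) * np.maximum(np.abs(x) - alpha, 0.0)

# Sequential two-block version: the label set {0,1,2,3} is split into two
# disjoint blocks; on each block a few inner thresholded Landweber steps are
# run while the other block is frozen. K (norm < 1), g, alpha are illustrative.
K = np.array([[0.6, 0.1, 0.0, 0.1],
              [0.1, 0.5, 0.1, 0.0],
              [0.0, 0.1, 0.4, 0.1],
              [0.1, 0.0, 0.1, 0.5]])
g = np.array([1.0, -0.3, 0.05, 0.8])
alpha = 0.1
b1, b2 = [0, 1], [2, 3]                  # disjoint decomposition of the labels

u = np.zeros(4)
for _ in range(3000):                    # outer alternation
    for _ in range(5):                   # inner iterations on block 1
        r = g - K[:, b2] @ u[b2]
        u[b1] = soft(u[b1] + K[:, b1].T @ (r - K[:, b1] @ u[b1]), alpha)
    for _ in range(5):                   # inner iterations on block 2
        r = g - K[:, b1] @ u[b1]
        u[b2] = soft(u[b2] + K[:, b2].T @ (r - K[:, b2] @ u[b2]), alpha)

grad = K.T @ (K @ u - g)                 # optimality residual, full problem
```

Because the $\ell_1$ norm splits across disjoint blocks, blockwise optimality here coincides with global optimality; the rest of the paper treats constraints for which this convenient splitting fails.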
The nice splitting of $\psi$ as a sum of evaluations on subspaces does not occur, for instance, when $\mathcal{H} = L_2(\Omega)$, $\psi(u) = |Du|(\Omega)$, and $\Omega = \Omega_1 \cup \Omega_2$ is a disjoint decomposition of $\Omega$. Indeed, cf. [1, Theorem 3.84], we have

(4) $|Du|(\Omega) = |Du|(\Omega_1) + |Du|(\Omega_2) + \int_{\partial\Omega_1 \cap \Omega} |u^+ - u^-| \, d\mathcal{H}^{d-1}.$

Here one should not confuse $\mathcal{H}^{d-1}$ with any Hilbert space, since the former indicates the Hausdorff measure of dimension $d - 1$.
The symbols $u^+$ and $u^-$ denote the left and right approximate limits at jump points [1, Proposition 3.69].
The presence of the additional boundary interface term does not allow one to use in a straightforward way iterations as in (1) to minimize the local problems on the subspaces $V_i$.
Moreover, also in the sequence space setting mentioned above, the hope for a better conditioning by column subspace splitting as in [29] might be misplaced: no such splitting needs to be well conditioned in general (good cases are provided in [44] instead).
Therefore, one may want to consider arbitrary subspace decompositions and, in order to deal with these more general situations, we investigate splittings $\mathcal{H} = V_1 \oplus V_2$ into arbitrary orthogonal subspaces for which we may have
$$\psi(\pi_{V_1} u + \pi_{V_2} u) \neq \psi(\pi_{V_1} u) + \psi(\pi_{V_2} u).$$
In principle, in this paper we limit ourselves to the detailed analysis for two subspaces $V_1, V_2$. Nevertheless, the arguments can be easily generalized to multiple subspaces $V_1, \dots, V_N$, see, e.g., [29], and in the numerical experiments we will also test this more general situation.
With this splitting we want to minimize $\mathcal{J}$ by suitable instances of the following alternating algorithm: Pick an initial $u_1^{(0)} + u_2^{(0)} \in V_1 \oplus V_2$, for example $u_i^{(0)} = 0$, and iterate

(5) $\begin{cases} u_1^{(n+1)} \approx \arg\min_{v_1 \in V_1} \mathcal{J}(v_1 + u_2^{(n)}), \\ u_2^{(n+1)} \approx \arg\min_{v_2 \in V_2} \mathcal{J}(u_1^{(n+1)} + v_2). \end{cases}$
We use "$\approx$" (the approximation symbol) because in practice we never perform the exact minimization, as was already the case in (2). In the following section we discuss how to realize the approximation to the individual subspace minimizations. As pointed out above, this cannot just reduce to a simple iteration of the type (1).
4 Local Minimization by Lagrange Multipliers
Let us consider, for example,

(6) $\arg\min_{v_1 \in V_1} \mathcal{J}(v_1 + u_2),$

for a fixed $u_2 \in V_2$. First of all, observe that $\{v_1 \in V_1 : \mathcal{J}(v_1 + u_2) \le C\} + u_2 \subset \{u \in \mathcal{B} : \mathcal{J}(u) \le C\}$, hence the former set is also bounded by assumption (C). By the same argument as in Lemma 3, the minimization (6) has solutions. It is useful to us to introduce an auxiliary functional $\mathcal{J}^s$, called the surrogate functional of $\mathcal{J}$: Assume $u, a \in \mathcal{H}$ and define

(7) $\mathcal{J}^s(u, a) := \mathcal{J}(u) + \|u - a\|_{\mathcal{H}}^2 - \|K(u - a)\|_{\mathcal{H}}^2.$

A straightforward computation shows that
$$\mathcal{J}^s(u, a) = \big\|u - \big(a + K^*(g - Ka)\big)\big\|_{\mathcal{H}}^2 + 2\alpha\,\psi(u) + \Phi(a),$$
where $\Phi(a)$ is a function of $a$ only. We want to realize an approximate solution to (6) by using the following algorithm: For $u_1^{(0)} \in V_1$,

(8) $u_1^{(\ell+1)} = \arg\min_{v_1 \in V_1} \mathcal{J}^s\big(v_1 + u_2,\, u_1^{(\ell)} + u_2\big), \quad \ell \ge 0.$
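The quoted identity for the surrogate functional is easy to verify numerically: with $z = a + K^*(g - Ka)$, the difference $\mathcal{J}^s(u, a) - \|u - z\|^2 - 2\alpha\,\psi(u)$ must not depend on $u$. The sketch below checks this for $\psi = \|\cdot\|_{\ell_1}$ with random illustrative data.

```python
import numpy as np

# Check that J^s(u, a) - ||u - z||^2 - 2*alpha*psi(u), with
# z = a + K^T(g - K a), does not depend on u (psi = l1 norm here;
# K, g, a, alpha are random illustrative data).
rng = np.random.default_rng(0)
K = 0.5 * rng.standard_normal((4, 4))
g = rng.standard_normal(4)
a = rng.standard_normal(4)
alpha = 0.3

def Js(u):
    return (np.sum((K @ u - g) ** 2) + 2 * alpha * np.sum(np.abs(u))
            + np.sum((u - a) ** 2) - np.sum((K @ (u - a)) ** 2))

z = a + K.T @ (g - K @ a)

def residual(u):                      # candidate for Phi(a): constant in u
    return Js(u) - np.sum((u - z) ** 2) - 2 * alpha * np.sum(np.abs(u))

vals = [residual(rng.standard_normal(4)) for _ in range(5)]
```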
Before proving the convergence of this algorithm, we first need to investigate how to compute $u_1^{(\ell+1)}$ in practice for a given $u_1^{(\ell)}$. To this end we need to introduce further notions and to recall some useful results.
4.1 Generalized Lagrange multipliers for nonsmooth objective functions
Let us begin this subsection with the notion of a subdifferential. {definition} For a locally convex space $V$ and for a convex function $F : V \to \mathbb{R} \cup \{+\infty\}$, we define the subdifferential of $F$ at $x \in V$, as $\partial F(x) = \emptyset$ if $F(x) = +\infty$, otherwise
$$\partial F(x) := \{x^* \in V' : \langle x^*, y - x \rangle + F(x) \le F(y) \text{ for all } y \in V\},$$
where $V'$ denotes the dual space of $V$. It is obvious from this definition that $0 \in \partial F(x)$ if and only if $x$ is a minimizer of $F$. Since we deal with several spaces, namely $\mathcal{H}$, $\mathcal{B}$, and their subspaces, it will turn out to be useful to distinguish sometimes in which space (and associated topology) the subdifferential is defined, by imposing a subscript $\partial_V$ for the subdifferential considered on the space $V$.
Examples \thetheorem
1. Let $\mathcal{H} = \ell_2(\Lambda)$ and $\psi = \|\cdot\|_{\ell_1(\Lambda)}$, the $\ell_1$ norm. We have

(9) $\partial \|\cdot\|_{\ell_1}(u) = \big\{\xi \in \ell_\infty(\Lambda) : \xi_\lambda \in \partial|\cdot|(u_\lambda) \text{ for all } \lambda \in \Lambda\big\},$

where $\partial|\cdot|(z) = \{\operatorname{sign}(z)\}$ if $z \neq 0$ and $\partial|\cdot|(0) = [-1, 1]$.
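The characterization (9) translates into a simple membership test (the helper name below is ours):

```python
import numpy as np

# Membership test for the subdifferential (9) of the l1 norm:
# xi is a subgradient at u iff xi_i = sign(u_i) where u_i != 0
# and |xi_i| <= 1 where u_i = 0.
def in_l1_subdifferential(xi, u):
    nz = u != 0
    return bool(np.all(xi[nz] == np.sign(u[nz]))
                and np.all(np.abs(xi[~nz]) <= 1.0))

u = np.array([2.0, 0.0, -3.0])
ok  = in_l1_subdifferential(np.array([1.0, 0.5, -1.0]), u)   # valid subgradient
bad = in_l1_subdifferential(np.array([1.0, 2.0, -1.0]), u)   # middle entry > 1
```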
2. Assume $\mathcal{H}$ is a Hilbert space and $\varphi : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ is a proper lower semicontinuous convex function. For $z \in \mathcal{H}$, we define the function
$$\operatorname{prox}_\varphi(z) := \arg\min_{u \in \mathcal{H}} \big\{\|u - z\|_{\mathcal{H}}^2 + 2\varphi(u)\big\},$$
which is called the proximity map in the convex analysis literature, e.g., [26, 18], and generalized thresholding in the signal processing literature, e.g., [19, 20, 21, 29]. Observe that the minimized functional is coercive in $\mathcal{H}$ and, by lower semicontinuity and strict convexity of the term $\|u - z\|_{\mathcal{H}}^2$, this definition is well-posed. In particular, $\operatorname{prox}_\varphi(z)$ is the unique solution of the following differential inclusion:
$$0 \in u - z + \partial_{\mathcal{H}}\varphi(u).$$
It is well-known [26, 37] that the proximity map is nonexpansive, i.e.,
$$\|\operatorname{prox}_\varphi(z_1) - \operatorname{prox}_\varphi(z_2)\|_{\mathcal{H}} \le \|z_1 - z_2\|_{\mathcal{H}}.$$
In particular, if $\varphi = \alpha\psi$ is a 1-homogeneous function then
$$\operatorname{prox}_{\alpha\psi}(z) = (I - P_{\alpha C})(z),$$
where $C$ is a suitable closed convex set associated to $\psi$, see for instance [18].
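For $\psi = \|\cdot\|_{\ell_1}$ the identity $\operatorname{prox}_{\alpha\psi} = I - P_{\alpha C}$ can be checked directly: the associated set $\alpha C$ is the $\ell_\infty$-ball of radius $\alpha$, whose projection is componentwise clipping, and the proximity map is soft-thresholding. The values below are illustrative.

```python
import numpy as np

# prox_{alpha psi} = I - P_{alpha C} for psi = l1 norm: the projection onto
# the l-infinity ball of radius alpha is componentwise clipping, and the
# proximity map is soft-thresholding.
def soft(x, alpha):
    return np.sign(x) * np.maximum(np.abs(x) - alpha, 0.0)

def proj_linf_ball(x, alpha):
    return np.clip(x, -alpha, alpha)

x = np.array([2.0, -0.3, 0.7, 0.0])
alpha = 0.5
lhs = soft(x, alpha)                 # prox of alpha * l1 norm
rhs = x - proj_linf_ball(x, alpha)   # (I - P_{alpha C})(x)
```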
Under the notations of Definition 4.1, we consider the following problem:

(10) $\arg\min_{x \in V,\; Qx = 0} F(x),$

where $Q$ is a bounded linear operator on $V$. We have the following useful result.
4.2 Oblique thresholding
We want to exploit Theorem 4.1 in order to produce an algorithmic solution to each iteration step (8). {theorem}[Oblique thresholding] For $\bar u_1 \in V_1$ and a fixed $u_2 \in V_2$ the following statements are equivalent:

(i) $u_1^* = \arg\min_{u_1 \in V_1} \|u_1 - \bar u_1\|_{\mathcal{H}}^2 + 2\alpha\,\psi(u_1 + u_2)$;

(ii) there exists $\eta$ such that $0 \in u_1^* - \bar u_1 + \eta + \alpha\,\partial\psi(u_1^* + u_2)$.

Moreover, the following statements are equivalent and imply (i) and (ii).

(iii) there exists $\eta \in V_2$ such that $u_1^* = \mathbb{S}_\alpha(\bar u_1 + u_2 - \eta) - u_2 \in V_1$;

(iv) there exists $\eta \in V_2$ such that $\eta = -\pi_{V_2} P_{\alpha C}(\bar u_1 + u_2 - \eta)$.
{proof} Let us show the equivalence between (i) and (ii). The problem in (i) can be reformulated as
The latter is a special instance of (10). Moreover, $\psi$ is continuous on $\mathcal{B}$ in the norm topology of $\mathcal{B}$ (while in general it is not on $\mathcal{H}$ with the norm topology of $\mathcal{H}$). Recall now that $\pi_{V_2}$ restricted to $\mathcal{B}$ is assumed to be a bounded and surjective map with closed range in the norm topology of $\mathcal{B}$ (see Section 2.2). This means that its adjoint is injective and that its range is closed. Therefore, by an application of Theorem 4.1 the optimality of $u_1^*$ is equivalent to the existence of $\eta$ such that
Due to the continuity of $\psi$ in $\mathcal{B}$, we have, by [26, Proposition 5.6], that
Thus, the optimality of $u_1^*$ is equivalent to
This concludes the equivalence of (i) and (ii). Let us show now that (iii) implies (ii). The condition in (iii) can be rewritten as
Since $\psi$ is 1-homogeneous and lower semicontinuous, by Examples 4.1.2, the latter is equivalent to
or, by (H3),
The latter optimization problem is equivalent to
Since $\eta \in V_2 \subset \mathcal{H}$, we obtain that (iii) implies (ii). We prove now the equivalence between (iii) and (iv). We have
$$u_1^* = \mathbb{S}_\alpha(\bar u_1 + u_2 - \eta) - u_2 = (\bar u_1 + u_2 - \eta) - P_{\alpha C}(\bar u_1 + u_2 - \eta) - u_2.$$
By applying $\pi_{V_2}$ to both sides of the latter equality we get
$$0 = \pi_{V_2}(\bar u_1 + u_2 - \eta) - \pi_{V_2} P_{\alpha C}(\bar u_1 + u_2 - \eta) - u_2.$$
By recalling that $\pi_{V_2}(\bar u_1 + u_2 - \eta) = u_2 - \eta$, we obtain the fixed point equation

(11) $\eta = -\pi_{V_2} P_{\alpha C}(\bar u_1 + u_2 - \eta).$

Conversely, assume $\eta = -\pi_{V_2} P_{\alpha C}(\bar u_1 + u_2 - \eta)$ for some $\eta \in V_2$. Then
Remark \thetheorem
1. Unfortunately, in general we have $\mathcal{H} \neq \mathcal{B}'$, which excludes the complete equivalence of the previous conditions (i)-(iv). For example, in the case $\mathcal{H} = \ell_2(\Lambda)$ and $\psi = \|\cdot\|_{\ell_1(\Lambda)}$, with $\mathcal{B} = \ell_1(\Lambda)$, we have $\mathcal{B}' = \ell_\infty(\Lambda)$, hence $\mathcal{H} \subsetneq \mathcal{B}'$. It can well be that the multiplier $\eta$ exists only in $\ell_\infty(\Lambda)$ and not in $\ell_2(\Lambda)$. However, following [29], $\bar u_1$ is assumed to be the result of soft-thresholded iterations, hence a finitely supported vector. Therefore, by Examples 4.1.1, we can choose the relevant subgradient to be also a finitely supported vector, hence $\eta \in \ell_2(\Lambda)$. This means that the existence of $\eta$ as in (iii) or (iv) of the previous theorem may occur also in those cases for which $\mathcal{H} \neq \mathcal{B}'$. In general, we can only observe that $\mathcal{H}$ is weakly-$*$-dense in $\mathcal{B}'$.
2. For $\mathcal{H}$ of finite dimension (which is the relevant case in numerical applications) all the spaces involved are independent of the particular attached norm and coincide with their duals; hence all the statements (i)-(iv) of the previous theorem are equivalent in this case.
A simple constructive test for the existence of $\eta$ as in (iii) or (iv) of the previous theorem is provided by the following iterative algorithm: Pick $\eta^{(0)} \in V_2$ and, for $m \ge 0$,

(12) $\eta^{(m+1)} = -\pi_{V_2} P_{\alpha C}\big(\bar u_1 + u_2 - \eta^{(m)}\big).$
The following statements are equivalent:

(a) there exists $\eta \in V_2$ such that $\eta = -\pi_{V_2} P_{\alpha C}(\bar u_1 + u_2 - \eta)$ (which is in turn the condition (iv) of Theorem 4.2);

(b) the sequence $(\eta^{(m)})_m$ produced by (12) converges weakly to some $\eta \in V_2$ satisfying (11).

In particular, there are no fixed points of (11) if and only if $\|\eta^{(m)}\|_{\mathcal{H}} \to \infty$, for $m \to \infty$.
For the proof of this Proposition we need to recall some classical notions and results. {definition} A nonexpansive map $T : \mathcal{H} \to \mathcal{H}$ is strongly nonexpansive if for any sequences $(u_n)_n, (v_n)_n$ with $(u_n - v_n)_n$ bounded and $\|u_n - v_n\|_{\mathcal{H}} - \|T u_n - T v_n\|_{\mathcal{H}} \to 0$ we have
$$(u_n - v_n) - (T u_n - T v_n) \to 0, \quad n \to \infty.$$
[Corollaries 1.3, 1.4, and 1.5 [9]] Let $T : \mathcal{H} \to \mathcal{H}$ be a strongly nonexpansive map. Then $T$ has fixed points if and only if $(T^n u)_n$ converges weakly to a fixed point of $T$ for any choice of $u \in \mathcal{H}$.
(Proposition 4.2) Orthogonal projections onto convex sets are strongly nonexpansive [8, Corollary 4.2.3]. Moreover, compositions of strongly nonexpansive maps are strongly nonexpansive [9, Lemma 2.1]. By an application of the result of [9] quoted above we immediately have the claim, since any map of the type $\eta \mapsto -T(z - \eta)$ is strongly nonexpansive whenever $T$ is (this is a simple observation from the definition of a strongly nonexpansive map). Indeed, we are looking for fixed points of the equation (11) or, equivalently, of the map $\eta \mapsto -\pi_{V_2} P_{\alpha C}(\bar u_1 + u_2 - \eta)$.
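The fixed-point principle used in this proof can be illustrated on the classical special case of a composition of two projections (each projection is strongly nonexpansive, and so is their composition): alternating projections onto a ball and a half-plane converge to a point of their nonempty intersection. The particular sets below are illustrative.

```python
import numpy as np

# Alternating projections onto a closed ball and a half-plane: the
# composition of the two projections is strongly nonexpansive, and its
# iterates converge to a point in the (nonempty) intersection.
def proj_ball(x, r=1.0):                 # projection onto {||x|| <= r}
    n = np.linalg.norm(x)
    return x if n <= r else r * x / n

def proj_halfplane(x):                   # projection onto {x : x0 + x1 >= 1}
    v = x[0] + x[1] - 1.0
    return x if v >= 0 else x - v * np.array([0.5, 0.5])

x = np.array([3.0, -2.0])
for _ in range(500):
    x = proj_ball(proj_halfplane(x))
```

At the limit, both constraints are satisfied, mirroring the role of the fixed points of (12) in certifying the existence of the multiplier.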
In Examples 4.1, we have already observed that