
# Optimal-order preconditioners for linear systems arising in the semismooth Newton solution of a class of control-constrained problems

Andrei Drăgănescu Department of Mathematics and Statistics, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, Maryland 21250 (draga@umbc.edu). This material is based upon work supported by the U.S. Department of Energy Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under Award Number DE-SC0005455, and by the National Science Foundation under awards DMS-1016177 and DMS-0821311.    Jyoti Saraswat Department of Mathematics and Physics, Thomas More College, Crestview Hills, Kentucky 41017 (saraswj@thomasmore.edu).
###### Abstract

In this article we present a new multigrid preconditioner for the linear systems arising in the semismooth Newton method solution of certain control-constrained, quadratic distributed optimal control problems. Using a piecewise constant discretization of the control space, each semismooth Newton iteration essentially requires inverting a principal submatrix of the matrix entering the normal equations of the associated unconstrained optimal control problem, the rows (and columns) of the submatrix representing the constraints deemed inactive at the current iteration. Previously developed multigrid preconditioners for the aforementioned submatrices were based on constructing a sequence of conforming coarser spaces, and proved to be of suboptimal quality for the class of problems considered. Instead, the multigrid preconditioner introduced in this work uses non-conforming coarse spaces, and it is shown that, under reasonable geometric assumptions on the constraints that are deemed inactive, the preconditioner approximates the inverse of the desired submatrix to optimal order. The preconditioner is tested numerically on a classical elliptic-constrained optimal control problem and further on a constrained image-deblurring problem.

multigrid, semismooth Newton methods, optimization with PDE constraints, large-scale optimization, image deblurring

AMS subject classifications: 65K10, 65M55, 65M32, 90C06

## 1 Introduction

The goal of this work is to construct optimal order multigrid preconditioners for optimal control problems of the type

 min_{u∈U} (1/2)||Ku − y_d||² + (β/2)||u||² ,  a ⩽ u ⩽ b  a.e., (1)

where U = L²(Ω) with Ω a bounded domain, y_d is given, and K : U → U is a linear continuous operator whose range lies in a compactly embedded subspace of U. The parameter β > 0 is used to adjust the size of the regularization term (β/2)||u||². Throughout this article ||·|| denotes the L²-norm or the operator norm of a bounded linear operator in L²(Ω). The functions a, b defining the inequality constraints in (1) satisfy a(x) < b(x) for all x ∈ Ω. These problems arise in the optimal control of partial differential equations (PDEs), in which case K represents the solution operator of a PDE. For example, the classical PDE-constrained optimization problem

 min 1/2||y − y_d||² + (β/2)||u||²,  subject to: −Δy = u in Ω,  y = 0 on ∂Ω,  a ⩽ u ⩽ b a.e., (2)

reduces to (1) when replacing y = Ku in the cost functional of (2), where K = (−Δ)⁻¹. A related problem, discussed in [8], addresses the question of time-reversal for parabolic equations, a problem that is ill-posed. In this example we set K = S(T), where S(t) is the time-t solution operator of a linear parabolic PDE with initial value u, and T > 0 is a fixed time. If the solution of the inverse problem needs to satisfy certain inequality constraints, e.g., when the solution, perhaps representing the concentration of a substance, is required to have values in [a, b], then it is essential to impose these constraints explicitly in the formulation of the optimization problem, as shown in (1). For obvious reasons, in the PDE-constrained optimization literature (1) is referred to as the reduced problem. For other applications, such as image deblurring, K can be an explicitly defined integral operator

 Ku := ∫_Ω k(·, x) u(x) dx ,

with k a given kernel; here u is the original image and y_d is the blurred image. Thus, by solving (1) we seek to reconstruct the image u whose blurred version is a given y_d, subject to additional box constraints.

We give a few references to works on multigrid methods for solving (1) with no inequality constraints. In this case (1) is equivalent to the Tikhonov regularization of the ill-posed problem Ku = y_d, which in turn reduces to the linear system

 (K∗K+βI)u=K∗yd, (3)

representing the regularized normal equations of Ku = y_d. A significant literature [17, 20, 11, 16, 8, 2, 9], to mention just a few references, is devoted to multigrid methods for (3) or the unregularized ill-posed problem. Moreover, when K is the solution operator of a linear PDE, an alternative strategy is to solve directly the indefinite systems representing the Karush-Kuhn-Tucker (KKT) optimality conditions instead of the reduced system, and many works [3, 18, 22, 23, 24] are concerned with multigrid methods for PDE-constrained optimization problems in unreduced form. A comprehensive discussion of the latter strategy is found in [4].
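For intuition, the regularized normal equations (3) can be solved matrix-free with conjugate gradients. The sketch below uses a 1D Gaussian kernel matrix as an illustrative stand-in for a discretized compact operator K; all names and parameter values are assumptions, not from the paper.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

# Illustrative stand-in for a discretized compact operator K:
# a 1D Gaussian kernel matrix (names and parameters are assumptions).
n = 200
x = np.linspace(0.0, 1.0, n)
K = np.exp(-((x[:, None] - x[None, :]) ** 2) / 0.01) / n
beta = 1e-3
yd = K @ np.sin(2 * np.pi * x)          # synthetic "blurred" data

# Matrix-free application of H = K^*K + beta I and CG solve of (3).
H = LinearOperator((n, n), matvec=lambda u: K.T @ (K @ u) + beta * u)
rhs = K.T @ yd
u, info = cg(H, rhs)
assert info == 0                        # CG converged
```

Since H is symmetric positive definite with condition number controlled by β, plain CG converges; the multigrid preconditioners discussed in the text aim to make the iteration count independent of the mesh size.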

The presence of bound constraints in (1) brings additional challenges to the solution process, since the KKT optimality conditions form a complementarity system as opposed to a linear or a smooth nonlinear system. As shown by Hintermüller et al. [13], the KKT system can be reformulated as a semismooth nonlinear system for which a superlinearly convergent Newton’s method can be devised – the semismooth Newton method (SSNM). Moreover, with controls discretized using piecewise constant finite elements, Hintermüller and Ulbrich [14] have shown that the SSNM converges in a mesh-independent number of iterations for problems like (1), so it is a very efficient solution method in terms of the number of optimization iterations. A comprehensive discussion of SSNMs can be found in [25]. However, as with Newton’s method, each SSNM iteration requires the solution of a linear system, and the efficiency of the SSNM depends on the availability of high-quality preconditioners for the linear systems involved. Naturally, the question of devising preconditioners for SSNMs has received a lot of attention in recent years, especially in the context of optimal control problems constrained by PDEs, e.g., see [12, 1, 19], where preconditioners primarily target the sparse and indefinite KKT systems arising in the solution process. For problems formulated as (1), the SSNM solution essentially requires inverting at each iteration a principal submatrix of the matrix representing a discrete version H_h of H = K^*K + βI, where h denotes the mesh size. The multigrid preconditioner developed by Drăgănescu and Dupont in [8] for the operator H_h arising in the unconstrained problem (3) is shown, under reasonable conditions, to be of optimal order with respect to the discretization: namely, if we denote by S_h the multigrid preconditioner (thought of as an approximation of (H_h)⁻¹), then

 1 − C h^p/β ⩽ ⟨S_h u, u⟩ / ⟨(H_h)⁻¹ u, u⟩ ⩽ 1 + C h^p/β,  ∀u ∈ U_h∖{0}, (4)

where p is the convergence order of the discretization and U_h is the discrete control space; for continuous piecewise linear discretizations we have p = 2. A natural extension of the ideas in [8] led to the suboptimal multigrid preconditioner developed by Drăgănescu in [7] for principal submatrices of H_h, where the approximation order is shown to essentially be 1/2 for a piecewise linear discretization. The key aspect of defining the multigrid preconditioners for principal submatrices of H_h is the definition of the coarse spaces. The natural domain of a principal submatrix of H_h, thought of as an operator, is a subspace of U_h. The multigrid preconditioner developed in [7] is based on constructing coarse spaces that are subspaces of this natural domain, i.e., conforming coarse spaces. A similar strategy was used by Hoppe and Kornhuber [15] in devising multilevel methods for obstacle problems. However, a visual inspection of the eigenvectors corresponding to the extreme joint eigenvalues of the preconditioner and of the submatrix in [7] suggests that the multigrid preconditioner is suboptimal precisely because of the conformity of the coarse spaces. Thus in this work we define a new multigrid preconditioner based on non-conforming coarse spaces. While the new construction is limited to piecewise constant approximations of the controls, we can show that, under reasonable conditions, the approximation order of the preconditioner in (4) is optimal. It also turns out that the analysis of the new preconditioner is quite different from the analysis in [7]; fortunately it is also simpler.

This article is organized as follows: in Section 2 we give a formal description of the problem and we briefly describe the SSNM to justify the necessity of preconditioning principal submatrices of H_h. Section 3 forms the core of the article; here we introduce and analyze the two-grid preconditioner, the main result being Theorem 3. In Section 4 we extend the two-grid results to multigrid; this section follows closely the analogous extension in [7], with certain modifications required by the non-conforming coarse spaces. In Section 5 we show numerical experiments conducted on two test problems: the elliptic-constrained problem (2) and a box-constrained image deblurring problem. We formulate a set of conclusions in Section 6. We have included in Appendix A a convergence analysis of a Gaussian blurring operator that may be of independent interest.

## 2 Problem formulation

To fix ideas we assume the compactly embedded space to be V = H¹₀(Ω) and Ω to be polygonal or polyhedral. We denote by ||·||₁ the norm on V, and we use the convention ||·|| = ||·||_{L²(Ω)}. We also define the (−1)-norm by

 ||u||_{−1} := sup_{v∈V∖{0}} |⟨u, v⟩| / ||v||₁ ,

where ⟨·,·⟩ denotes the L²-inner product. We focus on a discrete version of (1) obtained by discretizing the continuous operator K using piecewise constant finite elements, as considered by Hintermüller and Ulbrich in [14]. Let (T_j)_{j⩾0} be a family of shape-regular, nested triangulations of Ω, with h_j being the mesh-size of the triangulation T_j, and assume that

 f_{low} ⩽ h_{j+1}/h_j ⩽ f_{high} (5)

for some 0 < f_{low} ⩽ f_{high} < 1 independent of j; for example, a uniform mesh refinement leads to f_{low} = f_{high} = 1/2. Furthermore, let U_j be the space of piecewise constant functions with respect to T_j. Since the triangulations are nested, we have U_j ⊆ U_{j+1} for all j ⩾ 0. We assume that a family of discretizations K_j, j ⩾ 0, is given, where K_j takes values in a finite element space with properties specified below, and K_j represents a discrete version of K. Note that it is not assumed that the range of K_j be a subspace of V. The discrete optimization problem under scrutiny is

 min_{u∈U_j} J^j_β(u) := 1/2||K_j u − y_d^{(j)}||² + (β/2)||u||² ,  a^{(j)} ⩽ u ⩽ b^{(j)}  a.e., (6)

where y_d^{(j)}, a^{(j)}, b^{(j)} are discrete functions representing y_d, a, and b. As in [10], we assume that the operators K_j satisfy the smoothed approximation condition (SAC) appended by L²–L∞ stability of K_j:

###### Condition \thetheorem (SAC)

There exists a constant C₁ depending on K and independent of j so that

1. smoothing:

 max(||K^*u||_m, ||Ku||_m) ⩽ C₁||u||,  ∀u ∈ L²(Ω),  m = 0, 1 ; (7)
2. smoothed approximation: for all j,

 ||Ku − K_j u|| ⩽ C₁ h_j ||u||,  ∀u ∈ U_j; (8)
3. L²–L∞ stability: for all j,

 ||K_j u||_{L∞(Ω)} ⩽ C₁||u||,  ∀u ∈ U_j. (9)
###### Remark \thetheorem

A simple consequence of Condition 2 is that

 ||Ku||⩽C1||u||−1 ,  ∀u∈U . (10)
{proof}

Indeed, we have

 ||Ku||² = ⟨u, K^*Ku⟩ ⩽ ||u||_{−1} · ||K^*Ku||₁ ⩽ C₁ ||u||_{−1} · ||Ku|| ,

and (10) follows.

To describe the semismooth Newton method for the discrete optimization problem, we rewrite (6) in vector form. Let N_j = dim U_j and let (φ_i^{(j)})_{1⩽i⩽N_j} be the standard piecewise constant finite element basis in U_j. First denote by K_j the matrix representing the operator K_j, and let M_j be the mass matrix in U_j, and M̃_j the mass matrix in the discrete state space. Note that the mass matrices are diagonal. Then (6) is equivalent to

 min_{u∈ℝ^{N_j}} J^j_β(u) := 1/2|K_j u − y_d^{(j)}|²_{M̃_j} + (β/2)|u|²_{M_j} ,  a^{(j)} ⩽ u ⩽ b^{(j)}, (11)

where u ∈ ℝ^{N_j} is the vector representing u ∈ U_j, and similarly for y_d^{(j)}, a^{(j)}, b^{(j)}. To simplify the exposition we omit the sub- and superscripts for the remainder of this section, so that K = K_j, M = M_j, N = N_j, etc. Similarly to (1), the discrete optimization problem (11) has a unique solution, which satisfies the KKT system

 (K^*K + βI)u + λ_a − λ_b = f ,
 a − u ⩽ 0,  λ_a ⩾ 0,  (a − u)·λ_a = 0 ,
 u − b ⩽ 0,  λ_b ⩾ 0,  (u − b)·λ_b = 0 , (12)

where K^* is the adjoint of K with respect to the mass-weighted inner products, f = K^* y_d, and λ_a, λ_b are the Lagrange multipliers. The inequalities and the vector-valued products are to be understood componentwise. The fact that we are able to write the KKT system in the form (12) is not completely obvious, and it relies on the mass matrices being diagonal. It is worth noting that in [10, 7], where the controls were discretized using continuous piecewise linear functions, the mass matrices were intentionally modified (equivalently to using a quadrature for computing L²-inner products) so that they be diagonal. Following [13] (see also [7]), the complementarity problem (12) can be written as the non-smooth nonlinear system

 (K^*K + βI)u − λ = f ,   λ − max(0, λ + β(a − u)) − min(0, λ + β(b − u)) = 0 , (13)

where λ := λ_a − λ_b, and the max and min operations are understood componentwise. We leave it as an exercise to verify that (12) is equivalent to (13). Given the solution (u, λ) of (12), the following sets play a role in understanding the equivalence of (12) and (13):

 I  = {i ∈ {1,…,N} : λ_i + β(a_i − u_i) < 0  and  λ_i + β(b_i − u_i) > 0}
 A_a = {i ∈ {1,…,N} : λ_i + β(a_i − u_i) ⩾ 0  and  λ_i + β(b_i − u_i) > 0}
 A_b = {i ∈ {1,…,N} : λ_i + β(a_i − u_i) < 0  and  λ_i + β(b_i − u_i) ⩽ 0} .

Assume (13) holds. If i ∈ I, the second equality in (13) implies that λ_i = 0; hence a_i < u_i < b_i, so the constraint corresponding to the i-th component of u is inactive. Instead, if i ∈ A_a, then u_i = a_i, so the lower constraints are active; similarly, if i ∈ A_b, then u_i = b_i, so the upper constraints are active. So, if (u, λ) is the solution of (13), then I is the set of indices where the constraints are inactive, A_a is the set of indices where the lower constraints are active, while A_b is the set of indices where the upper constraints are active.

The system (13) is in fact semismooth due to the fact that the max- and min-functions (from ℝ^N to ℝ^N) are slantly differentiable [13]. Consequently, (13) can be solved efficiently using the SSNM, which is equivalent to the primal-dual active set method described below, as shown in [13]. The equivalence of the two is used to prove that the convergence is superlinear. The SSNM is an iterative process that attempts to identify the sets where the inequality constraints are active/inactive. More precisely, at the k-th iteration, given sets I_k, A_a^k, A_b^k partitioning {1, …, N}, we solve the system

 (K^*K + βI)u^{(k+1)} − λ^{(k+1)} = f ,
 u_i^{(k+1)} = a_i  for i ∈ A_a^k,   u_i^{(k+1)} = b_i  for i ∈ A_b^k ,
 λ_i^{(k+1)} = 0  for i ∈ I_k . (14)

The solution is then used to define the new sets

 I_{k+1}  = {i : λ^{(k+1)}_i + β(a_i − u^{(k+1)}_i) < 0  and  λ^{(k+1)}_i + β(b_i − u^{(k+1)}_i) > 0} ,
 A_a^{k+1} = {i : λ^{(k+1)}_i + β(a_i − u^{(k+1)}_i) ⩾ 0  and  λ^{(k+1)}_i + β(b_i − u^{(k+1)}_i) > 0} ,
 A_b^{k+1} = {i : λ^{(k+1)}_i + β(a_i − u^{(k+1)}_i) < 0  and  λ^{(k+1)}_i + β(b_i − u^{(k+1)}_i) ⩽ 0} .

The key problem in (14) is to identify u_i^{(k+1)} for i ∈ I_k. If we denote the Hessian of J^j_β from (11) by

 H=K∗K+βI

and partition the matrix H according to the sets I_k and A_k := A_a^k ∪ A_b^k,

 H^{(k)}_{II} = H(I_k, I_k),  H^{(k)}_{IA} = H(I_k, A_k),  H^{(k)}_{AI} = H(A_k, I_k),  H^{(k)}_{AA} = H(A_k, A_k),

(we used MATLAB notation to describe submatrices) and partition the vectors u^{(k+1)} and f as

 u^{(k+1)}_I = u^{(k+1)}(I_k),  u^{(k+1)}_A = u^{(k+1)}(A_k),  f^{(k)}_I = f(I_k),  f^{(k)}_A = f(A_k),

then u^{(k+1)}_A is given explicitly in (14), λ^{(k+1)}_I = 0, and u^{(k+1)}_I satisfies

 H^{(k)}_{II} u^{(k+1)}_I = f^{(k)}_I − H^{(k)}_{IA} u^{(k+1)}_A . (15)

The remaining components of are given by

 λ^{(k+1)}_A = H^{(k)}_{AI} u^{(k+1)}_I + H^{(k)}_{AA} u^{(k+1)}_A − f^{(k)}_A .

Therefore, the main challenge in solving (14) (which is a linear system) is in fact solving (15). The goal of this work is to construct and analyze multigrid preconditioners for the matrices appearing in (15).
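The iteration (14)–(15) together with the active-set update is the primal-dual active set method. The following minimal dense-matrix sketch is illustrative only: all names are assumptions, and the control mass matrix M is assumed diagonal, as in the paper.

```python
import numpy as np

def primal_dual_active_set(K, Mt, M, yd, a, b, beta, maxit=50):
    """Sketch of the primal-dual active set (semismooth Newton) iteration
    (14)-(15) for min 1/2|Ku-yd|_Mt^2 + beta/2|u|_M^2, a <= u <= b.
    Names are illustrative; M is assumed diagonal, as in the paper."""
    n = K.shape[1]
    H = K.T @ Mt @ K + beta * M          # Hessian in matrix form
    f = K.T @ Mt @ yd
    u = np.clip(np.zeros(n), a, b)
    lam = np.zeros(n)
    for _ in range(maxit):
        # active/inactive sets, as in the update following (14)
        Aa = lam + beta * (a - u) >= 0                 # lower active
        Ab = (lam + beta * (b - u) <= 0) & ~Aa         # upper active
        I = ~(Aa | Ab)                                 # inactive set I_k
        A = Aa | Ab
        u_new = np.empty(n)
        u_new[Aa], u_new[Ab] = a[Aa], b[Ab]
        # reduced system (15): H_II u_I = f_I - H_IA u_A
        u_new[I] = np.linalg.solve(H[np.ix_(I, I)],
                                   f[I] - H[np.ix_(I, A)] @ u_new[A])
        lam = H @ u_new - f              # first equation of (14)
        lam[I] = 0.0
        if np.array_equal(u_new, u):     # active sets have stabilized
            break
        u = u_new
    return u, lam
```

On a strictly convex quadratic of this form the iteration typically identifies the correct active set in a handful of sweeps, mirroring the mesh-independence result of [14]; the expensive step is exactly the reduced solve (15) that the preconditioners of this paper target.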

## 3 The two-grid preconditioner

As in [10, 7], we start by designing a two-grid preconditioner for the principal submatrices of the Hessian arising in the SSNM solution process of (11), then we follow the idea in [8] to extend in Section 4 the two-grid preconditioner to a multigrid preconditioner of similar asymptotic quality. In this section we assume that we are solving (11) at a fixed level j, and that we have reached a certain iterate in the SSNM process, with a current guess at the inactive set given by I^{(j)} ⊆ {1, …, N_j}. Since we are not changing the SSNM iteration we discard the sub- and superscripts k, and we refer to I^{(j)} as the “current inactive set”.

For constructing the preconditioner it is preferable to regard the matrix H_{II} as a discretization of the operator

 Hj=(K∗jKj+βI)∈L(Uj)

representing the Hessian of J^j_β in (6). We first define the (current) inactive space

 U^I_j := span{φ_i^{(j)} : i ∈ I^{(j)}} ⊆ U_j .

Furthermore, denote by π^I_j the L²-projection onto U^I_j and by

 ΩIj=⋃i∈I(j)supp(φ(j)i)⊂Ω

the inactive domain. The matrix H_{II} represents the operator

 H^I_j := π^I_j (K_j^* K_j + βI) E^I_j ∈ L(U^I_j), (16)

called here the inactive Hessian, where E^I_j : U^I_j → U_j is the extension (inclusion) operator. Thus, our goal is to construct a two-grid preconditioner for H^I_j.

The first step, and perhaps the most notable achievement in this work, is the construction of an appropriate coarse space: we define the coarse inactive space U^I_{j−1} as the span of all coarse basis functions whose support intersects Ω^I_j nontrivially, i.e., U^I_{j−1} := span{φ_i^{(j−1)} : i ∈ I^{(j−1)}}, with

 I^{(j−1)} := {i ∈ {1, …, N_{j−1}} : μ(supp(φ_i^{(j−1)}) ∩ Ω^I_j) > 0} , (17)

where μ denotes the Lebesgue measure. Similarly, we define the coarse inactive domain by

 ΩIj−1=⋃i∈I(j−1)supp(φ(j−1)i)⊂Ω . (18)

A few remarks are in order. First, since T_j is a refinement of T_{j−1}, it follows that the set Ω^I_j is a (possibly empty) union of T_j-elements. Since each T_j-element making up Ω^I_j lies inside one T_{j−1}-element making up Ω^I_{j−1}, we have the inclusion

 ΩIj⊆ΩIj−1 . (19)

Second, we do not expect in general that U^I_{j−1} ⊆ U^I_j. However, we have

 UIj−1⊆UIj    if and only if    ΩIj−1=ΩIj . (20)

We also denote

 ∂_n Ω^I_j := Ω^I_{j−1} ∖ Ω^I_j ,

a set we call the numerical boundary of Ω^I_j with respect to the coarse mesh. In Figures 1 and 2 we show the sets Ω^I_j (dark-gray) and ∂_n Ω^I_j (light-gray) for a few cases on uniform triangular grids.

We would like to contrast the definition (17) of the coarse indices with that of Drăgănescu [7], where a coarse basis function enters the span of the coarse inactive space only if its support lies entirely inside Ω^I_j; this would define a coarse inactive space that lies inside the fine inactive space U^I_j, and the inclusion (19) would be reversed, that is, the coarse inactive domain would be included in the fine inactive domain.

We now define the two-grid preconditioner by

 M_j := π^I_j ((H^I_{j−1})^{−1} π^I_{j−1} + β^{−1}(I − π^I_{j−1})) . (21)

The definition (21) is rooted in the two-grid preconditioner definition from [8, 7]; the difference lies in the presence of the action of the projection π^I_j as the last step in (21) (left-most term), which is necessary precisely because U^I_{j−1} is not expected to be a subspace of U^I_j. An operator related to M_j, necessary for the analysis, is

 S_j := π^I_j (H^I_{j−1} π^I_{j−1} + β(I − π^I_{j−1})) . (22)
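To make the construction concrete, the following hypothetical 1D sketch applies the two-grid preconditioner (21) for piecewise constants on a uniform mesh, with coarse cells formed by pairs of fine cells. The operator K, the restriction/prolongation pair, and all names are illustrative assumptions, and the mass matrices are taken as identities.

```python
import numpy as np

def two_grid_precond(K, beta, inactive):
    """Hypothetical 1D sketch of (21): piecewise constants on a uniform
    fine mesh with n cells (n even), coarse cells = pairs of fine cells.
    K is any n x n discrete smoothing operator (an assumption), and
    `inactive` is a boolean fine-cell mask for I^(j)."""
    n = K.shape[0]
    P = np.kron(np.eye(n // 2), np.ones((2, 1)))   # prolongation (injection)
    R = P.T / 2.0                                  # restriction (cell averages)
    Kc = R @ K @ P                                 # coarse surrogate for K_{j-1}
    # coarse inactive set (17): coarse cells meeting the fine inactive domain
    Ic = inactive.reshape(-1, 2).any(axis=1)
    Hc = (Kc.T @ Kc + beta * np.eye(n // 2))[np.ix_(Ic, Ic)]   # H^I_{j-1}

    def apply(v):
        vc = R @ v                                 # coarse cell averages of v
        pc = np.where(Ic, vc, 0.0)                 # coefficients of pi^I_{j-1} v
        w = np.zeros(n // 2)
        w[Ic] = np.linalg.solve(Hc, vc[Ic])        # (H^I_{j-1})^{-1} pi^I_{j-1} v
        out = P @ w + (v - P @ pc) / beta          # + beta^{-1}(I - pi^I_{j-1}) v
        out[~inactive] = 0.0                       # final projection pi^I_j
        return out
    return apply
```

Note how the sketch mirrors (21): a coarse inactive solve on the low-frequency part, a β⁻¹ scaling of the complement, and a final fine-level projection onto the inactive space, which is needed because the coarse inactive space is non-conforming.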
###### Remark \thetheorem

Both M_j and S_j are symmetric with respect to the L²-inner product, that is,

 ⟨M_j u, v⟩ = ⟨u, M_j v⟩,   ⟨S_j u, v⟩ = ⟨u, S_j v⟩,  ∀u, v ∈ U^I_j .

In addition, if U^I_{j−1} ⊆ U^I_j, then M_j = S_j^{−1}.

The key to the last assertions in Remark 3 is that π^I_j has no effect (hence can be discarded) when U^I_{j−1} ⊆ U^I_j. Our ultimate goal is to estimate the spectral distance between M_j and (H^I_j)^{−1}, as a measure of their spectral equivalence (see definition below). As an intermediate step we will estimate the spectral distance between S_j and H^I_j.

Given a Hilbert space X, the spectral distance between two symmetric positive definite operators A, B ∈ L(X), introduced in [8] to analyze multigrid preconditioners for inverse problems like (3), is given by

 d_X(A, B) = sup_{u∈X∖{0}} |ln(⟨Au, u⟩/⟨Bu, u⟩)| .

If δ is the smallest number for which the following inequalities hold

 1−δ⩽⟨Au,u⟩⟨Bu,u⟩⩽1+δ,  ∀u≠0,

and δ < 1, then d_X(A, B) ⩽ −ln(1 − δ). The spectral distance not only allows us to write the above inequalities in a more compact form, but some of its properties (including the fact that it is a distance function) are also used in the analysis. The main result in this article is the following theorem.
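For small dense matrices the spectral distance can be computed directly, since the Rayleigh quotient ⟨Au, u⟩/⟨Bu, u⟩ ranges over the interval spanned by the extreme generalized eigenvalues of the pencil (A, B); a sketch:

```python
import numpy as np
from scipy.linalg import eigh

def spectral_distance(A, B):
    """Spectral distance d_X(A,B) = sup_{u != 0} |ln(<Au,u>/<Bu,u>)|
    between symmetric positive definite matrices, via the extreme
    generalized eigenvalues of A u = lambda B u."""
    lam = eigh(A, B, eigvals_only=True)   # ascending generalized eigenvalues
    return max(abs(np.log(lam[0])), abs(np.log(lam[-1])))
```

For example, spectral_distance(diag(1/2, 2), I) equals ln 2, and the distance vanishes exactly when A = B, consistent with it being a distance function.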

{theorem}

Assuming Condition 2 (SAC) holds, there exist constants C_tg independent of j and the inactive set so that, if h_j is sufficiently small, the following holds:

 dUIj(Mj,(HIj)−1)⩽Ctgβ−1(hj+μ(∂nΩIj)) . (23)

We postpone the proof of Theorem 3 until after a few preliminary results.

###### Remark \thetheorem

Without further formalizing the argument, we would like to comment on the optimality of the result in Theorem 3. First, it should be recognized that μ(∂_n Ω^I_j) can be O(1) for certain choices of I^{(j)}. For example, if T_j is a uniform refinement of T_{j−1} in two dimensions, and Ω^I_j contains exactly one level-j subdivision of each of the level-(j−1) triangles that make up Ω, as shown in Figure 1 b, then the entire coarse space is inactive, i.e., U^I_{j−1} = U_{j−1} and Ω^I_{j−1} = Ω; thus μ(∂_n Ω^I_j) is a fixed fraction of μ(Ω). In this case the two-grid preconditioner is not efficient. However, if Ω^I_j is a good approximation of the correct inactive domain, and the latter is sufficiently regular, e.g., has a Lipschitz boundary, then we expect μ(∂_n Ω^I_j) = O(h_j). It is in this sense that we regard Theorem 3 as proof of the fact that the two-grid preconditioner approximates the operator (H^I_j)^{−1} with optimal order. Figures 1 b and 2 a and b show a progression of ∂_n Ω^I_j in gray for the case when Ω^I_j is a union of two discrete representations of disks on grids of decreasing mesh size. The ratio of the gray areas in Figure 2 a and b, representing μ(∂_n Ω^I_j) at consecutive resolutions, reflects the expected first-order decrease; furthermore, this ratio converges to 1/2 as the resolution tends to zero.

The optimality result in the following lemma is a critical component of the proof of Theorem 3. {lemma} There exists a constant C₂ independent of the mesh-size and the inactive set so that

 ||(I−πIj−1)u||−1⩽C2hj||u|| ,   ∀u∈UIj , (24)

where u ∈ U^I_j is extended with zero outside its support. {proof} For u ∈ U^I_j we have

 ||(I − π^I_{j−1})u||_{−1} = sup_{v∈H¹₀(Ω)∖{0}} ⟨(I − π^I_{j−1})u, v⟩ / ||v||₁ .

Since π^I_{j−1} is an orthogonal projection with respect to the L²-inner product,

 ⟨(I − π^I_{j−1})u, v⟩ = ⟨(I − π^I_{j−1})u, (I − π^I_{j−1})v⟩_{L²(Ω^I_{j−1})}
 ⩽ ||(I − π^I_{j−1})u||_{L²(Ω^I_{j−1})} · ||(I − π^I_{j−1})v||_{L²(Ω^I_{j−1})}
 ⩽ 2||u|| · ||(I − π^I_{j−1})v||_{L²(Ω^I_{j−1})} .

Now

 ||(I − π^I_{j−1})v||²_{L²(Ω^I_{j−1})} = Σ_{i∈I^{(j−1)}} ||(I − π^I_{j−1})v||²_{L²(T_i^{(j−1)})}
 ⩽ C̃² h²_{j−1} Σ_{i∈I^{(j−1)}} |v|²_{H¹(T_i^{(j−1)})} ⩽ f_{low}^{−2} C̃² h²_j |v|²_{H¹(Ω)} ,

where C̃ is the constant (uniform with respect to i and j due to shape-regularity) appearing in the Bramble-Hilbert Lemma on each element T_i^{(j−1)} with i ∈ I^{(j−1)}; we also used the fact that the L²-projection is local for the finite element space under consideration, that is, π^I_{j−1}v is the average of v on each element T_i^{(j−1)} with i ∈ I^{(j−1)}. It follows that

 ⟨(I − π^I_{j−1})u, v⟩ ⩽ 2 f_{low}^{−1} C̃ h_j ||u|| · |v|₁ ,  ∀v ∈ H¹₀(Ω),

which implies the desired result with C₂ = 2 f_{low}^{−1} C̃.

###### Remark \thetheorem

It is remarkable that the constant C₂ is independent of the inactive set, and depends only on the constant appearing in the Bramble-Hilbert lemma and the refinement ratio. Also, what makes the optimal estimate (24) possible is the inclusion Ω^I_j ⊆ Ω^I_{j−1}. If our choice of spaces had led to the reverse inclusion, then the term to estimate would involve the projection error across the boundary of the inactive domain, which is expected to be of size √h_j; as shown in [7], the latter term is often exactly of that size.

{proposition}

Under the assumptions of Theorem 3 there exists a constant C₃ independent of j and the fine inactive set so that, if h_j is sufficiently small, then

 d_{U^I_j}(S_j, H^I_j) ⩽ C₃ h_j/β . (25)
{proof}

As in Lemma 3, functions are extended with zero outside their support when necessary. We have, for u ∈ U^I_j,

 ⟨(H^I_j − S_j)u, u⟩ = ⟨π^I_j(K^*_j K_j − π^I_{j−1} K^*_{j−1} K_{j−1} π^I_{j−1})u, u⟩ = ⟨K_j u, K_j u⟩ − ⟨K_{j−1} π^I_{j−1} u, K_{j−1} π^I_{j−1} u⟩
 = (||K_j u||² − ||Ku||²) + (||Ku||² − …