Multigrid preconditioning of linear systems for semismooth Newton methods applied to optimization problems constrained by smoothing operators

Andrei Drăgănescu, Department of Mathematics and Statistics, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, Maryland 21250 (draga@umbc.edu). This work was supported by the Department of Energy under contract no. DE-SC0005455 and by the National Science Foundation under award DMS-1016177. Computations were performed in part on UMBC's High Performance Computing Facility, which was supported in part by the National Science Foundation under awards CNS-0821258 and DMS-0821311.
Abstract

This article is concerned with the question of constructing efficient multigrid preconditioners for the linear systems arising when applying semismooth Newton methods to large-scale linear-quadratic optimization problems constrained by smoothing operators with box-constraints on the controls. It is shown that, for certain discretizations of the optimization problem, the linear systems to be solved at each semismooth Newton iteration reduce to inverting principal minors of the Hessian of the associated unconstrained problem. As in the case when box-constraints on the controls are absent, the multigrid preconditioner introduced here is shown to increase in quality as the mesh-size decreases, resulting in a number of iterations that decreases with mesh-size. However, unlike the unconstrained case, the spectral distance between the preconditioners and the Hessian is shown to be of suboptimal order in general.

multigrid, semismooth Newton methods, optimization with PDE constraints, large-scale optimization

AMS subject classifications.

65K10, 65M55, 65M32, 90C06

1 Introduction

The objective of this article is to develop efficient multigrid preconditioners for the linear systems arising in the solution process of large-scale optimal control problems constrained by partial differential equations (PDEs) using semismooth Newton methods (SSNMs). The model problems under scrutiny have the form

 minimize  J_β(u) := (1/2)||Ku − y_d||² + (β/2)||u||²,  β > 0 fixed,
 subject to:  a ≤ u ≤ b  a.e., (1)

where Ω is a bounded, open set, K is a linear compact operator in the Hilbert space L²(Ω), and a, b are given functions so that a(x) < b(x) for almost all x ∈ Ω. More detailed conditions will be given in Section 2.

In the PDE-constrained optimization literature K appears oftentimes as the composition of two operators K = E∘S, where E is the compact embedding of a space Y into L²(Ω), and S is the solution operator of a linear PDE: if e(y,u) = 0 defines the linear PDE

 e(y,u)=0 ,

then y = Su if and only if e(y,u) = 0. This way we obtain an equivalent formulation of (1) that has become standard in the PDE-constrained optimization literature [17]:

 minimize  (1/2)||y − y_d||² + (β/2)||u||²
 subject to:  e(y,u) = 0,  u ∈ U_ad = {u ∈ U : a ≤ u ≤ b  a.e.} . (2)

In this formulation u is the control and we shall call y the state. We shall also refer to (1) as the reduced form of (2). There are many applications, however, for which the formulation (1) may not correspond to a problem of the form (2), as is the case when K is an explicitly integral operator as in image deblurring. In other instances it may simply be that the reduced form (1) is more natural than (2). For example, in the solution process of the backwards parabolic equation discussed in [7] (see also the related work [1]), Ku represents the time-T solution of a linear parabolic equation with initial value u; hence, the purpose of the optimization problem is to find an initial value u for which the time-T solution is close to a given state y_d. While this problem can be formulated as a PDE-constrained optimization (2) with space-time constraints, in a truly large-scale, four-dimensional setting the three-dimensional reduced form (1) of the optimization problem is more tractable. We should remark that both for the image-deblurring and the backwards heat-equation examples the operator K can be discretized at various resolutions, and the adjoint K* can be applied at a cost comparable to that of K. Hence, in this work we focus on the reduced problem (1).

Due to the availability of increasingly powerful parallel computers, the scientific community has shown a growing interest over the last decade in developing scalable solvers for large-scale optimization problems with PDE constraints. Multigrid methods have long been associated with large-scale linear systems, the paradigm being that the solution process can be significantly accelerated by using multiple resolutions of the same problem. However, the exact embodiment of the multigrid paradigm depends strongly on the class of problems considered, with multigrid methods for differential equations (elliptic, parabolic, flow problems) being significantly different from methods for integral equations. To place our problem in the context of multigrid methods we consider the simplified version of (1) obtained by removing the inequality constraints on the control, in which case (1) reduces to the linear system

 (K∗K+βI)u=K∗yd , (3)

which represents the normal equations associated with the Tikhonov regularization of the ill-posed problem

 Ku=yd . (4)
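For intuition, the unconstrained problem (3) can be solved matrix-free by conjugate gradients, using only applications of K and K*. The sketch below is illustrative, not the paper's setup: a hypothetical Volterra-type averaging matrix stands in for the discretized smoothing operator, and the hand-rolled CG loop emphasizes that only matrix-vector products are needed.

```python
import numpy as np

def cg(matvec, b, tol=1e-10, maxiter=2000):
    """Conjugate gradients for SPD systems, using only matrix-vector products."""
    x = np.zeros_like(b)
    r = b - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(maxiter):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol * np.linalg.norm(b):
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Hypothetical stand-in for the discretized smoothing operator K
n = 200
K = np.tril(np.ones((n, n))) / n          # Volterra-type averaging operator
beta = 1e-3
yd = K @ np.sin(np.linspace(0.0, np.pi, n))

# Normal equations (3): (K*K + beta I) u = K* yd, applied matrix-free
u = cg(lambda v: K.T @ (K @ v) + beta * v, K.T @ yd)
res = np.linalg.norm(K.T @ (K @ u) + beta * u - K.T @ yd)
```

Because K*K + βI is symmetric positive definite with condition number of order 1/β, CG converges without preconditioning here; the multigrid preconditioners discussed next aim to make this inner solve essentially mesh- and β-robust.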

Beginning with the works of Hackbusch [11] (see also [13]) much effort has been devoted to developing efficient multigrid methods for solving equations like (3) and (4), e.g. see [20, 21, 14, 19, 2, 7] and the references therein. For example, Drăgănescu and Dupont [7] have constructed a multigrid preconditioner M_h which satisfies

 1 − C β⁻¹ h^p ≤ ⟨G_h u, u⟩ / ⟨M_h u, u⟩ ≤ 1 + C β⁻¹ h^p ,   ∀u ≠ 0 , (5)

where h is the mesh-size, G_h is the discretized version of G = K*K + βI, and p is the order of the discretization (p = 2 for piecewise linear finite elements). We regard (5) as optimal-order scalability since it implies that the condition number of the unpreconditioned system is being reduced by a factor of h^p in the M_h-preconditioned system, namely κ(M_h⁻¹G_h) = 1 + O(β⁻¹h^p); this reduction is of optimal order given that the discretization order is O(h^p). A similar situation is encountered in classical multigrid for elliptic problems where multigrid is used to reduce the condition number from O(h⁻²) to O(1), the latter implying the desired mesh-independence property. We should point out that (5) implies that the number of iterations actually decreases with h to the point where, asymptotically, only one iteration is needed on very fine meshes.

The presence of explicit box constraints on the controls and/or states in PDE-constrained optimization problems is sometimes critical both for practical (design constraints) and theoretical reasons (e.g. states representing densities or concentrations of substances that have to be nonnegative). Methods for solving optimization problems with inequality constraints are fundamentally different and more involved than those for unconstrained problems; they generally fall into two competing categories: interior point methods (IPMs) and active-set methods such as SSNMs. Both types of methods exhibit superlinear local convergence and can be formulated and analyzed both in finite dimensional spaces as well as in function spaces [25, 27, 28], the latter being a critical step towards proving mesh-independence for the number of optimization steps. Both IPMs and SSNMs are iterative procedures that require, for each iteration (called here outer iteration), the equivalent of a few PDE solves, i.e., applications of K, and the solution of one or two inner linear systems; for SSNMs only one inner linear solve is required, while for Mehrotra's predictor-corrector IPM implementation two inner linear solves are needed for each outer iteration. Efficiency of the solution process is measured by the number of outer iterations (ideally mesh-independent) needed to solve the problem to a desired tolerance and by the ability to solve the inner linear systems efficiently. In this work we concentrate on the latter.

Even though SSNMs and IPMs essentially solve the same problem, the linear algebra requirements for IPMs are different from those of SSNMs. If formulated in the reduced form (1), that is, with the PDE constraints eliminated (e.g. when the application of K is treated as a black-box), the structure of the systems arising in the IPM solution process is similar to that of the system (3) for the unconstrained problem [9]. More precisely, for IPMs we need to solve systems of the form

 G_{h,λ} u := (K*K + D_λ)u = b , (6)

where D_λ is the operator of multiplication by a relatively smooth function λ. Moreover, under specific conditions and for a natural discrete formulation of (1), Drăgănescu and Petra [9] have constructed multigrid preconditioners for the linear systems (6) that exhibit a degree of optimality similar to that of the multigrid preconditioners in [7]: the resulting multigrid preconditioner M_h satisfies

 1 − C h² β⁻¹ ||λ^{−1/2}||_{W^{2,∞}} ≤ ⟨G_{h,λ} u, u⟩ / ⟨M_h u, u⟩ ≤ 1 + C h² β⁻¹ ||λ^{−1/2}||_{W^{2,∞}} ,   ∀u ≠ 0 . (7)

We recognize in (7) the optimal-order h²-term (linear splines were used for discretization), but also remark that the quality of the preconditioner normally is affected by the lack of smoothness of λ: in general, as the solution approaches the boundary of the feasible set, the smoothness of λ is expected to degrade.

In this article we use similar ideas to design preconditioners for the linear systems arising in the SSNM solution process. Our main contribution is two-fold: we construct suitable coarse spaces and transition operators, and we give a detailed analysis of the resulting two- and multigrid preconditioners. We note that the basic elements of our analysis are related to those in [9], so in this sense the present work could be regarded as a companion of [9]. However, we should point out that the algorithms for SSNMs are essentially different from those for IPMs, and so are the structures of their analyses. For SSNMs we will show that the linear systems to be solved are essentially principal subsystems of (3) where the selected rows (and columns) correspond to the constraints that are deemed inactive at some point in the solution process. The constructed multigrid preconditioner is shown to essentially satisfy (5) with p = 1. While this order of approximation is clearly suboptimal, it still brings a significant reduction of the condition number for small h, and still results in a solution process that requires fewer and fewer inner linear iterations as h ↓ 0.

The method developed and analyzed in this article is related to the multigrid method of the second kind developed by Hackbusch (see [12], Ch. 16). In fact, when using the multigrid iteration described in [12] in connection with the coarse spaces and transition operators described here we observe that the number of inner linear iterations decreases as h ↓ 0. However, our numerical experiments in Section 5 indicate that our method is more efficient in absolute terms. We should also note that the coarse spaces defined in this work are related to the symmetric multigrid preconditioner developed by Hoppe and Kornhuber in [18] for obstacle problems, where the matrices to be preconditioned are subsystems of elliptic operators.

While our strategy is mainly designed for the reduced form (1), a significant literature is devoted to multigrid methods applied to the complementarity problem representing the Karush-Kuhn-Tucker (KKT) system of (2). Of these techniques we mention the collective smoothing multigrid method of Borzì and Kunisch [3]. For further references we refer the reader to the review article of Borzì and Schulz [4]. Also, in a recent article Stoll and Wathen [22] develop preconditioners for the linear systems arising in the SSNM solution process of the unreduced optimal control problem (2); in their approach the linear systems are indefinite (since they correspond to derivatives of Lagrangians) and sparse, since it is not K, but the underlying PDE operator, that is explicitly present in the system. Yet another alternative strategy for solving the linear systems for SSNMs applied to the unreduced problem (2), presented by Ulbrich in [25], p. 219ff (see also [24]), involves reducing the linear systems to solving the discrete PDEs for which efficient solvers are assumed to be readily available (such as classical multigrid for elliptic problems). The question of which method is the most efficient is difficult to answer for a general setting. However, we emphasize that the technique proposed in this article will work when the operator K is given only as a black-box, or when solvers are available for computing Ku and K*u efficiently.

This article is organized as follows: in Section 2 we give the formal introduction of the problem, briefly discuss SSNMs, and present the main results. Section 3 is essentially devoted to proving the main result, Theorem 2.4.3, concerning the two-grid preconditioner, while in Section 4 we extend the analysis to the multigrid preconditioner. In Section 5 we show some numerical results to support our theoretical work, and we formulate some conclusions in Section 6. Appendix A contains some technical results used in Section 4.

2 Problem formulation and main results

Our solution strategy follows the discretize-then-optimize paradigm, where we first formulate a discrete optimization problem associated with (1), which we then solve using SSNMs. After introducing the discrete framework in Section 2.1, we discuss the optimality conditions and their semismooth formulation in Section 2.2. In Section 2.3 we derive the linear systems needed to be solved at each SSNM iteration. The two-grid preconditioner and main two-grid results are given in Section 2.4. Furthermore, we discuss the multigrid preconditioner in Section 2.5.

2.1 Notation and discrete problem formulation

Let Ω ⊂ ℝ^d (d = 2 or 3) be a bounded domain which, for simplicity, we assume to be polygonal (if d = 2) or polyhedral (for d = 3). We denote by H^m(Ω) (with H⁰(Ω) = L²(Ω)) the standard Sobolev spaces, and by ||·|| and ⟨·,·⟩ the L²-norm and inner product, respectively. Let H̃^{−m}(Ω) be the dual (with respect to the L²-inner product) of H^m(Ω) ∩ H¹₀(Ω) for m = 1, 2, with the norm given by

 ||u||_{H̃^{−m}(Ω)} = sup_{v ∈ H^m(Ω)∩H¹₀(Ω)} ⟨u,v⟩ / ||v||_{H^m(Ω)} .

The space of bounded linear operators on a Banach space X is denoted by L(X). We regard square matrices as operators in ℝ^N and we write matrices and vectors using bold font. If W is a symmetric positive definite matrix, we denote by ⟨u,v⟩_W = uᵀWv the W-dot product of two vectors u, v ∈ ℝ^N, and by |u|_W = ⟨u,u⟩_W^{1/2} the W-norm; if W = I we drop the subscript from the inner product and norm. The space of m×n real matrices is denoted by ℝ^{m×n}. Given some norm ||·||_s on a vector space X, and T ∈ L(X), we denote by ||T||_s the induced operator-norm

 ||T||_s = sup_{u ∈ X, ||u||_s = 1} ||Tu||_s .

Consequently, if T ∈ L(L²(Ω)) then ||T|| (no subscripts) is the operator-norm of T. If X is a Hilbert space and T ∈ L(X) then T* denotes the adjoint of T. The defining elements of the discrete optimization problem are: the discrete analogues K_j of K, discrete norms, and discrete inequality constraints, all of which we introduce below.

To discretize the optimal control problem (1) we consider a sequence of quasi-uniform (in the sense of [6]) meshes T_j, which we assume to be either simplicial (triangular if d = 2, tetrahedral if d = 3) or rectangular, and let

 hj=max{diam(T) : T∈Tj} ,  j=0,1,2,… .

It is assumed that there are mesh-independent constants f̲, f̄ (usually f̲ = f̄ = 1/2) so that

 f̲ ≤ h_j / h_{j−1} ≤ f̄ .

We define the standard finite element spaces: for simplicial elements let

 V^s_j = {u ∈ C(Ω̄) : ∀T ∈ T_j,  u|_T is linear,  u|_∂Ω ≡ 0} ,

and for rectangular elements we use piecewise tensor-products of linear polynomials

 V^r_j = {u ∈ C(Ω̄) : ∀T ∈ T_j,  u|_T ∈ Q₁,  u|_∂Ω ≡ 0} ,

where

 Q₁ = {Σ_j c_j ∏_{k=1}^d l_{j,k}(x_k) : l_{j,k} linear polynomial of one variable} .

For simplicity we assume that T_{j+1} is a uniform refinement of T_j, so the associated spaces are nested

 Vj⊂Vj+1⊂H10(Ω) .

Since the algorithms and results are the same for both types of finite element spaces we will denote by V_j either V^s_j or V^r_j. Let N_j = dim V_j and P^(j)_1, …, P^(j)_{N_j} be the nodes of T_j that lie in the interior of Ω, and define I_j to be the standard interpolation operator

 I_j(u) = Σ_{i=1}^{N_j} u(P^(j)_i) φ^(j)_i ,

where φ^(j)_i are the standard nodal basis functions. If we replace exact integration on an element T with ν vertices by the cubature

 ∫_T f(x) dx ≈ (vol(T)/ν) Σ_{P vertex of T} f(P) ,

then the -inner product is approximated by the mesh-dependent inner product

 ⟨u,v⟩_j = Σ_{i=1}^{N_j} w^(j)_i u(P^(j)_i) v(P^(j)_i),  for u, v ∈ V_j ,

where

 w^(j)_i = ν⁻¹ Σ_{T : P^(j)_i vertex of T} vol(T) . (8)

The discrete norms are then given by

 |||u|||_j := √⟨u,u⟩_j .

Since the quadrature/cubature is exact for linear functions, or tensor-products of linear functions, respectively, we have

 ∫_Ω u(x) dx = Σ_{i=1}^{N_j} w^(j)_i u(P^(j)_i),  ∀u ∈ V_j .

Moreover, due to quasi-uniformity, there exist positive constants C₁, C₂ independent of j such that

 C₁||u|| ≤ |||u|||_j ≤ C₂||u||,  ∀u ∈ V_j . (9)
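In 1-D, for instance, the mesh-dependent inner product is just a weighted Euclidean product of nodal values. The toy computation below (an illustration under assumed 1-D uniform-mesh weights, not the paper's setup) checks that the discrete norm is close to the L²-norm, in line with (9):

```python
import numpy as np

# Uniform 1-D mesh on (0,1) with n interior nodes; each interior vertex belongs
# to two elements of length h with nu = 2 vertices each, so (8) gives w_i = h.
n = 99
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)          # interior nodes
w = np.full(n, h)                       # quadrature weights w_i^(j)

u = x * (1.0 - x)                       # nodal values of u(x) = x(1-x), zero on boundary
norm_j = np.sqrt(np.sum(w * u * u))     # discrete norm |||u|||_j
norm_L2 = np.sqrt(1.0 / 30.0)           # exact L2-norm of x(1-x) on (0,1)
ratio = norm_j / norm_L2
```

On this uniform mesh the discrete norm agrees with the continuous one up to an O(h²) quadrature error, consistent with mesh-independent constants C₁, C₂ in (9).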

We should point out that the norm-equivalence (9) extends to show mesh-independent equivalence of the associated operator-norms. We say that the weights are uniform with respect to the mesh if there exists ω_j independent of i so that

 w^(j)_i = ω_j h_j^d  for  i = 1, …, N_j .

We call a triangulation locally symmetric if for every vertex P the associated nodal basis function φ is symmetric with respect to the reflection in P, that is,

 φ(2P−x)=φ(x),  ∀x∈Ω .

If a mesh is uniform, then it is locally symmetric and the weights are uniform.

On each space V_j consider an operator K_j ∈ L(V_j) representing a discretization of K. For the discrete operators we denote by K*_j the adjoint of K_j with respect to ⟨·,·⟩_j, that is,

 ⟨K*_j u, v⟩_j = ⟨u, K_j v⟩_j,  ∀u, v ∈ V_j .

We assume that the operators satisfy the following condition.

Condition 2.1

There exists a constant C independent of j so that the following hold:

1. smoothing:

 max(||Ku||_{H^m(Ω)}, ||K*u||_{H^m(Ω)}) ≤ C||u||,  ∀u ∈ L²(Ω), m = 0, 1, 2 ; (10)
2. smoothed approximation:

 ||Ku − K_j u||_{H^m(Ω)} ≤ C h_j^{2−m} ||u||,  ∀u ∈ V_j, m = 0, 1, j ≥ 0 ; (11)
3. uniform boundedness of discrete operators and their adjoints:

 max(||K*_j u||_{L^∞(Ω)}, ||K_j u||_{L^∞(Ω)}) ≤ C||u||,  ∀u ∈ V_j,  j ≥ 0 . (12)

We now formulate the discrete optimization problem using the discrete norms and we enforce the inequality-constraints at the vertices:

 minimize  J_{j,β}(u) := (1/2)|||K_j u − y_d|||²_j + (β/2)|||u|||²_j  over  u ∈ V_j,
 subject to:  a^(j)_i ≤ u(P^(j)_i) ≤ b^(j)_i,  i = 1, …, N_j , (13)

where the vectors a^(j) (resp. b^(j)) represent projections of the functions a (resp. b) onto V_j.

If K_j is the matrix representation of K_j in the nodal basis, W_j is the diagonal matrix with diagonal entries w^(j)_1, …, w^(j)_{N_j}, and u is the vector of nodal values u_i = u(P^(j)_i) defined earlier, the problem (13) reads in matrix form

 minimize  J_{j,β}(u) := (1/2)|K_j u − y_d|²_{W_j} + (β/2)|u|²_{W_j},  β > 0 fixed,
 subject to:  u ∈ ℝ^{N_j},  a^(j) ≤ u ≤ b^(j) . (14)

Furthermore, we write J_{j,β}(u) = (1/2) uᵀC_j u − f_jᵀu + γ_j, where

 C_j = K_jᵀ W_j K_j + β W_j ,  f_j = K_jᵀ W_j y_d ,  γ_j = (1/2) y_dᵀ W_j y_d . (15)

We also point out that the adjoint operator K*_j is represented by the matrix W_j⁻¹ K_jᵀ W_j, so we denote

 K*_j := W_j⁻¹ K_jᵀ W_j .
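The weighted-adjoint relation is easy to verify numerically. In the sketch below, a randomly generated matrix and positive weight vector stand in for the nodal matrix and the quadrature weights (both are hypothetical), and the identity ⟨K*u, v⟩_j = ⟨u, Kv⟩_j for the weighted inner product ⟨u,v⟩_j = uᵀWv holds to machine precision:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
K = rng.standard_normal((n, n)) / n        # stand-in for the nodal matrix K_j
w = rng.uniform(0.5, 1.5, n)               # positive quadrature weights w_i^(j)
W = np.diag(w)

Kstar = np.linalg.solve(W, K.T @ W)        # K_j^* = W^{-1} K^T W

# weighted inner product <u, v>_j = u^T W v
u, v = rng.standard_normal(n), rng.standard_normal(n)
lhs = (Kstar @ u) @ (W @ v)                # <K^* u, v>_j
rhs = u @ (W @ (K @ v))                    # <u, K v>_j
```

Algebraically, (W⁻¹KᵀW u)ᵀWv = uᵀWKv, which is exactly the adjoint property with respect to the W-weighted product.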

2.2 Optimality conditions and SSNMs

Since J_β is strictly convex and quadratic, (1) has a unique solution u satisfying the KKT conditions (e.g. see [17, 23]): there exist λ_a, λ_b ∈ L²(Ω) so that

 ∇J_β(u) + λ_b − λ_a = 0 ,
 a − u ≤ 0,  λ_a ≥ 0,  (a − u)λ_a = 0  a.e.,
 u − b ≤ 0,  λ_b ≥ 0,  (u − b)λ_b = 0  a.e. (16)

After denoting λ = λ_a − λ_b, the complementarity system (16) can be written as the nonlinear, non-smooth system

 ∇J_β(u) − λ = 0 ,
 λ − max(0, λ + σ(a − u)) − min(0, λ + σ(b − u)) = 0 , (17)

where σ > 0 is an arbitrary constant, and ∇J_β(u) = K*(Ku − y_d) + βu. Since λ = ∇J_β(u), for σ = β the system (17) is equivalent to

 ∇J_β(u) − max(0, K*(Ku − y_d) + βa) − min(0, K*(Ku − y_d) + βb) = 0 . (18)

Cf. [26], if K* ∈ L(L²(Ω), L^q(Ω)) with q > 2, then Condition 2.1 implies that the nonlinear function in (18) is semismooth (slantly differentiable); therefore Newton's method converges superlinearly when applied to (18).

To simplify notation, for the remainder of this section we omit the sub- or superscripts 'j' if there is no danger of confusion; hence K = K_j, K* = K*_j, W = W_j, N = N_j, etc. The KKT conditions and associated semismooth equation for (14) are similar to the continuous case, namely the solution u satisfies the following complementarity problem: there exist vectors λ_a, λ_b ∈ ℝ^N so that

 ∇J_{j,β}(u) + λ_b − λ_a = 0 ,
 a − u ≤ 0,  λ_a ≥ 0,  (a − u)·λ_a = 0 ,
 u − b ≤ 0,  λ_b ≥ 0,  (u − b)·λ_b = 0 , (19)

where · is the componentwise vector multiplication. Since ∇J_{j,β}(u) = W((K*K + βI)u − K*y_d) and W is diagonal, after denoting λ̂ = W⁻¹(λ_a − λ_b) and left-multiplying the first equation in (19) by W⁻¹, the complementarity system (19) is written as the nonlinear, non-smooth system

 (K*K + βI)u − K*y_d − λ̂ = 0 ,
 λ̂ − max(0, λ̂ + σ(a − u)) − min(0, λ̂ + σ(b − u)) = 0 , (20)

since W⁻¹ has positive diagonal entries. For σ = β, (20) is equivalent to

 λ̂ = (K*K + βI)u − K*y_d ,
 λ̂ − max(0, K*(Ku − y_d) + βa) − min(0, K*(Ku − y_d) + βb) = 0 . (21)

While the resemblance between the discrete system (21) and its continuous counterpart (18) extends beyond notation, it is neither evident nor is it the purpose of this paper to prove mesh-independence of the convergence rate of Newton's method for (21); however, we do observe mesh-independence in numerical computations even for small values of β. We should also note that the mesh-independence results of Hintermüller and Ulbrich in [16] do not apply directly to (21), mainly because here we discretize the control using continuous piecewise linear functions rather than piecewise constants as in [16], so the operator max(0, ·) is not well defined in V_j: if u_h ∈ V_j has both negative and positive values on an element T, then max(0, u_h) is no longer in V_j. This is the main reason for which we resort to a matrix formulation of the discrete optimization problem (13) prior to formulating a discrete semismooth Newton system, instead of formulating a semismooth Newton system directly for finite element spaces. However, our primary interest is to construct multigrid preconditioners for the linear systems arising in the solution process of (21), which we discuss in the next section.
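As a concrete illustration of (21), the following sketch evaluates the semismooth residual for a small dense problem with uniform weights, so that W = I and K* = Kᵀ; the operator and data are hypothetical stand-ins, not the paper's test problems:

```python
import numpy as np

def ssn_residual(u, K, yd, a, b, beta):
    """Residual of the semismooth system (21), assuming uniform weights
    W = I so that K^* = K^T (an illustrative simplification)."""
    g = K.T @ (K @ u - yd)                   # K^*(K u - y_d)
    lam = g + beta * u                       # first equation: lam_hat
    return lam - np.maximum(0.0, g + beta * a) - np.minimum(0.0, g + beta * b)

# Sanity check: with very wide bounds, the unconstrained minimizer of (14)
# should make the residual vanish.
n = 20
K = np.tril(np.ones((n, n))) / n             # hypothetical averaging operator
beta = 0.1
yd = K @ np.linspace(-1.0, 1.0, n)
a, b = np.full(n, -1e6), np.full(n, 1e6)
u_star = np.linalg.solve(K.T @ K + beta * np.eye(n), K.T @ yd)
r = ssn_residual(u_star, K, yd, a, b, beta)
```

With inactive bounds, g + βa is strongly negative and g + βb strongly positive, so the max/min terms drop out and the residual reduces to λ̂ = (K*K + βI)u − K*y_d, which is zero at the unconstrained minimizer.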

2.3 Primal-dual formulation and linear systems

As shown in [15], the semismooth Newton method for (14) can be regarded as a primal-dual active set method, which we now describe briefly. If u, λ̂ solve (20), then we define the following sets:

 I   = {i ∈ {1,…,N} : λ̂_i + σ(a_i − u_i) < 0  and  λ̂_i + σ(b_i − u_i) > 0} ,
 A_a = {i ∈ {1,…,N} : λ̂_i + σ(a_i − u_i) ≥ 0  and  λ̂_i + σ(b_i − u_i) > 0} ,
 A_b = {i ∈ {1,…,N} : λ̂_i + σ(a_i − u_i) < 0  and  λ̂_i + σ(b_i − u_i) ≤ 0} .

A simple argument shows that if i ∈ I, then a_i < u_i < b_i and λ̂_i = 0; on the other hand, if i ∈ A_a, then u_i = a_i and λ̂_i ≥ 0; finally, if i ∈ A_b, then u_i = b_i and λ̂_i ≤ 0. Also, since a_i < b_i for all i, we have {1,…,N} = I ∪ A_a ∪ A_b. The primal-dual active set method produces a sequence of sets I_k, A^a_k, A^b_k that approximate I, A_a, A_b. Given I_k, A^a_k, A^b_k, we set up the system

 (K*K + βI)u^(k+1) − λ̂^(k+1) = K*y_d ,
 u^(k+1)_i = a_i  for  i ∈ A^a_k ,   u^(k+1)_i = b_i  for  i ∈ A^b_k ,
 λ̂^(k+1)_i = 0  for  i ∈ I_k . (22)

The new set-iterates are then given by

 I_{k+1}   = {i : λ̂^(k+1)_i + σ(a_i − u^(k+1)_i) < 0  and  λ̂^(k+1)_i + σ(b_i − u^(k+1)_i) > 0} ,
 A^a_{k+1} = {i : λ̂^(k+1)_i + σ(a_i − u^(k+1)_i) ≥ 0  and  λ̂^(k+1)_i + σ(b_i − u^(k+1)_i) > 0} ,
 A^b_{k+1} = {i : λ̂^(k+1)_i + σ(a_i − u^(k+1)_i) < 0  and  λ̂^(k+1)_i + σ(b_i − u^(k+1)_i) ≤ 0} .

Now consider the splitting of G := K*K + βI according to the sets of indices I_k and A_k := A^a_k ∪ A^b_k. More precisely, using MATLAB syntax, we define

 G_{in,k} = G(I_k, I_k) ,  G_{ia,k} = G(I_k, A_k) ,  G_{ai,k} = G(A_k, I_k) ,  G_{aa,k} = G(A_k, A_k) .

Since u^(k+1)_i is determined for i ∈ A_k, and λ̂^(k+1)_i = 0 for i ∈ I_k, the equations corresponding to indices from I_k in the first system of (22) reduce to

 G_{in,k} u^(k+1)_in = (K*y_d)_in − G_{ia,k} u^(k+1)_a , (23)

where u^(k+1)_in = u^(k+1)(I_k), u^(k+1)_a = u^(k+1)(A_k), and (K*y_d)_in = (K*y_d)(I_k). After solving (23), the remaining components of λ̂^(k+1) are identified by

 λ̂^(k+1)(A_k) = G_{ai,k} u^(k+1)_in + G_{aa,k} u^(k+1)_a − (K*y_d)(A_k) .

Thus the critical step in solving (22) is the reduced system (23).
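Putting the pieces together, the iteration (22)–(23) can be sketched for a small dense problem with uniform weights (W = I, σ = β). Everything below — the operator, the data, and the bounds — is illustrative, and the reduced solve in (23) is done with a direct factorization rather than the preconditioned iterative solver that is the subject of this paper:

```python
import numpy as np

def pdas(K, yd, a, b, beta, maxit=100):
    """Primal-dual active set sketch for (14) with W = I and sigma = beta."""
    n = K.shape[1]
    G = K.T @ K + beta * np.eye(n)            # G = K^*K + beta I
    f = K.T @ yd                              # K^* y_d
    u = np.zeros(n)
    lam = np.zeros(n)                         # lam_hat
    sigma = beta
    for _ in range(maxit):
        Aa = lam + sigma * (a - u) >= 0.0     # lower-bound active set A^a_k
        Ab = lam + sigma * (b - u) <= 0.0     # upper-bound active set A^b_k
        I = ~(Aa | Ab)                        # inactive set I_k
        u_new = np.where(Aa, a, np.where(Ab, b, 0.0))
        act = ~I
        # reduced system (23): G_in u_in = (K^* y_d)_in - G_ia u_a
        if I.any():
            u_new[I] = np.linalg.solve(
                G[np.ix_(I, I)], f[I] - G[np.ix_(I, act)] @ u_new[act])
        lam_new = G @ u_new - f               # first equation of (22)
        lam_new[I] = 0.0
        if np.array_equal(u_new, u):          # active sets have stabilized
            return u_new, lam_new
        u, lam = u_new, lam_new
    return u, lam

# tiny demonstration with a hypothetical averaging operator
rng = np.random.default_rng(1)
n = 40
K = np.tril(np.ones((n, n))) / n
yd = K @ rng.standard_normal(n)
a, b = np.full(n, -0.2), np.full(n, 0.2)
beta = 1e-2
u, lam = pdas(K, yd, a, b, beta)
```

In a large-scale setting the dense factorization of G(I_k, I_k) is exactly what becomes infeasible; replacing it by a preconditioned Krylov solve of (23) is where the multigrid preconditioners of this paper enter.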

We should point out that for large-scale problems the matrices K_j, and thus also G and G_{in,k}, are expected to be dense and are not formed. Therefore we can solve (23) only by means of iterative solvers. Hence, efficient matrix-free preconditioners are necessary for accelerating the solution process. Also note that if I_k = {1, …, N}, then the multigrid preconditioners developed in [7] can be used. Our main contribution in this work is the design of two- and multigrid preconditioners for (23) for the case when I_k is a proper subset of {1, …, N}.

2.4 The two-grid preconditioner

Let j be a fixed level, to which we shall refer as the fine level. We will first define a two-grid preconditioner for the matrix G^j_in of (23) that will involve inverting a certain principal minor of its coarse-level analogue. To design the two-grid preconditioner we will regard the matrix G^j_in as an operator between finite element spaces.

2.4.1 Construction of the two-grid preconditioner

To simplify notation, consider a fixed index set I^(j) ⊆ {1, …, N_j}; this should be regarded as one of the iterates I_k encountered in the solution process described in the previous section. The linear system we investigate has the form

 G^j_in ũ = r̃ , (24)

where G^j_in = G(I^(j), I^(j)). We will call a vertex P^(j)_i or an index i inactive if i ∈ I^(j). Define the fine inactive space by

 V^in_j = span{φ^(j)_i : i ∈ I^(j)} ,

and the fine inactive domain by

 Ω^in_j = ⋃_{i ∈ I^(j)} supp(φ^(j)_i) , (25)

where supp(φ^(j)_i) is the support of the function φ^(j)_i. The critical component of the preconditioner is the definition of the coarse inactive index set:

 I^(j−1) := {i ∈ {1, …, N_{j−1}} : supp(φ^(j−1)_i) ⊆ Ω^in_j} . (26)

To be more precise, if i′ = i′(i) is the index in the fine numbering associated to the coarse index i, that is, P^(j)_{i′} = P^(j−1)_i, the definition above is equivalent to saying that i ∈ I^(j−1) if i′ together with all its neighbouring fine indices are inactive. In Figure 1 we depict a set of inactive fine nodes by filled circles, and the associated coarse inactive nodes by hollow circles. The coarse nodes that are not inactive are shown with a square hollow marker. The coarse inactive space is now defined to be

 V^in_{j−1} = span{φ^(j−1)_i : i ∈ I^(j−1)} , (27)

and the coarse inactive domain is given by

 Ω^in_{j−1} = ⋃_{i ∈ I^(j−1)} supp(φ^(j−1)_i) .

Note that V^in_{j−1} ⊆ V^in_j and Ω^in_{j−1} ⊆ Ω^in_j. In connection with Ω^in_j we also define the numerical interior of Ω^in_j (relative to T_{j−1}) to be the union of all coarse elements included in the fine inactive domain, that is, coarse elements T ∈ T_{j−1} with T ⊆ Ω^in_j and whose vertices are either inactive or lie on the boundary of Ω^in_j (see Figure 1). Furthermore, let the numerical boundary of Ω^in_j (relative to T_{j−1}) be given by

 ∂_n Ω^in_j = Ω^in_j ∖ Int_n Ω^in_j .

Note that Ω^in_{j−1} ⊆ Int_n Ω^in_j.
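In 1-D the definition (26) is easy to make concrete: with fine interior nodes 0, …, 2m and coarse interior node i sitting on fine node 2i + 1, a coarse index is inactive exactly when its fine counterpart and both of its fine neighbours are inactive. The indexing convention below is illustrative, not the paper's:

```python
def coarse_inactive(fine_inactive, n_coarse):
    """Coarse inactive index set (26) on a uniform 1-D mesh (illustrative).

    Fine interior nodes are 0..2*n_coarse; coarse interior node i coincides
    with fine node 2*i + 1.  Coarse index i is inactive iff supp(phi_i^{j-1})
    lies inside the fine inactive domain, i.e. fine nodes 2i, 2i+1, 2i+2 are
    all inactive.
    """
    return {i for i in range(n_coarse)
            if {2 * i, 2 * i + 1, 2 * i + 2} <= fine_inactive}
```

The support of a coarse hat function spans the supports of the three fine hat functions centred inside it, which is why all three fine indices must be inactive; this mirrors the "i′ together with all its neighbouring fine indices" characterization above.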

Let now W^in_{j−1} be the ⟨·,·⟩_j-orthogonal complement of V^in_{j−1} in V^in_j and define the ⟨·,·⟩_j-orthogonal projectors

 π^in_{j−1} : V^in_j → V^in_{j−1} ,    ρ^in_{j−1} : V^in_j → W^in_{j−1} ,

so π^in_{j−1} + ρ^in_{j−1} is the identity on V^in_j. Furthermore, let V^in_j ↪ V_j be the natural embedding obtained by extending a function with zero outside of Ω^in_j. We may occasionally omit the explicit use of the embedding operators to ease notation. We also define the restriction R_{j−1} : V_j → V_{j−1} as the adjoint with respect to ⟨·,·⟩_j of the embedding of V_{j−1} in V_j, that is, for u ∈ V_j

 ⟨u, v⟩_j = ⟨R_{j−1} u, v⟩_{j−1},  ∀v ∈ V_{j−1} , (28)

and let P^in_j : V_j → V^in_j be the projection given with respect to the nodal basis by

 P^in_j ( Σ_{i=1}^{N_j} u_i φ^(j)_i ) = Σ_{i ∈ I^(j)} u_i φ^(j)_i .

Furthermore, denote by Q^in_j the orthogonal ⟨·,·⟩_j-projector onto the space V^in_j. From the equivalence (9) of the discrete norms with the L²-norm it follows that

 ||R_j u|| ≤ C||u|| ,  j = 0, 1, … , (29)

for some mesh-independent constant C.

The matrix G^j_in in (24) represents the operator

 G^j_in = P^in_j (K*_j K_j + βI) : V^in_j → V^in_j , (30)

and thus matrix-vector products for