Optimal-order preconditioners for linear systems arising in the semismooth Newton solution of a class of control-constrained problems
Abstract
In this article we present a new multigrid preconditioner for the linear systems arising in the semismooth Newton method solution of certain control-constrained, quadratic distributed optimal control problems. Using a piecewise constant discretization of the control space, each semismooth Newton iteration essentially requires inverting a principal submatrix of the matrix entering the normal equations of the associated unconstrained optimal control problem, the rows (and columns) of the submatrix representing the constraints deemed inactive at the current iteration. Previously developed multigrid preconditioners for the aforementioned submatrices were based on constructing a sequence of conforming coarser spaces, and proved to be of suboptimal quality for the class of problems considered. Instead, the multigrid preconditioner introduced in this work uses nonconforming coarse spaces, and it is shown that, under reasonable geometric assumptions on the constraints that are deemed inactive, the preconditioner approximates the inverse of the desired submatrix to optimal order. The preconditioner is tested numerically on a classical elliptic-constrained optimal control problem and further on a constrained image-deblurring problem.
Multigrid, semismooth Newton methods, optimization with PDE constraints, large-scale optimization, image deblurring
65K10, 65M55, 65M32, 90C06
1 Introduction
The goal of this work is to construct optimal order multigrid preconditioners for optimal control problems of the type
(1) $\min\limits_{u\in L^2(\Omega),\ a\le u\le b}\ \frac{1}{2}\,\|Ku-y_d\|^2+\frac{\beta}{2}\,\|u\|^2$
where $u\in L^2(\Omega)$ with $\Omega\subset\mathbb{R}^d$ a bounded domain, $y_d$ is given, and $K:L^2(\Omega)\to Y$ is a linear continuous operator with $Y$ being a compactly embedded subspace of $L^2(\Omega)$. The parameter $\beta>0$ is used to adjust the size of the regularization term $\frac{\beta}{2}\|u\|^2$. Throughout this article $\|\cdot\|$ denotes the $L^2(\Omega)$-norm or the operator norm of a bounded linear operator in $L^2(\Omega)$. The functions $a,b$ defining the inequality constraints in (1) satisfy $a(x)<b(x)$ for all $x\in\Omega$. These problems arise in the optimal control of partial differential equations (PDEs), in which case $K$ represents the solution operator of a PDE. For example, the classical PDE-constrained optimization problem
(2) $\min\ \frac{1}{2}\,\|y-y_d\|^2+\frac{\beta}{2}\,\|u\|^2\quad\text{subject to}\quad -\Delta y=u\ \text{in }\Omega,\quad y=0\ \text{on }\partial\Omega,\quad a\le u\le b$
reduces to (1) when replacing $y=Ku$ in the cost functional of (2), where $K=(-\Delta)^{-1}$. A related problem, discussed in [8], addresses the question of time-reversal for parabolic equations, a problem that is ill-posed. In this example we set $K=S(T)$, where $S(t)$ is the time-$t$ solution operator of a linear parabolic PDE with initial value $u$, and $T>0$ is a fixed time. If the solution of the inverse problem needs to satisfy certain inequality constraints, e.g., when $u$, perhaps representing the concentration of a substance, is required to have values in $[0,1]$, then it is essential to impose these constraints explicitly in the formulation of the optimization problem, as shown in (1). For obvious reasons, in the PDE-constrained optimization literature (1) is referred to as the reduced problem. For other applications, such as image deblurring, $K$ can be an explicitly defined integral operator
$(Ku)(x)=\int_{\Omega}k(x,y)\,u(y)\,dy$, with $k\in L^2(\Omega\times\Omega)$; here $u$ is the original image and $y_d$ is the blurred image. Thus, by solving (1) we seek to reconstruct the image $u$ whose blurred version is the given $y_d$, subject to additional box constraints.
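As a concrete illustration of such an integral operator, the sketch below assembles a discrete blurring matrix on a uniform 1-D grid using a Gaussian kernel. The kernel, grid, and row normalization are illustrative assumptions, not the operator used in the experiments of this article.

```python
import numpy as np

def blur_matrix(n, sigma=0.05):
    """Discrete Gaussian blurring operator on a uniform 1-D grid (illustrative)."""
    x = (np.arange(n) + 0.5) / n                 # midpoints of a uniform grid on (0,1)
    d = x[:, None] - x[None, :]                  # pairwise distances x_i - y_j
    K = np.exp(-d**2 / (2 * sigma**2))           # Gaussian kernel samples
    return K / K.sum(axis=1, keepdims=True)      # normalize rows (mass preservation)

K = blur_matrix(64)
u = np.zeros(64)
u[32] = 1.0                                      # a point source "image"
b = K @ u                                        # its blurred version
```

Applying `K` spreads the point source over neighboring cells while (approximately) preserving its total mass, which is the qualitative behavior the deblurring problem has to invert.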
We give a few references to works on multigrid methods for solving (1) with no inequality constraints. In this case (1) is equivalent to the Tikhonov regularization of the ill-posed problem $Ku=y_d$, which in turn reduces to the linear system
(3) $(\beta I+K^{\ast}K)\,u=K^{\ast}y_d$
representing the regularized normal equations. A significant literature [17, 20, 11, 16, 8, 2, 9], to mention just a few references, is devoted to multigrid methods for (3) or the unregularized ill-posed problem. Moreover, when $K$ is the solution operator of a linear PDE, an alternative strategy is to solve directly the indefinite systems representing the Karush-Kuhn-Tucker (KKT) optimality conditions instead of the reduced system, and many works [3, 18, 22, 23, 24] are concerned with multigrid methods for PDE-constrained optimization problems in unreduced form. A comprehensive discussion of the latter strategy is found in [4].
The presence of bound constraints in (1) brings additional challenges to the solution process, since the KKT optimality conditions form a complementarity system as opposed to a linear or a smooth nonlinear system. As shown by Hintermüller et al. [13], the KKT system can be reformulated as a semismooth nonlinear system for which a superlinearly convergent Newton’s method can be devised – the semismooth Newton method (SSNM). Moreover, with controls discretized using piecewise constant finite elements, Hintermüller and Ulbrich [14] have shown that the SSNM converges in a mesh-independent number of iterations for problems like (1), so it is a very efficient solution method in terms of the number of optimization iterations. A comprehensive discussion of SSNMs can be found in [25]. However, as with Newton’s method, each SSNM iteration requires the solution of a linear system, and the efficiency of the SSNM depends on the availability of high-quality preconditioners for the linear systems involved. Naturally, the question of devising preconditioners for SSNMs has received a lot of attention in recent years, especially in the context of optimal control problems constrained by PDEs, e.g., see [12, 1, 19], where preconditioners primarily target the sparse and indefinite KKT systems arising in the solution process. For problems formulated as (1), the SSNM solution essentially requires inverting at each iteration a principal submatrix of the matrix representing a discrete version of $\beta I+K^{\ast}K$, where $h$ denotes the mesh size. The multigrid preconditioner developed by Drăgănescu and Dupont in [8] for the operator arising in the unconstrained problem (3) is shown, under reasonable conditions, to be of optimal order with respect to the discretization: namely, if we denote by the multigrid preconditioner (thought of as an approximation of the inverse of the discrete operator), then
(4) 
where is the convergence order of the discretization and is the discrete control space; for continuous piecewise linear discretizations we have . A natural extension of the ideas in [8] led to the suboptimal multigrid preconditioner developed by Drăgănescu in [7] for principal submatrices of , where is shown to essentially be for a piecewise linear discretization. The key aspect of defining the multigrid preconditioners for principal submatrices of is the definition of the coarse spaces. The natural domain of a principal submatrix of , thought of as an operator, is a subspace of . The multigrid preconditioner developed in [7] is based on constructing coarse spaces that are subspaces of , i.e., conforming coarse spaces. A similar strategy was used by Hoppe and Kornhuber [15] in devising multilevel methods for obstacle problems. However, a visual inspection of the eigenvectors corresponding to the extreme joint eigenvalues of and in [7] suggests that the multigrid preconditioner is suboptimal precisely because of the conformity of the coarse spaces. Thus, in this work we define a new multigrid preconditioner based on nonconforming coarse spaces. While the new construction is limited to piecewise constant approximations of the controls, we can show that, under reasonable conditions, the approximation order of the preconditioner in (4) is . It also turns out that the analysis of the new preconditioner is quite different from the analysis in [7]; fortunately it is also simpler.
This article is organized as follows: in Section 2 we give a formal description of the problem and briefly describe the SSNM to justify the need to precondition principal submatrices of . Section 3 forms the core of the article; here we introduce and analyze the two-grid preconditioner, the main result being Theorem 3. In Section 4 we extend the two-grid results to multigrid; this section closely follows the analogous extension in [7], with certain modifications required by the nonconforming coarse spaces. In Section 5 we show numerical experiments conducted on two test problems: the elliptic-constrained problem (2) and the box-constrained image deblurring problem. We formulate a set of conclusions in Section 6. Appendix A contains a convergence analysis of a Gaussian blurring operator that may be of independent interest.
2 Problem formulation
To fix ideas we assume the compactly embedded space to be and to be polygonal or polyhedral. We denote by and we use the convention . We also define the norm by
where denotes the inner product. We focus on a discrete version of (1) obtained by discretizing the continuous operator using piecewise constant finite elements, as considered by Hintermüller and Ulbrich in [14]. Let be a family of shape-regular, nested triangulations of , with being the mesh size of the triangulation , and assume that
(5) 
for some independent of ; for example, if a uniform mesh refinement leads to . Furthermore, let be the space of piecewise constant functions with respect to . Since the triangulations are nested, we have for all . We assume that a family of discretizations , is given, where is a finite element space with properties specified below, and represents a discrete version of . Note that it is not assumed that be a subspace of . The discrete optimization problem under scrutiny is
(6) 
where are discrete functions representing , and . As in [10], we assume that the operators satisfy the smoothed approximation condition (SAC), augmented by the stability of :
Condition \thetheorem (SAC)
There exists a constant depending on and independent of so that

smoothing:
(7) 
smoothed approximation: for
(8) 
 stability: for
(9)
Remark \thetheorem
A simple consequence of Condition 2 is that
(10) 
To describe the semismooth Newton method for the discrete optimization problem, we rewrite (6) in vector form. Let and be the standard piecewise constant finite element basis in . First denote by the matrix representing the operator , and let be the mass matrix in , and the mass matrix in . Note that the mass matrices are diagonal. Then (6) is equivalent to
(11) 
where is the vector representing , and . To simplify the exposition we omit the sub- and superscripts for the remainder of this section, so that , , , etc. Similarly to (1), the discrete optimization problem (11) has a unique solution, which satisfies the KKT system
(12) 
where is the adjoint of with respect to the inner product, , and are the Lagrange multipliers. The inequalities and the vector-valued product are to be understood componentwise. The fact that we are able to write the KKT system in the form (12) is not completely obvious; it relies on the mass matrices being diagonal. It is worth noting that in [10, 7], where the controls were discretized using continuous piecewise linear functions, the mass matrices were intentionally modified (equivalently, by using a quadrature rule for computing inner products) so that they become diagonal. Following [13] (see also [7]), the complementarity problem (12) can be written as the nonsmooth nonlinear system
(13) 
where , . We leave it as an exercise to verify that (12) is equivalent to (13). Given the solution of (12), the following sets play a role in understanding the equivalence of (12) and (13):
Assume (13) holds. If , the second equality in (13) implies that ; hence , so the constraint corresponding to the component of is inactive. Instead, if , then , so the lower constraints are active; similarly, if , then , so the upper constraints are active. So, if is the solution of (13), then is the set of indices where the constraints are inactive, is the set of indices where the lower constraints are active, while is the set of indices where the upper constraints are active.
The system (13) is in fact semismooth because the function (from to ) is slantly differentiable [13]. Consequently, (13) can be solved efficiently using the SSNM, which, as shown in [13], is equivalent to the primal-dual active set method described below. The equivalence of the two is used to prove that the convergence is superlinear. The SSNM is an iterative process that attempts to identify the sets where the inequality constraints are active/inactive. More precisely, at the iteration, given sets partitioning , we solve the system
(14) 
The solution is then used to define the new sets
The key problem in (14) is to identify for . If we denote the Hessian of from (11) by
and partition the matrix according to the sets and
(we used MATLAB notation to describe submatrices) and the vectors and as
then is given explicitly in (14), , and satisfies
(15) 
The remaining components of are given by
Therefore, the main challenge in solving (14) (which is a linear system) is in fact solving (15). The goal of this work is to construct and analyze multigrid preconditioners for the matrices appearing in (15).
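The iteration just described can be sketched for a generic bound-constrained quadratic program $\min\ \frac12 u^{T}Hu-f^{T}u$, $a\le u\le b$: at each step only the principal submatrix of $H$ on the currently inactive indices is solved with, mirroring (15). The matrix `H`, data `f`, and the constant `c` in the active-set update are illustrative placeholders, not the paper's discrete operators.

```python
import numpy as np

def pdas(H, f, a, b, c=1.0, maxit=50):
    """Primal-dual active set sketch for min 0.5 u'Hu - f'u with a <= u <= b."""
    u = np.clip(np.linalg.solve(H, f), a, b)   # feasible starting guess
    lam = f - H @ u                            # multiplier estimate
    prev = None
    for _ in range(maxit):
        lo = lam + c * (u - a) < 0             # lower bounds predicted active
        hi = lam + c * (u - b) > 0             # upper bounds predicted active
        inact = ~(lo | hi)
        u = np.where(lo, a, np.where(hi, b, 0.0)).astype(float)
        if inact.any():
            # solve only on the inactive principal submatrix, as in (15)
            rhs = f[inact] - H[np.ix_(inact, ~inact)] @ u[~inact]
            u[inact] = np.linalg.solve(H[np.ix_(inact, inact)], rhs)
        lam = f - H @ u
        sets = (lo.tobytes(), hi.tobytes())
        if sets == prev:                       # active sets settled: done
            break
        prev = sets
    return u

# with H = I the minimizer is simply the projection of f onto [a, b]
f = np.array([2.0, -3.0, 0.5])
u = pdas(np.eye(3), f, a=-1.0, b=1.0)
```

In realistic discretizations the inner `np.linalg.solve` on the inactive block is exactly the expensive step, which is what the multigrid preconditioners of this article are designed to accelerate.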
3 The two-grid preconditioner
As in [10, 7], we start by designing a two-grid preconditioner for the principal submatrices of the Hessian arising in the SSNM solution process of (11); we then follow the idea in [8] to extend the two-grid preconditioner in Section 4 to a multigrid preconditioner of similar asymptotic quality. In this section we assume that we are solving (11) at a fixed level , and that we have reached a certain iterate in the SSNM process, with a current guess at the inactive set given by . Since we are not changing the SSNM iteration we discard the sub- and superscripts , and we refer to as the “current inactive set”.
For constructing the preconditioner it is preferable to regard the matrix as a discretization of the operator
representing the Hessian of in (6). We first define the (current) inactive space
Furthermore, denote by the projection onto and by
the inactive domain. The matrix represents the operator
(16) 
called here the inactive Hessian, where is the extension operator. Thus, our goal is to construct a two-grid preconditioner for .
The first step, and perhaps the most notable achievement in this work, is the construction of an appropriate coarse space: we define the coarse inactive space as the span of all coarse basis functions whose supports intersect nontrivially, i.e.,
with
(17) 
where is the Lebesgue measure in . Similarly, we define the coarse inactive domain by
(18) 
A few remarks are in order. First, since is a refinement of , it follows that the set is a (possibly empty) union of elements. Since each element making up lies inside one element making up , we have the inclusion
(19) 
Second, we do not expect in general that . However, we have
(20) 
We also denote
a set we call the numerical boundary of with respect to the coarse mesh. In Figures 1 and 2 we show the sets (dark gray) and (light gray) for a few cases on uniform triangular grids on .
We would like to contrast the definition (17) of the coarse indices with that of Drăgănescu [7], where a coarse basis function enters the span of the coarse inactive space if ; this would define a coarse inactive space that lies inside the fine inactive space , and the inclusion (19) would be reversed, that is, the coarse inactive domain would be included in the fine inactive domain.
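On a structured mesh where each coarse cell is the union of a fixed number of fine cells, the two selection rules can be compared directly. The sketch below uses a uniform 1-D mesh with two fine children per coarse cell, purely as an illustrative assumption; `fine_inactive` is a boolean mask standing in for the fine inactive set.

```python
import numpy as np

def coarse_indices(fine_inactive, children_per_coarse=2):
    """Select coarse cells for the inactive space on a uniform 1-D mesh."""
    mask = np.asarray(fine_inactive).reshape(-1, children_per_coarse)
    nonconforming = mask.any(axis=1)  # rule (17): a nontrivial intersection suffices
    conforming = mask.all(axis=1)     # the stricter conforming choice of [7], for contrast
    return nonconforming, conforming

fine = np.array([True, True, True, False, False, False, True, False])
nc, cf = coarse_indices(fine)
```

For this mask, coarse cells 0, 1, and 3 meet the fine inactive set and are all selected by rule (17), whereas only cell 0 lies entirely inside it; this is exactly the sense in which the coarse inactive domain of (18) contains, rather than is contained in, the fine one.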
We now define the two-grid preconditioner by
(21) 
The definition (21) is rooted in the two-grid preconditioner definition from [8, 7]; the difference lies in the presence of the action of the projection as the last step in (21) (leftmost term), which is necessary precisely because is not expected to be a subspace of . An operator related to , necessary for the analysis, is
(22) 
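Without reproducing the exact operators in (21) and (22), the following sketch shows the algebraic shape of such a two-grid action for a model Hessian $A=\beta I+S$: a coarse solve on the range of an orthogonal projector `P`, combined with the exact inverse $1/\beta$ on its complement. The splitting and the model matrices are assumptions for illustration, not the paper's definition.

```python
import numpy as np

def two_grid_apply(v, A, P, beta):
    """Apply a model two-grid preconditioner to v, for A = beta*I + S."""
    n = len(v)
    I = np.eye(n)
    Ac = P @ A @ P + beta * (I - P)          # coarse operator, extended invertibly
    coarse = P @ np.linalg.solve(Ac, P @ v)  # coarse solve on range(P)
    fine = (I - P) @ v / beta                # exact inverse on the complement
    return coarse + fine

rng = np.random.default_rng(1)
S = rng.standard_normal((6, 6))
S = S @ S.T / 10.0                           # symmetric positive semidefinite part
beta = 0.5
A = beta * np.eye(6) + S
v = rng.standard_normal(6)
w = two_grid_apply(v, A, np.eye(6), beta)    # with P = I the "coarse" solve is exact
```

In the degenerate case `P = I` the preconditioner inverts `A` exactly; the interesting regime is when `P` projects onto a genuinely smaller (coarse) subspace and the quality of the approximation is governed by estimates such as (23).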
Remark \thetheorem
Both and are symmetric with respect to the inner product, that is,
In addition, if , then .
The key to the last assertions in Remark 3 is that has no effect (hence can be discarded) when . Our ultimate goal is to estimate the spectral distance between and , as a measure of their spectral equivalence (see definition below). As an intermediate step we will estimate the spectral distance between and .
Given a Hilbert space , we denote by the set of symmetric positive definite operators in . The spectral distance between , introduced in [8] to analyze multigrid preconditioners for inverse problems like (3), is given by
If is the smallest number for which the following inequalities hold
and , then . The spectral distance not only allows us to write the above inequalities in a more compact form, but some of its properties (including the fact that it is a distance function) are also used in the analysis. The main result in this article is the following theorem.
Assuming Condition holds, there exist constants and , independent of and the inactive set , so that, if , the following holds:
(23) 
We postpone the proof of Theorem 3 until after a few preliminary results.
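For symmetric positive definite matrices the spectral distance above can be evaluated numerically from the generalized eigenvalues of the pair: the formula $\max_i |\ln\lambda_i(B^{-1}A)|$ used below is our own restatement, consistent with the two inequalities defining the distance, and the matrices are toy examples.

```python
import numpy as np

def spectral_distance(A, B):
    """Spectral distance of SPD matrices: max |ln lambda| over eigenvalues of (A, B)."""
    L = np.linalg.cholesky(B)
    Linv = np.linalg.inv(L)
    # eigenvalues of B^{-1/2} A B^{-1/2}, i.e., of the pencil A u = lambda B u
    evals = np.linalg.eigvalsh(Linv @ A @ Linv.T)
    return np.max(np.abs(np.log(evals)))

A = np.diag([1.0, 2.0, 4.0])
d0 = spectral_distance(A, A)        # distance of a matrix to itself
d1 = spectral_distance(2 * A, A)    # uniform scaling by 2
```

A zero distance means the two operators induce identical quadratic forms, and small distances certify the two-sided spectral bounds used throughout the analysis.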
Remark \thetheorem
Without further formalizing the argument, we would like to comment on the optimality of the result in Theorem 3. First, it should be recognized that can be for certain choices of . For example, if is a uniform refinement of in two dimensions, and contains exactly one level subdivision of each of the level triangles that make up , as shown in Figure 1 b, then the entire coarse space and ; thus . In this case the two-grid preconditioner is not efficient. However, if is a good approximation of the correct inactive domain , and is sufficiently regular, e.g., has a Lipschitz boundary, then we expect . It is in this sense that we regard Theorem 3 as proving that the two-grid preconditioner approximates the operator with optimal order. Figures 1 b and 2 a and b show a progression of in gray for the case when is a union of two discrete representations of disks on grids with . The ratio of the gray areas in Figure 2 a and b representing is . Furthermore, this ratio converges to as the resolution tends to zero.
The optimality result in the following lemma is a critical component for the proof of Theorem 3. {lemma} There exists a constant independent of the mesh size and the inactive set so that
(24) 
where is extended with zero outside its support. {proof} For we have
Since ,
Now
where is the constant (uniform with respect to and due to shape-regularity) appearing in the Bramble-Hilbert Lemma on each element in with ; we also used the fact that the projection is local for the finite element space under consideration, that is, is the average of on the element . It follows that
which implies the desired result with .
Remark \thetheorem
It is remarkable that the constant is independent of the inactive set, and depends only on the constant appearing in the Bramble-Hilbert lemma and the refinement ratio. Also, what makes the optimal estimate (24) possible is the inclusion . If our choice of spaces had led to , then the term to estimate would be , which is expected to be of size ; as shown in [7], the latter term is often of size .
Under the assumptions of Theorem there exists independent of and the fine inactive set so that, if , then
(25) 
As in Lemma 3, functions are extended with zero outside their support when necessary. We have for