Preconditioning PDEconstrained optimization with sparsity and control constraints
Abstract
PDEconstrained optimization aims at finding optimal setups for partial differential equations so that relevant quantities are minimized. Including sparsity promoting terms in the formulation of such problems results in more practically relevant computed controls but adds more challenges to the numerical solution of these problems. The needed terms as well as additional inclusion of box control constraints require the use of semismooth Newton methods. We propose robust preconditioners for different formulations of the Newton’s equation. With the inclusion of a linesearch strategy and an inexact approach for the solution of the linear systems, the resulting semismooth Newton’s method is feasible for practical problems. Our results are underpinned by a theoretical analysis of the preconditioned matrix. Numerical experiments illustrate the robustness of the proposed scheme.
65F10, 65N22, 65K05, 65F50
DEconstrained optimization, Saddle point systems, Preconditioning, Krylov subspace solver, Sparsity, Semismooth Newton’s method
1 Introduction
Optimization is a crucial tool across the sciences, engineering, and life sciences and is thus requiring robust mathematical tools in terms of software and algorithms [23]. Additionally, over the last decade the need for sparse solutions has become apparent and the field of compressed sensing [8, 7] is a success story where mathematical tools have conquered all fields from image processing [22] to neuroscience [11].
One of the typical areas where optimization and sparsity promoting formulations are key ingredients is the optimization of functions subject to partial differential equation (PDE) constraints. In this field one seeks an optimal control such that the state of the system satisfies certain criteria, e.g. being close to a desired or observed quantity, while the control and state are connected via the underlying physics of the problem. This socalled state equation is typically a PDE of a certain type. While this is not a new problem [19, 37] the penalization of the control cost via an norm requires different methodology from the classical control term. Nonsmooth Newton methods [19, 17, 18, 38] have become the stateoftheart when dealing with nonsmooth terms in PDEconstrained optimization problems as they still exhibit local superlinear convergence. In the case of a sparsity promoting term within PDEconstrained optimization Stadler [34] considered the applicability and convergence of the nonsmooth Newton’s method. Herzog and coauthors have since followed up on this with more sophisticated sparsity structures [16, 14] such as directional and annular sparsity.
The core of a Newton’s scheme for a PDE constrained problem is to find a new search direction by solving a largescale linear system in structured form, the socalled Newton’s equation. For the case of problems with a sparsity promoting term, the solution of this linear algebra phase has not yet received very much attention. In [15] Herzog and Sachs consider preconditioning for optimal control problems with box control constraints and point out that including sparsity constraints yields similar structure in the linear systems.
The aim of this work is threefold. We discuss a reduced formulation of the Newton’s equation whose size does not depend on the active components, that is components which are zero or are on the boundary of the box constraints. We theoretically and experimentally analyze two classes of preconditioners with particular emphasis on their robustness with respect to the discretized problem parameters. Finally we show how linesearch and inexactness in the nonlinear iterations can be exploited to make the Newton’s method reliable and efficient.
The remainder of the paper is structured as follows. We state the main model problem based on the Poisson problem in Section 2 and discuss its properties. Section 3 introduces the semismooth Newton’s method that allows the efficient solution of the nonsmooth optimization problem. This scheme is based on the use of active sets representing the sparsity terms and the box control constraints and are reflected in the matrix representation of the generalized Jacobian matrix. We also introduce the convectiondiffusion problem as a potential state equation; this results in a nonsymmetric matrix representing the discretized PDE operator. We then discuss two formulations of the saddle point system associated with the Newton step. One of them is a fullsized system of structure while the other one is in reduced form. In Section 4 the construction of efficient preconditioners is discussed and provide a thorough analysis of the preconditioners proposed for both the full and the reduced system. The bounds illustrate that the Schurcomplement approximations are robust with respect to varying essential system parameters. Numerical experiments given in Section 5 illustrate the performance of the proposed iterative schemes.
Notation
The matrix is a diagonal (0,1) matrix with nonzero entries in the set of indices and is a rectangular matrix consisting of those rows of that belong to the indices in (. Finally, given a sequence of vectors , for any function , we let .
2 Model problems
The typical model problem in PDEconstrained optimization is usually stated as
where is the state and the control. The term is the socalled desired state. The state and are then linked via a state equation such as the Poisson equation. The computed controls for this problem typically are ‘potato shaped’ and distributed in space and time. This is an unwanted feature in many applications and one commonly considers the model where we seek such that the function
(1) 
is minimized subject to the constraints
(2)  
(3) 
with additional boxconstraints on the control
(4) 
with , with a.e. and . The Laplace operator in (2) could be replaced by other elliptic operators. For the case this PDEconstrained optimization problem has been studied in great detail (see [37, 19] and the reference mentioned therein). For these problems one typically writes down the first order conditions, which are then discretized, and the associated equation solved. As an alternative, one discretizes the optimization problem first and then obtain the first order conditions; the associated nonlinear system are then solved using many wellstudied algorithms.
The case is much harder as the addition of the nonsmooth sparsity term makes the solution of the PDEconstrained optimization problem introduced above very different to the more standard smooth optimization problems. The use of semismooth Newton schemes [4, 17, 38] has proven to give an efficient algorithm as well as to allow for a thorough analysis of the convergence behaviour. In this paper we will not discuss the conditions needed to guarantee the differentiability of the involved operators but mostly refer to the corresponding literature. Here we state the following result identifying the optimality conditions of problem (1)(4).
Theorem 2.1
We now consider the operator representing the nonlinear function in (5), defined by
(6)  
Alternatively, one could use the gradient equation to eliminate the Lagrange multiplier via and further use obtaining the complementarity equation in the variables
3 The semismooth Newton’s method for the optimality system
Let denote the dimension of the discretized space. Let the matrix represent a discretization of the Laplacian operator (the stiffness matrix) or, more generally, be the discretization of a nonselfadjoint elliptic differential operator. Let the matrix be the FEM Gram matrix, i.e., the socalled mass matrix and let the matrix represent the discretization of the control term within the PDEconstraint. While for the Poisson control problem this is simply the mass matrix, i.e. , for boundary control problems or different PDE constraints (see Section 3.1) the matrix does not necessarily coincide with . Finally, let be the discrete counterparts of the functions , respectively.
The optimality system (5) can be represented in the discretized space using the nonlinear function defined as
(7) 
where and the discretized complementarity function is componentwise defined by
for .
Due to the presence of the min/max functions in the complementarity function , function is semismooth [18, 38]. The natural extension of the classical Newton’s method is
(8) 
where is the generalized Jacobian of at [38]. Under suitable standard assumptions, the generalized Jacobian based Newton’s method (8) converges superlinearly [38]. We employ an extension of the above method proposed in [21] and reported in Algorithm 1 that takes into account both the inexact solution of the Newton’s equation and the use of a globalization strategy based on the merit function
At Lines 34 the Newton’s equation is solved with an accuracy controlled by the forcing term ; Lines 68 consist in a linesearch where the sufficient decrease is measured with respect to the (nonsmooth) merit function .
Under certain conditions on the sequence in (10), the method retains the superlinear local convergence of (8) and gains the convergence to a solution of the nonlinear system starting from any [21].
(9) 
(10) 
The form of can be easily derived using an activeset approach as follows. Let us define the following sets
(11)  
(12)  
(13) 
Note that the above five sets are disjoint and if
is the set of active constraints then its complementary set of inactive constraints is
With these definitions at hand, the complementarity function can be expressed in compact form as
(14) 
It follows from the complementarity conditions that

for ;

for and for ;

for and for .
From (14), the (generalized) derivative of follows, that is
together with the corresponding Jacobian matrix of ,
We note that depends on the variable through the definition of the sets and .
3.1 Other PDE constraints
It is clear that the above discussion is not limited to the Poisson problem in (2) but can also be extended to different models. We consider the convectiondiffusion equation
(15)  
(16)  
(17) 
as constraint to the objective function (1). The parameter is crucial to the convectiondiffusion equation as a decrease in its value makes the equation more convection dominated. The wind is predefined. Such optimization problems have been recently analyzed in [29, 13, 26, 1] and we refer to these references for the possible pitfalls regarding the discretization. We focus on a discretizethenoptimize scheme using a streamline upwind PetrovGalerkin (SUPG) approach introduced in [6]. Note that other schemes such as discontinuous Galerkin methods [36] or local projection stabilization [26] may be more suitable discretizations for the optimal control setup as they often provide the commutation between optimize first or discretize first for the first order conditions. Nevertheless, our approach will also work for these discretizations. We employ linear finite elements with an SUPG stabilization that accounts for the convective term. The discretization of the PDEconstraint is now different as we obtain
(18) 
with the SUPG discretization of the differential operator in (15) (see [10, 26] for more details). The matrix now includes an extra term that takes into account the SUPG correction. Entrywise this matrix is given as
where are the finite element test functions and is a parameter coming from the use of SUPG [6, 10]. The resulting matrices are both unsymmetric and hence forward problems require the use of nonsymmetric iterative solvers. While the optimality system still remains symmetric the nonsymmetric operators have to be approximated as part of the Schurcomplement approximation and require more attention than the simpler Laplacian.
3.2 Solving the Newton’s equation “exactly”
Given the current iterate and the current active and inactive sets and , a step of the semismooth Newton’s method applied to the system with in (7) has the form
(19) 
We note that the system (19) is nonsymmetric, which would require the use of nonsymmetric iterative solvers. The problem can be symmetrized, so that better understood and cheaper symmetric iterative methods than nonsymmetric ones may be used for its solution. More precisely, the system in (19) can be symmetrized by eliminating from the last row and obtaining
(20) 
with
(21) 
while
The dimension of the system (20) is and therefore depends on the size of the active set. Since we expect to be large if the optimal control is very sparse, that is when is large, we now derive a symmetric reduced system whose dimension is independent of the activeset strategy and that shares the same properties of the system above.
First, we reduce the variable
(22) 
yielding
that is still a symmetric saddlepoint system. Then, since is nonsingular, we can reduce further and get
(23) 
together with
(24) 
and in (22). We note that the dimension of system (23) is now and that the computation of and only involves the inversion of the diagonal matrix . Moreover, the (2,2) matrix term does not need not be formed explicitly.
3.3 Solving the Newton’s equation “inexactly”
We derive suitable inexact conditions on the residual norm of the systems (20) and (23) in order to recover the local convergence properties of Algorithm 1 and, at the same time, exploit the symmetry of the linear systems.
Let us partition the residual of the linear system (19) as and assume that . This simplification allows the substitution (21).
Let . Then, the steps (9) and (10) of Algorithm 1 correspond to solve the augmented system (20) as follows
(25) 
Moreover, let be the residual in the reduced system (23). We next show that , so that we can solve the reduced Newton’s equation inexactly by imposing the variable accuracy explicitly on the reduced residual, instead of imposing it on the original residual. More precisely, we find with residual such that
(26) 
After that, we can recover and from (22) and (24), respectively.
The norm equality result is very general, as it holds for any reduced system. We thus introduce a more general notation. Let us consider the block linear system
(27) 
with nonsingular. The Schur complement system associated with the first block is given by , with . The following result holds.
Let be the residual obtained by approximately solving the reduced (Schur complement) system , with . Then the residual satisfies . {proof} We write
Solving by reduction of the second block corresponds to using the factorization above as follows. Setting we have
Hence, and . Let be an approximation to , so that for some residual vector . Let be defined consequently, so that . Substituting, we obtain
from which the result follows.
This result can be applied to our 22 reduced system after row and column permutation of the original 44 system, so that the variables used in the reduced system appear first.
4 Iterative solution and preconditioning
We now discuss the solution of the linear systems (20) and (23) presented earlier. For the ease of the presentation, we omit in this section the subscript .
While direct solvers impress with performance for twodimensional problems and moderate sized threedimensional one they often run out of steam when dealing with more structured or general threedimensional problems. In this case one resorts to iterative solvers, typically Krylov subspace solvers that approximate the solution in a Krylov subspace
where is the initial residual. The matrix is the preconditioner that approximates in some sense, and it is cheap to apply [30, 10, 3]. In the context of PDE problems is often derived from the representation of the inner products of the underlying function spaces [20, 12, 32]. The construction of appropriate preconditioners follows the strategy to approximate the leading block of the saddle point system and correspondingly the Schur complement of this matrix. As iterative solvers is concerned, the indefiniteness of the symmetric system calls for the use of a shortterm algorithm such as minres [24]. To maintain its properties, the accompanying preconditioner should be symmetric and positive definite. In case when the system matrix is nonsymmetric or the preconditioner is indefinite, many competing methods that are either based on the Arnoldi process or the nonsymmetric Lanczos method can be applied [30]. As a rule of thumb, once a good preconditioner is constructed the choice of the nonsymmetric solver becomes less important. We now discuss the construction of preconditioners in more detail.
We consider preconditioning techniques based on the activeset Schur complement approximations proposed in [28] and tailor this idea to the combination of sparsity terms and box control constraint case. Our strategy here uses a matching technique [25] that allows the parameterrobust approximation of the Schurcomplement while still being practically useful. This technique was already used for stateconstraints [35] not including sparsity terms and has previously shown to be successful for active set methods [28].
Taking the submatrix as the block, the activeset Schur complement of the matrix in (20) in its factorized form is defined as
with
(28) 
The matrix , and thus , explicitly depends on the current Newton’s iteration through the change in the active and inactive sets.
We consider the following factorized approximation of :
(29) 
which extends the approximation proposed in [28] to the case . The approximation yields the factorized activeset Schur complement approximation
(30) 
The following results are along the lines of the analysis conducted in [28] considering and nonsymmetric. As in [28], we assume that is diagonal so that commutes with both and ; for alternatives we refer the reader to the discussion in [28, Remark 4.1]. {proposition} Let and be as defined above. Then
Proof. The result follows from
If , that is all indices are active, then the two matrices coincide. In general,
and the righthand side matrix is of low rank, with a rank that is at most twice the number of inactive indices. In other words, has at least a number of unit eigenvalues corresponding to half the number of active indeces.
In the unconstrained case ( and no bound constraints) and for , the approximation in (29) corresponds to the approximation proposed in [25, 26].
We now derive general estimates for the inclusion interval for the eigenvalues of the pencil , whose extremes depend on the spectral properties of the nonsymmetric matrices and and of . {proposition} Let be an eigenvalue of . Then it holds
with
Moreover, if , then for , is bounded by a constant independent of . {proof} Let and . Then we have
and
We first provide the lower bound . Let . For we can write
where and . Then if and only if
which holds since .
We now prove the upper bound. From we obtain for
(31) 
Recalling the definition of we have
Therefore, from (31) it follows
(32)  
Recalling that
we have
To analyze the behavior for , let us suppose that . Without loss of generality assume that . Let , so that . Thanks to the hypothesis on , the matrix is also positive definite, that is its eigenvalues all have positive real part. Let be the eigendecomposition ^{1}^{1}1In the unlikely case of a Jordan decomposition, the proof proceeds with the maximum over norms of Jordan blocks inverses, which leads to the same final result. of . Then
We thus have
so that with for . \endproof
In practice, the matrix is replaced by an approximation whose inverse is cheaper to apply. This is commonly performed by using a spectrally equivalent matrix , so that there exist two positive constants independent of the mesh parameter such that . For this choice, we thus obtain the following spectral bounds for ,
Based on and defined in (29) and (30), respectively, we introduce convenient preconditioners for the linear systems (20) and (23). For the augmented linear system in (20) we consider a block diagonal preconditioner and indefinite preconditioner of the form:
(33) 
and
(34) 
where
For the reduced system in (20) we consider
(35) 
and
(36) 
In the following we analyze the spectral properties of the preconditioned coefficient matrices when the above preconditioners are applied.