1 Introduction

We consider shape optimization problems subject to elliptic partial differential equations. In the context of the finite element method, the geometry to be optimized is represented by the computational mesh, and the optimization proceeds by repeatedly updating the mesh node positions. It is well known that such a procedure eventually may lead to a deterioration of mesh quality, or even an invalidation of the mesh, when interior nodes penetrate neighboring cells. We examine this phenomenon, which can be traced back to the ineptness of the discretized objective when considered over the space of mesh node positions. As a remedy, we propose a restriction in the admissible mesh deformations, inspired by the Hadamard structure theorem. First and second order methods are considered in this setting. Numerical results show that mesh degeneracy can be overcome, avoiding the need for remeshing or other strategies. \fenics code for the proposed methods is available on GitHub.


lowtildeurl \addbibresourceWorld.bib

hape optimization, shape gradient descent, shape Newton method, restricted mesh deformations


90C30, 90C46, 65K05

1 Introduction

Shape optimization is ubiquitous in the design of structures of all kinds, going from drug eluting stents [Zunino2004] until aircraft wings [SchmidtSchulzIlicGauger2011] or horn-like structures appearing in devices for acoustic or electromagnetic waves [UdawalpolaBerggren2008]. All of these and many other applications involve the solution of a partial differential equation (PDE), so the general formulation of shape optimization problems considered here is as follows:


Here is the solution of the underlying PDE defined on the domain , which is to be optimized.

Computational approaches to solving PDE-constrained shape optimization problems usually proceed along the following lines. First, one derives an expression for the Eulerian derivative of the objective w.r.t. vector fields which describe the perturbation of the current domain . The perturbations are carried out either in terms of the perturbation of identity, or the velocity method. The Eulerian derivative can be stated either as an expression concentrated on the boundary , or as a volume expression. The first is due to the Hadamard structure theorem ([SokolowskiZolesio1992, Theorem 2.27]). For volume expressions, we refer the reader, for instance, to [LaurainSturm2015:1, HiptmairPaganiniSargheini2015:1]. Second, the Eulerian derivative, which represents a linear functional on the perturbation vector fields, needs to be converted into a vector field itself, often referred to as the shape gradient. This can be achieved by evaluating the Riesz representative of the derivative w.r.t. an inner product. The latter is often chosen as the bilinear form associated with the Laplace-Beltrami operator on , or with the linear elasticity (Lamé) system on , see e.g. [SchmidtSchulzIlicGauger2011, SchulzSiebenborn2016, SchmidtSchulz2010:1, SchmidtSchulz2009:2]. More sophisticated techniques include quasi-Newton or Hessian-based inner products; see [EpplerHarbrecht2005, NovruziRoche2000, SchulzSiebenbornWelker2015:1, Schulz2014]. This perturbation field is then used to update the domain inside a line search method, where the transformed domain


associated with the step size is obtained from the perturbation of identity approach.

While the computation of the Eulerian derivative is either based on the continuous or some discrete formulation of problem (1), the computation of the shape gradient and the subsequent updating steps will always be carried out in the discrete setting. Typically, the shape is represented by a computational mesh, and the underlying PDE is solved, e.g., by the finite element method. The perturbation field is then expressed as a piecewise linear field, i.e., it is represented in terms of a velocity vector attached to each vertex position. The domain is subsequently updated according to (2) inside a line search procedure.

It has been observed in many publications that this straightforward approach has one major drawback: it often leads to a degeneracy of the computational mesh. This degeneracy manifests itself in different ways, but mostly through degrading cell aspect ratios, or even mesh nodes entering neighboring cells. Indeed, [DoganMorinNochettoVerani2007] for instance observe that

It is typical of surface evolution undergoing large deformations that triangles may tangle and cross, and that their angles may become large. These mesh distortions limit resolution and approximability, as well as impair computations, thereby leading to numerical artifacts.

In practice, both phenomena often lead to a breakdown of computational shape optimization procedures.

Over the past 10 years, a range of various techniques have been proposed to circumvent this major obstacle in computational shape optimization. A natural choice is to remesh the computational domain; see for instance [WilkeKokGroenwold2005, MorinNochettoPaulettiVerani2012, Sturm2016, DokkenFunkeJohanssonSchmidt2018_preprint, FepponAllaireBordeuCortialDapogny2018_preprint]. In fact, [MorinNochettoPaulettiVerani2012] point out that

A rule of thumb for dealing with complicated domain deformations is that remeshing is indispensable and unavoidable.

Remeshing can be carried out either in every iteration or whenever some measure of mesh quality falls below a certain threshold. Drawbacks of remeshing include the high computational cost and the discontinuity introduced into the history of the objective values.

[BaenschMorinNochetto2005, DoganMorinNochettoVerani2007] describe several techniques such as mesh regularization, space adaptivity, angle control in addition to a semi-implicit Euler discretization for the velocity method, with time adaptivity and backtracking line search. In a follow-up work, [MorinNochettoPaulettiVerani2012] consider a line-search method that aims to avoid mesh distortion due to tangential movements of the boundary nodes, combined with a geometrically consistent mesh modification (GCMM) proposed in [BonitoNochettoPauletti2010]. [GiacominiPantzTrabelsi2017] address the issue of spurious descent directions, attributed to discretization errors in the underlying PDE model, via a goal-oriented mesh adaptation approach. Recently, [IglesiasSturmWechsung2017_preprint] proposed to enforce shape gradients from nearly conformal transformations, which are known to preserve angles and ensure a good quality of the mesh along the optimization process.

Finally, we mention [SchulzSiebenbornWelker2015:1, SchulzSiebenbornWelker2015:2, SchulzSiebenborn2016], who advocate the linear elasticity model as the inner product to convert Eulerian derivative into a shape gradient. In particular in [SchulzSiebenbornWelker2015:2] the authors propose to omit the assembly of interior contributions appearing in the discrete volume expression of the Eulerian derivative. This approach is vaguely related to but conceptionally different from our idea and no analysis is provided there.

Our Contribution

In this paper we propose an approach to avoid spurious descent directions in the course of numerical shape optimization procedures, which is different from all of the above and does not require remeshing. The main idea is based on the observation that—in the continuous setting—shape gradients are perturbation fields which are generated exclusively by normal forces on the boundary of the current domain. This follows from the Hadamard structure theorem. However, in the discrete setting, the Hadamard structure theorem is not available, and thus classical discrete shape gradients also contain contributions from interior forces and tangential boundary forces. We therefore propose to project the shape gradient onto the subspace of perturbation fields generated by normal forces. We refer to this approach as restricted mesh deformations.

We demonstrate that the proposed approach indeed avoids spurious descent directions and degenerate meshes. As a consequence, we can solve discrete shape optimization problems to high accuracy, i.e., very small norm of the restricted gradient. Both gradient and Newton schemes in 2D and 3D are considered. An implementation in the finite element software \fenics is available as open-source on GitHub; see [EtlingHerzogLoayzaWachsmuth2018:2].

The paper is structured as follows. In Section 2 we present a shape optimization model problem and prove, as an auxiliary result, the existence of a globally optimal domain. In Section 3 we review the volume and boundary representations of the Eulerian derivative. In Section 4 we consider the discrete counterpart of the model problem and its Eulerian derivative. We also illustrate the detrimental effect of spurious descent directions. The main idea of restricted mesh deformations is introduced in Section 5. An associated restricted gradient scheme is also introduced and its performance is compared to the classical shape gradient method in Section 6. Sections 8 and 7 are devoted to second-order Eulerian derivatives in the restricted setting and the demonstration of the associated Newton scheme. Finally we present in Section 9 a new result on the convergence of a sequence of stationary domains for discrete shape optimization problems to a stationary point of the continuous problem as the discretization mesh size goes to zero. Conclusions are given in Section 10.

2 Preliminaries

Throughout the paper, we consider the following model problem,


Here the optimization variable is an admissible domain contained in some bounded and open hold-all , and is a given right hand side. The elliptic state equation is understood in weak form,


The next result shows that our shape optimization problem (3) has a solution if we slightly relax the class of admissible sets. We will see that it is sufficient to consider quasi-open rather than open sets. For an introduction of quasi-open sets, quasi-continuity and related notions, we refer the reader to [AttouchButtazzoMichaille2014, Section 5.8]. We consider the slightly relaxed problem


Let us recall that and is the dual space of . The PDE in (5) is also to be understood in the weak sense, i.e.,

We emphasize that the main reason for this existence result is that the objective is monotone w.r.t. the state , see also Remark 2 below.

Theorem 1

Problem (5) admits a global minimizer .

Note that the extreme case is possible.


First, we remark that it is sufficient to consider only pairs with in (5). Indeed, if is any admissible pair, we can consider in its stead. Note that is quasi-open since can chosen to be quasi-continuous. This pair is again admissible due to

since q.e. on . Moreover, the objective value of is not larger than the objective value of .

Now, let be a minimizing sequence for (5) with and . It is clear that the sequence is bounded in , therefore we can extract a weakly convergent subsequence (without relabeling) with weak limit . Clearly, . Now we define and denote by the solution of in . It remains to check that holds since this implies the global optimality of (due to the monotonicity of the objective). To this end, we choose an arbitrary such that . For we have due to . Moreover, in , see [Wachsmuth2014:2, Lemma 4.1]. Thus,

Since , we can test the equation for with and we find

Now, we can use a density argument, see [Mignot1976, Lemme 3.4], to obtain that this inequality holds for all which satisfy . Using implies , i.e., . Finally, the optimality of follows from

Remark 1

There is a deeper reason for being true in the above proof. Indeed, using the theory of relaxed Dirichlet problems, one can show that satisfies for some capacitary measure . We refer to [AttouchButtazzoMichaille2014, Section 5.8.4] for a nice introduction to capacitary measures. Due to we have (in a certain sense) and therefore follows from the maximum principle since “”. However, we included the above direct proof because it does not rely on the notion of capacitary measures.

Remark 2

The above proof of existence generalizes to a larger class of objective functionals. In fact, we can replace the objective in (5) with

if the integrand satisfies \cref@addtoresetequationparentequation


Under these general assumptions, one can use the same proof as the one given for Theorem 1 above, but the final estimate has to be replaced by

Note that Fatou’s lemma together with a.e. (along a subsequence) implies

Again, this shows the optimality of .

3 Shape Calculus

This section is devoted to the presentation of the shape differentiability of problem (3). Since this is rather standard problem we will be able to directly apply results from [ItoKunischPeichl2008:1]. To this end, we assume that both the hold-all and are open and have -boundaries and , respectively. Moreover we assume so that has a positive distance to the boundary of .

We are describing variations of the domain by the perturbation of identity method, i.e., we consider a family of transformations such that


where is a given vector field. The family creates a family of perturbed domains . In view of Banach’s fixed point theorem, there exists a bound such that is invertible for all .

By a straightforward application of [ItoKunischPeichl2008:1, Theorem 2.1] we obtain the following result.

Theorem 2

The shape functional given in (3) is shape differentiable and its Eulerian derivative in the direction of the perturbation field is given by


where denotes the Jacobian of and the adjoint state is the unique solution of the following adjoint problem,


Notice that (8) is the so-called volume or weak formulation of the Eulerian derivative of (3). Besides the volume formulation, there exists an alternative representation of (8) by virtue of the well known Hadamard structure theorem; see [DelfourZolesio2011, Chapter 9, Theorem 3.6]. We state it here in a particularized version for problem (3). From now on, denotes the outer unit normal vector along the boundary of .

Corollary 1 (Hadamard structure theorem for (3))

The Eulerian derivative (8) of problem (3) has the representation


Notice that under the assumption that has a -boundary, and belong to and thus their normal derivatives are in , which embeds into when ; see for instance [AdamsFournier2003, Theorem 4.12]. Consequently, belongs to in this case.

Formula (10) is known as the boundary or strong representation of (8), and it can be obtained from (8) by the divergence theorem; compare [Sturm2015:1], [SokolowskiZolesio1992, Chapter 3.3], [HaslingerMaekinen2003, Example 3.3]. We also refer the reader to [HiptmairPaganiniSargheini2015:1], where the volume and boundary formulations are compared w.r.t. their order of convergence in a finite element setting.

4 Investigation of the Discrete Objective

In order to solve the shape optimization problem (3) numerically, some kind of discretization has to be applied. The most common choice in the literature consists in a discretization of the PDE by some finite element space defined over a computational mesh, which we denote by and whose nodal positions serve to represent the discrete unknown domain.

A common choice is to replace by the finite element space of piecewise linear, globally continuous functions,


defined over an approximation of consisting of geometrically conforming simplicial cells, i.e., triangles and tetrahedra in or space dimensions, respectively. Consequently, the state equation (4) is replaced by


This leads to the following discrete version of (3) frequently encountered in the literature,

Minimize (13)

We refer the reader to [GanglLangerLaurainMeftahiSturm2015:1_preprint, Sturm2016, SchulzSiebenbornWelker2015:2, SchulzSiebenborn2016] for examples of this procedure.

Let us denote by the reduced objective value in (13), i.e., , where is the unique solution of (12). In order to derive a discrete variant of the volume formulation (8) of the Eulerian derivative, we introduce the discrete adjoint equation,


The following theorem shows that a straightforward replacement of the state and adjoint state by their finite element equivalents and in (8) yields the correct formula for the Eulerian derivative of the discrete objective , provided that the perturbation field is piecewise linear, i.e., belongs to

Theorem 3

Suppose that and are the unique weak solutions of the discrete state equation (12), and the discrete adjoint equation (14), respectively. Moreover, let . Then


The proof of this theorem follows along the lines of the continuous case, see, e.g., [HiptmairPaganiniSargheini2015:1, LaurainSturm2015:1]. A detailed derivation can be found in [DelfourPayreZolesio1985, Section 4].

Remark 3
  1. Theorem 3 can be viewed as the statement that discretization and optimization (in the sense of forming the Eulerian derivative) commute for problem (3).

  2. The finite element analogue of the boundary expression (10) is not an exact representation of the discrete Eulerian derivative. This is since the integration by parts necessary to pass from the volume to the boundary expression has to be done element by element and it leaves inter-element contributions; see also the discussion in [Berggren2010].

  3. Theorem 3 remains true when higher order Lagrangian finite elements on simplices are used in place of . However it is essential that remains piecewise linear so piecewise polynomials are transformed into piecewise polynomials of the same order.

  4. Alternative expressions for (16) can be obtained following the so-called discrete adjoint approach, in which the derivative of w.r.t. the nodal positions of is addressed by differentiating the finite element matrices. We refer to [SchneiderJimack2008, Berggren2010, RothUlbrich2013] for examples of this procedure.

Despite the simplicity to obtain the Eulerian derivative of the discrete problem, we would like to emphasize here that the discrete problem (13) itself has the following serious drawback. The search space obtained from utilizing the nodal positions of the mesh as optimization variables includes meshes with very degenerate cells. Those lead to poor approximations of solutions of the state equation, which may give rise, however, to smaller values of the discrete objective. Therefore, any optimization algorithm for the solution of (13) sooner or later is likely to encounter spurious descent directions which typically have support in only a few mesh nodes and which lead to degenerate meshes.

Example 1

Let us illustrate this behavior by means of problem (3) with data . The optimal domain is unknown. We begin with the computational mesh shown in Fig. 2 (left). Consider for example the piecewise linear vector field represented by its nodal values

where the boundary node can be easily identified from Fig. 2.

We found that is not only a descent direction for the objective at but in fact that the line search function

decreases until the triangle formed by and its two interior neighbors degenerates to a line, which happens at ; see Fig. 1. At this point, finite element computations break down.

In computational experience spurious descent directions do not usually occur during the early iterates. Thus they can be, and often are, avoided by early stopping, at the expense of a reduced tolerance. Alternatively, mesh quality control and remeshing can help to avoid mesh destruction, but this introduces discontinuities in the objective function’s history.

In any case, the existence of spurious descent directions is a structural disadvantage of problem (13). Therefore we propose in the following section a new computational approach. Our approach does not seek to solve (13) literally but in a certain relaxed sense, which is inspired by the Hadamard structure theorem and which avoids spurious descent directions.

Figure 1: for Example 1. The step sizes , , belonging to the domains in Fig. 2 are highlighted.
Figure 2: Evolution of the mesh under with perturbation field given in Example 1 at , , (from left to right). The solutions of the state equation (12) are also shown.

5 Restricted Mesh Deformations

By the Hadamard structure theorem, the Eulerian derivative for the continuous problem consists of normal boundary forces only, see (10) above. This is no longer the case for the discrete problem. The reason is that the finite element solutions and are only of limited regularity, and thus a global integration by parts necessary to pass from the volume expression (16) to a boundary expression is not available. This has been pointed out, for instance, in [DelfourZolesio2011, note on p. 562]. Therefore, we are going to continue with the discretely exact volume expression (16) but mimic the behavior of the continuous setting in the evaluation of the shape gradient, where we alloy only for shape displacements which are induced by normal forces.

5.1 Continuous Setting

To illustrate the situation, we start by discussing the continuous case. We have seen in (8) that the Eulerian derivative is an element of a dual space, e.g. an element of . In order to utilize this information for moving the domain , we have to convert this dual element into a proper function. We follow the approach of [SchulzSiebenbornWelker2015:2]. To this end, we introduce the elasticity operator via


for all . Here and throughout, denotes the derivative (Jacobian) of a vector valued function, is the linearized strain tensor, are the Lamé parameters and is a damping term. We assume , so that becomes positive semi-definite on . Note that we do not consider Dirichlet boundary conditions in the space . Therefore a positive damping parameter is needed to ensure the coercivity of , i.e., with some . This result is due to Korn’s inequality, see for instance [AttouchButtazzoMichaille2014, Proposition 6.6.1]. Thus, is an isomorphism and it furnishes with an inner product so that becomes the associated Riesz isomorphism.

In order to avoid technical regularity issues, we assume that the Eulerian derivative (8) enjoys the higher regularity . This holds, e.g., if is sufficiently smooth, due to the higher regularity of and . In order to compute the negative shape gradient w.r.t. the -inner product on the continuous level, we solve


The solution of this problem yields the negative shape gradient


Now, we introduce the normal force operator given by


for all and . Using again (10), we find that can be written as with

Therefore, it is easy to see that problem (18) is equivalent to

Minimize (21)
with respect to
such that

Indeed, the additional constraint is automatically satisfied by the unconstrained solution of (18). However, we will see that this property is lost after discretization, i.e., the discrete counterparts of (18) and (21) are going to differ. Note that the solution of (21) is unique due to coercivity of and injectivity of . Moreover, since is surjective, there exists a unique Lagrange multiplier associated with the constraint ; see for instance [Luenberger1969, Chapter 9.3, Theorem 1]. We therefore obtain the following necessary and sufficient optimality conditions for (21) in saddle-point form,


Here, is the adjoint of , where we identified with its dual space. The multiplier in (22) necessarily satisfies since is bijective. Now, it is easy to see that (22) is equivalent to solving \cref@addtoresetequationparentequation


Recall that is the usual negative shape gradient w.r.t.  (i.e., the solution of (18)), whereas is a correction in order to obtain a shape displacement in the subspace . Again, we emphasize that we have in the continuous setting, due to . Therefore, the solution of (23) is just the usual shape gradient .

Before discussing the discretized setting, we note that (21) is equivalent to

Minimize (24)
with respect to
such that

Hence, the solution is the orthogonal projection (w.r.t. the inner product induced by ) of the usual shape gradient into the space , i.e., the space of deformations induced by normal forces. This motivates to denote the solution of (21) by .

5.2 Discretized Setting

Next, we discuss the discretized setting. We refer to Section 4 above for the introduction of the finite-element discretization. In addition to the FE space , we recall from (15) the discrete space of mesh deformations

and the boundary space


We recall that the discrete Eulerian derivative was given in (16). Moreover, the discretization directly leads to the discretized operators , which are defined via

for all and . Next, we will investigate the discrete counterparts of (18) and (21). The straightforward discretization of (18) reads


We denote its unique solution by .

The important difference to the continuous case is that Hadamard’s structure theorem is not available. The reason is that the discrete state has only the limited regularity and this regularity is not enough to transform the domain integral into a boundary integral via integration by parts, see the last paragraph in chapter 10, section 5.6 of [DelfourZolesio2011]. Therefore, unlike in the continuous case, does not belong, in general, to the image space of . Consequently, the solution of (26) has contributions not only from normal forces in the Eulerian derivative , but also from interior forces as well as tangential boundary forces. Numerical examples in Section 6 will show that these interior and tangential forces are responsible for spurious descent directions, which in turn lead to degenerate meshes.

Therefore, we conclude that it is not reasonable to try to solve

or its stationarity condition


as a discretization of the continuous problem (1).

Hence, we consider the discretization of (21)

Minimize (28)
with respect to
such that

in which we restrict to the image space of the discrete normal force operator . As in the continuous setting, this problem is equivalent to the solution of


It is clear that (29) can also be reduced as in (23). For later reference, we mention that the solution of (29) satisfies


since holds. This shows that is always a descent direction for the discrete objective .

As we have seen in (24) for the continuous setting, the solution of (28) also solves

Minimize (31)
with respect to
such that

where is the solution of (26). Again, the solution of (31) can be interpreted as the projection (w.r.t. the inner product) of onto the image space of . Therefore, the notation for the solution of (28) is justified.

Our main idea is now to propose, instead of (27),


as an appropriate discrete version of (1). Note that this is fundamentally different from the ad-hoc discretization (27) since we neglect the contributions of which do not belong to the image space of . We will see via numerical examples that this problem (32) can be solved to high accuracy by an iterative algorithm using the solution of (28) for the displacement of the triangulation (together with a line-search). Moreover, we will see that the solutions converge to a stationary point of the continuous problem (1) under suitable assumptions when successively finer meshes are used; see Section 9 below.

For later use, we are going to characterize stationarity of in the sense of (32). The deformation solves the projection problem (31) if and only if

This, in turn, is equivalent to


This means that is stationary in the sense of (32) if and only if the usual shape gradient is a tangential vector field on in a discrete sense.

We can now state a restricted gradient algorithm for the solution of (32), where we use as the deformation field which provides the search direction in the domain transformation. It is sufficient to utilize a simple a backtracking strategy to comply with the Armijo condition


Here, is a parameter.

Since we are using the perturbation of identity approach (2) instead of a more sophisticated family of domain transformations, we also perform a mesh quality control in order to avoid gradient steps which are too large. To this end, we check that the conditions


are satisfied in every cell throughout the entire domain. Here, denotes the Frobenius norm of matrices. The first condition monitors the change of volume of the cell, while the second additionally inhibits large changes of the angles. Note that this amounts to checking three inequalities per cell. Due to (30), we use


as a convergence criterion for some small . These considerations lead to LABEL:alg:restricted_gradient_descent. \@floatalgocf[htp]     \end@float

6 Numerical Results: Restricted vs. Classical Gradient Method

The main goal of this section is to compare our proposed restricted gradient method, see LABEL:alg:restricted_gradient_descent, to a classical shape gradient method. The latter is identical to LABEL:alg:restricted_gradient_descent except that is replaced everywhere by the negative shape gradient from (26). We consider our model problem (3) with data as in Example 1. The line-search parameters and are used and the initial step size is chosen as . For the Lamé and damping parameters in the elasticity operator (17) we choose

where is Young’s modulus and is the Poisson ratio. The initial shape for both methods is the same as in Fig. 2 (left).

We implemented LABEL:alg:restricted_gradient_descent and its classical counterpart in \fenics, version 2018.1 ([LoggMardalWells2012:1]). Our implementation is freely available on GitHub, see [EtlingHerzogLoayzaWachsmuth2018:2]. All derivatives were automatically generated by the built-in algorithmic differentiation capabilities. The restricted shape gradient , i.e., the solution of (28), was computed via the discrete counterpart of (23). The linear system was solved using \scipy’s \lstinline!spsolve! with the \superlu solver ([Li2005]), i.e., with the setting \lstinline!use_umfpack = False!.

The restricted gradient method reached the desired tolerance

at iteration 864, while the classical gradient method was stopped at iteration 1000, where it had only reached

Figures 5 and 4 show the complete history of the objective and respective shape gradient norms. The geometry condition (35) was violated only once for both methods, namely in the first very iteration, leading to a reduction of the initial step size. The Armijo condition (34) failed approximately once per iteration on average. Typical accepted step sizes were and occasionally .

Figure 3: Intermediate shapes obtained with the classical gradient method (left) and the restricted gradient method (right) at iterations 5, 300, 600, and final (1000 and 864, respectively).

Fig. 3 shows the domains during the iteration of both methods for comparison. It can clearly be inferred that the initial iterates are virtually identical but both methods begin to produce visibly different shapes around iteration 500, when the objective value (shown in Fig. 4) has practically converged but the gradient norms are still

respectively. At this point, the classical gradient method starts to pursue spurious descent directions, which results in a further decrease of the discrete objective at the expense of increasingly degenerate meshes.

Figure 4: History of the objective value along the iterations.
Figure 5: History of the norm of the gradients (for the gradient descent method) and (for the restricted gradient method) along the iterations.

To further illustrate this point, we show in Fig. 6 visualizations of the Eulerian derivative for both methods; see (16). In fact, this is a linear functional on the space of piecewise linear perturbation fields . In Fig. 6 we display the representer of w.r.t. the inner product, i.e., we solve a linear system governed by a block-diagonal mass matrix.

Let us comment on the Eulerian derivative for the restricted gradient method as shown in the right column of Fig. 6. It is apparent that the displacement field , i.e., the solution of (26), is non-zero and in fact essentially the same for the iterations 300, 600, and 900 shown. However also has essentially no component in the space of deformations induced by normal forces. Therefore its projection into this space, see (31), leaves us with a very small norm , as shown in Fig. 5. The images visualizing the Eulerian derivative for the classical gradient method in the left column of Fig. 6 show that the method has allowed the spurious part of the derivative to build up, which eventually dominates the search direction.

Figure 6: Visualization of the Eulerian derivatives obtained with the classical gradient method (left) and the restricted gradient method (right) at iterations 300, 600, and final (1000 and 864, respectively).

7 Restricted Newton-Like Method

In the previous two sections we have seen that (32) is a reasonable discrete optimality condition and that it can be solved to high accuracy via a first-order gradient descent method. However, as is well known for the minimization of even mildly ill-conditioned quadratic polynomials, gradient descent methods require a large number of iterations to achieve convergence. We observed the same behavior in Section 6.

Therefore, we are also investigating a Newton-like method for solving (32). First, we focus on the continuous case and comment on its discretization afterwards. Let be our current iterate. As before, we denote by the associated state, see (4), and by the adjoint state, see (9). The solution of the restricted shape gradient problem (22) at is denoted by . Recall that our goal is to achieve or, equivalently, , cf. (32). In practice, we impose a stopping criterion of the form as we did for the gradient method.

In order to allow the reader to follow the derivation for the solution of (32) of our Newton method more easily, we draw the parallel with Newton’s method for for some . We consider the equation for the unknown update . In our context the iterate represents the current domain and the update corresponds to a perturbation field . Since the update takes into a new domain, we need to manipulate the expression and pull it back to . Finally, we linearize about , which amounts to .

In our Newton method we seek a deformation field (taking the role of above) such that the updated domain is stationary in the sense that the solution of (22) (at instead of ) satisfies . As in Section 5 we are only considering updates which are induced by a normal force , i.e., should hold.

In order to characterize the stationarity of the transformed domain , we introduce the elasticity operator and the normal force operator on analogously to (17) and (20). With the transformation field , we define the pull-backs of and of via

for .

Since we wish to achieve conditions defined on the current domain , rather than on the unknown transformed domain after the Newton step, we consider the Lagrangian associated with problem (3) on and pull it back to . Using the usual integral substitution and chain rule, and denoting the pulled-back solutions of the state and adjoint equations on by and , respectively, we obtain , defined as

Notice that is the shape derivative . Thus we find that the stationarity of is equivalent to the requirement that the solution of the nonlinear system

satisfies . In view of the injectivity of , this is equivalent to . Here, “” stands for a zero block. We mention that the first two equations in this system correspond to the adjoint and state equation on but pulled back to , respectively. Moreover, note that the solution of the above system is the pull-back of the solution of the state equation, adjoint equation and the projected shape gradient of the system (22) formulated on the domain .

Together with the requirement that the deformation field itself is induced by some normal force , we have to solve the nonlinear system


As before, and denote the normal force operator and the elasticity operator on .

The system (37) for and the further, auxiliary unknowns corresponds to the nonlinear system for the step . For convenience, we recall the meaning of the seven equations in (37). The first equation requires , i.e., the stationarity of the updated domain . The second equation is the requirement that the displacement is induced by the (normal) force . The third and fourth equation are the adjoint and state equation on . Finally, the last three equations are the pull-back of the system (22) on to .

We can now describe a step of our Newton-like procedure for the solution of the nonlinear system (37). Suppose that is the current domain and consider an iterate of the form with the state, the adjoint state, and the solution of (22) on . Notice that for this iterate, the residual of (37) is . Next we linearize the system (37) about this current iterate w.r.t. all seven variables. We refrain from stating the lengthy formula for the linear system which results. In practice, we generate this linear system governing the Newton step using the algorithmic differentiation capabilities of \fenics ([LoggMardalWells2012:1]). From the solution of that linear system we only extract the Newton update for the perturbation field. We refer to it as since its current value is zero. We then apply to the current domain to obtain the new domain