An adaptive Newton algorithm for optimal control problems with application to optimal electrode design

An adaptive Newton algorithm for optimal control problems with application to optimal electrode design


In this work we present an adaptive Newton-type method to solve nonlinear constrained optimization problems in which the constraint is a system of partial differential equations discretized by the finite element method. The adaptive strategy is based on a goal-oriented a posteriori error estimation for the discretization and for the iteration error. The iteration error stems from an inexact solution of the nonlinear system of first order optimality conditions by the Newton-type method. This strategy allows to balance the two errors and to derive effective stopping criteria for the Newton-iterations. The algorithm proceeds with the search of the optimal point on coarse grids which are refined only if the discretization error becomes dominant. Using computable error indicators the mesh is refined locally leading to a highly efficient solution process. The performance of the algorithm is shown with several examples and in particular with an application in the neurosciences: the optimal electrode design for the study of neuronal networks.

1 Introduction

In this work we consider the optimal design of a glass micro-electrode for the use of reversible in vivo electroporation in neural tissue. Electroporation describes the increase in permeability of the cell membrane by the application of an external electric field beyond a certain threshold [32, 36]. While this technique has been known at least since the 1960’s [19], it has become a standard tool in the neurosciences in more recent years to load single cells and small ensembles of neurons with a range of dyes and molecules, for example for the visualization of neural networks [18, 24, 25], see Figure 1 on the left.

In order to make the plasma membrane permeable for a specific dye, the local voltage has to exceed a certain threshold. On the other hand the applied stimulus can not be increased infinitely, as high peaks of current would cause collateral damage [25, 16]. A way to reduce such unwanted side-effects is to modify the shape of the micro-electrodes, in order to obtain a more uniform distribution of the electric field. While standard electrodes have a single hole at the tip, adding more holes on the side of the pipette seems a promising approach. Recent work has shown that nanoengineering techniques are indeed available to shape glass micro-electrodes in the tip region using focused ion beam assisted milling [22, 31], see Figure 1 on the right. It has been shown that the part of the neuronal network, that can be visualized with these modified pipettes, is considerably enlarged in comparison to the standard design [31], see Figure 2 for a numerical demonstration.

The objective of this work is to design an optimal electrode in terms of position and size of holes in the micro-pipette by using methods of numerical optimization. The scientific contribution of this work is twofold: (i) on one side we present a mathematical formulation of the optimal design of a micro-pipette; (ii) on the other side we present an adaptive Newton method for the solution of the corresponding optimization problem.

The model used to describe the electric field is a partial differential equation (PDE). Therefore, we deal with a PDE constrained optimization problem. In the context of PDE constrained optimization problems the two common solution methods are the reduced and the all-at-once approach [20]. We will adopt the latter one in which the optimality conditions are expressed as Karush-Kuhn-Tucker (KKT) system. There is a large literature on this topic and we refer for example to the books [15, 20, 23] for a thorough introduction. Regarding the specific application there are no systematic studies that use a model based approach to design the micro-pipette used in electroporation. Therefore, the results shown here are of scientific interest even if they are obtained in a simplified setting with a two-dimensional problem. The extension to three-dimensional problems with a more complex model is possible within the same adaptive algorithm.

Figure 1: Left: Example of a genetically-tagged olfactory glomerulus in the mouse as an example of a medium-sized neural circuit in the brain (green, upper panel left). Upper panel right: Typical result after targeted electroporation of a tetramethyl-rhodamine-dextran dye (red) revealing various types of directly affiliated neurons and their processes in the surrounding region. Lower panel showing an overlay of the two fluorescent channels. Right: Example of a modified glass micro-electrode after inserting several additional openings around the tip region by focused ion beam assisted milling.

Mesh adaptivity is in many aspects well established in the context of finite element discretization of linear and nonlinear partial differential equations, see e.g. [2, 33]. Furthermore, goal oriented a posteriori error estimation has been successfully used in many applications, see the seminal works [6, 4] for an overview of the Dual Weighed Residual (DWR) technique and exemplarily [9, 30, 34, 8] for some specific applications. A posteriori error estimation methods have been used to control the discretization error either in global norms, e.g. the or energy norm, or in specific functionals in the context of goal oriented techniques.

To solve the nonlinear system arising from the discretization of the underlying problem typically a Newton-type method is used. If the Newton iteration is stopped after reaching a given tolerance, there is an iteration error that has to be taken into account in addition to the discretization error. In particular, it is advantageous to control the iteration error and allow the Newton-iterates to stop before full convergence ( machine precision), because each Newton-iteration comes at the cost of the solution of a large linear system. The latter might be badly conditioned, especially in the context of multi-physics and optimization problems, leading to a large number of iterations of an iterative linear solver. There are only few results on a posteriori error estimation that combine an estimation of the discretization error and of the iteration error, resulting in algorithms that have stopping criteria based on balancing the two sources of error.

In the last few years increasing attention has been given to adaptive strategies to solve nonlinear problems including those arising from discretizations of partial differential equations. Ziems and Ulbrich have presented in [37] a class of inexact multilevel trust-region sequential quadratic programming (SQP) methods for the solution of nonlinear PDE-constrained optimization problems, in which the discretization error in global norms is controlled by local error estimators including control of the inexactness of the iterative solvers. Further works can be found outside the optimization context. A list of relevant publications is here given:

Bernardi and coauthors have shown an a posteriori analysis of iterative algorithms for nonlinear problems [7], Rannacher and Vihharev have balanced the discretization error and the iteration error in a Newton-type solver [27]; Ern and Vohralík have developed an adaptive strategy for inexact Newton methods based on a posteriori error analysis [14] and Wihler and Amrein have presented an adaptive Newton-Galerkin method for semi-linear elliptic PDEs which combines an error estimation for the Newton step and an error estimation for the discretization with finite elements [1].

Since the goal of a simulation is the computation of a specific quantity of interest, for example in our case the optimal micro-pipette design (i.e. the position and dimension of the side holes), it is desirable to optimize the mesh refinement in a goal-oriented fashion. Furthermore, also the stopping criterion for the Newton iteration should be goal-oriented. This allows, for example in the context of optimization, to approximate the optimal point on coarse meshes and refine only once the discretization error becomes dominant. In this way we reach the full balance of error sources with respect to the quantity of interest and the algorithm does the costly iterates (on fine meshes) only after the nonlinearities have been adequately solved on cheaper meshes. Consequently the computational costs are reduced by keeping the precision of the simulation at the desired level. The new contribution of our work in this context is the derivation of a goal-oriented strategy for the adaptive control of a Newton-type algorithm to solve a nonlinear PDE-constrained optimization problem.

This work is organized as follows. In Section 2 we formulate the general optimization problem; in Section 3 we present our adaptive strategy; in Section 4 we introduce the application in optimal electrode design; in Section 5 we delineate the algorithms and in Section 6 we present some numerical results. Finally, in Section 7, an outlook to possible extensions of the presented method is given.

Figure 2: Two numerical results for the comparison of the activated region for a standard micro-pipette with one hole only (left) and a modified micro-pipette with two additional set of holes (right). The black contour line illustrates the region, where a certain threshold is exceeded.

2 Optimization problem

We consider the following optimization problem with parameters


We assume that is a reflexive Banach space. Let be a semi-linear form and for every , where denotes the dual space of . Furthermore, we assume that and are twice (Fréchet) differentiable and that for each the state equation (2) has a unique solution . Let us denote the (nonlinear) control-to-state map by .

Under these assumptions we can consider a reduced formulation of the optimization problem, with a reduced objective functional . If the reduced objective functional is coercive the existence of local minimizers to (1)-(2) follows by standard arguments, see e.g. [15, 20]. The coercivity assumption is needed in case of unconstrained optimization problems to assure boundedness of the minimizing sequence. Therefore, for the practical solution of the problem, we consider a Tikhonov regularization term in the objective functional. If in addition the functional is convex, the optimization problem has a unique solution. Since in this work we allow nonlinearities in the model, we cannot assume convexity of the reduced functional. Therefore, the theoretical results assure only the existence of local minimizers.

To derive the optimality conditions, we introduce the Lagrange functional


The first-order necessary optimality conditions are given by the KKT system


The first equation corresponds to the dual equation for the adjoint variable , the second equation is called the control equation and the third equation is the state equation for the primal variable .

2.1 Model problem

To simplify the notation in the introduction of the error estimator in the next section, we consider a model problem of the form

where , , and are positive real numbers, denotes the scalar product and . The corresponding KKT system reads

Problem 2.1 (KKT system of the model problem)

Find such that

By introducing the semi-linear form


we can write the KKT system in compact form as


The derivation of a corresponding adaptive Newton method for other functionals and semi-linear forms fulfilling the assumptions made above is straight-forward given that the KKT system is solvable with a Newton-type solver. The modification of the optimization problem to the specific application presented in this paper will be made later in Section 4.

2.2 Discretization

We choose conforming finite element spaces for the state variable and the dual variable . The control space is already finite dimensional, therefore we do not need a discretization of the control variable. The discrete optimality system reads

Problem 2.2 (Discrete KKT system of the model problem)

Find , and , such that


An essential problem in solving a discretized PDE system is the choice of the computational mesh on which depends the discretization error, i.e. the error due to the finite dimensional approximation given by the finite elements.

3 Adaptive strategy

In the case of optimization problems it is of interest to control the accuracy of the solution of the first-order optimality conditions. The accuracy depends on the discretization error and it “measures” the quality of the approximation of the optimal point, i.e. of the optimal control and optimal state. In the context of PDE constrained optimization problems, the two typical methods to solve the problem are the reduced approach and the all-at-once approach. Here we use the all-at-once approach, in which the optimality conditions are expressed in terms of the gradient of the Lagrangian functional defined in the previous section. In particular, in absence of control and/or state constraints the optimality conditions are given by

and the discrete counterpart is

Since the discrete approximation is accurate only up to a certain tolerance that depends on the actual mesh refinement, it makes sense for efficiency reasons to solve the optimality system only up to a certain accuracy as well.

The idea of our adaptive inexact Newton-type method is to balance the accuracy of the first order optimality conditions, i.e. of the KKT system, with the accuracy of its discrete approximation with respect to a goal functional, rather than with respect to some (global) norms of the solution or of the residuals. This is possible exploiting the flexibility of the DWR which allows to control the error with respect to an arbitrary functional.

In Section 3.1, we briefly introduce the DWR method and in Section 3.2 we explain how to split the error into two contributions: one from the mesh discretization and the other from the inexact solution of the KKT system.

3.1 Dual weighted residual (DWR) method

We are interested in estimating the error measured in a quantity of interest:

Following the seminal work of Becker and Rannacher [6] we obtain the error identity by weighting the residual of the KKT system by an appropriate dual problem. Let be the solution of the KKT system (2). For the DWR error representation we need the residual of the system, , defined by


with . Furthermore, we need the following adjoint problem to define the error estimator

Problem 3.1 (Dual problem)

Find such that

By setting , the dual system reads


with the adjoint bilinear form defined as

Its discretized counterpart is

Problem 3.2 (Discretized dual problem)

Find such that


Since the model problem is nonlinear in we need to define the following dual residual to derive the error estimator


with .

With these definitions, following [4, Proposition 6.2], we get the error estimator

Theorem 3.1 (A posteriori error estimator)

Let , be the solutions of Problem 2.1 and 2.2 and let , be the solutions of the continuous dual problem 3.1 and its discretized version 3.2. It holds the error identity


with the residual and the adjoint residual defined in (3.1) and (3.1). The remainder term is given by


where is the semi-linear form (5) and the primal and dual errors are and .

Proof 3.1

The proof follows by application of Proposition 6.1 from [4] with the following Lagrange functional

We sketch it here for later purposes. Introducing the notation and and reminding the definition of the semi-linear form , see expression (5), we can rewrite it as

Furthermore, it is

where we have used the fact that and satisfy (6) and (7) respectively. Considering the relation

the error identity follows from the error representation of the trapezoidal rule

In fact, since it is

where is the remainder term of the trapezoidal rule. From this relation, using the definitions (3.1) and (3.1), the identity (12) can be deduced observing that

3.2 Balancing of discretization and iteration error

In this work, we consider an inexact Newton-type method to solve the nonlinear KKT system (2.2). We introduce the notation to indicate the inexact solution of the KKT system, which is obtained when the stopping criterion


is reached and the notation to indicate the “perturbed” dual solution obtained by solving exactly (up to machine precision) the “perturbed dual equation”

We use the term “perturbed dual equation” for the adjoint equation in which we set the inexact primal solution as coefficient.

Since and are approximations of and , an additional term appears in the error identity (12) that accounts for the inexact Galerkin projection (14).

Following [27, Proposition 3.1] we have the error estimator

Theorem 3.2 (Error estimator with inexact Galerkin projection)

with the residuals of the primal problem (3.1) and of the dual problem (3.1) and the remainder term as in Problem 3.1.

Proof 3.2

Introducing the notation , and the Lagrangian as in Theorem (3.1), the proof follows from [27, Proposition 3.1]. Let us consider the Lagrangian

It follows that

where we have used the fact that satisfies (6), while equation (7) is solved only approximately. Considering the trapezoidal rule and its remainder term, we get analogously to Theorem 3.1 the identity

from which the error representation (15) can be deduced.

Definition 1 (Splitting of the error estimator)

For ease of presentation of the results and to derive the adaptive Newton strategy we split the error estimator into two parts. These are identified with the discretization error and the error due to the inexact Newton solution of the discrete KKT system :

Furthermore, using the error identity (15) we define


To evaluate the quality of the error estimator, we use the effectivity index

An index close to one means that the estimator is reliable. In the numerical examples in Section 6, we will observe that the indicator has a good effectivity index already at the beginning of the Newton iterations, when the solution approximation is inaccurate.

We conclude this section by anticipating that our adaptive strategy defined in the algorithms in Section 5 exploits the error splitting and attempts to balance the two error contributions, i.e. to reach the balance , during the Newton iterations. In this way the adaptive strategy attempts to reduce the goal functional of the optimization problem on coarse meshes and it proceeds with mesh refinement only if the discretization error is dominating.

Remark 1

In the residual (3.1) the term related to the residuum of the control equation is always zero because the control is finite dimensional. We keep it in the error representation because the same term in the residual in (15) is nonzero due to the inexact Galerkin projection. In fact, this term is essential to get a reliable error estimator. Also in the dual residual (3.1) the control term is zero. We keep the full expression here for the sake of completeness in the case of an infinite dimensional control space.

4 Application: Optimal electrode design

As already mentioned in the introduction, we apply our adaptive strategy to the optimal design of a micro-pipette to be used in electroporation. The objective is to maximize the area around the micro-pipette, where the voltage exceeds a certain threshold , while on the other hand an upper bound for the voltage shall not be reached.

The micro-pipette is covered by an isolating material such that the current can only flow to the biological tissue through the micro-pipette holes. While standard micro-pipettes have only one hole at the tip, holes can also be created on the sides of the micro-pipette using nanoengineering techniques [31].

As some of the parameters, as e.g.the conductivity of the medium , are only known in a very rough approximation, we cannot expect to obtain quantitative results at this stage. Therefore, it seems justified for a qualitative study to restrict the setting to a two-dimensional simplification. Furthermore, we restrict the possible design of the holes to a symmetric setting with the same size of the openings on both sides. In this context we are interested in the position and size of the openings as design parameters.

The domain of interest is a region around the micro-pipette , where neuronal cells might be activated. To reduce the influence of exterior boundary conditions, we use a larger box around the micro-pipette as simulation domain (Figure 3), excluding the micro-pipette itself. If we choose the box large enough, we can assume without loss of generality zero voltage at the outer boundary.

We assume that a fixed current is applied at the top of the micro-pipette. Knowing the resistances of the electrode, the approximation of the fluxes through the holes of the micro-pipette can be derived by physical laws (see the appendix) and are defined as flux (Neumann) conditions at the boundary of . The fluxes are expressed as functions of the radii and positions of the holes (which are the control variables for the optimal design), therefore the control variables define the Neumann boundary conditions as a finite dimensional parametrization.

4.1 Governing equations

Figure 3: Simulation domain with Dirichlet boundary and Neumann boundary parts . The sub-domain is used to define the objective functional.

The governing equations can be derived as follows. In the absence of electric charge the Gauss law states


for the electric field and the conductivity . With the electrostatic potential () (18) reads


We assume zero Dirichlet conditions at the outer boundaries of the box denoted by , see Figure 3. Then, we can rewrite (19) in terms of the voltage

The boundary condition at the micro-pipette is a flux condition

The flux is zero at the isolated parts of the micro-pipette. At the holes , the flux is given by

where denotes the current density at hole . Let us assume that the number of holes is fixed. Our design parameters will be the vertical position in terms of the midpoint of the holes and their sizes . The fluxes depend in a highly nonlinear way on (see the appendix). As the derivation of the analytical expressions for more than two sets of holes are complex, we restrict our study to a maximum of two additional sets of holes besides the hole at the tip. A possible extension of this work would be to not rely on analytical expressions for the fluxes but extend the computational domain to the interior part of the micro-pipette and approximate the fluxes with a finite element discretization. Since the restriction to two sets of holes is not a limitation to show the effectiveness of our approach, we consider the analytical expressions derived in the appendix to reduce the computational effort.

4.2 Objective functional

Our aim is to maximize the region where the voltage exceeds a certain threshold . At the same time, we have to ensure that the voltage does not exceed a critical value , with which the biological tissue might be damaged. The corresponding objective functional would be

where denotes the characteristic function of the set . The main issue of the objective functional is its non-differentiability. For the gradient-based optimization algorithm we present here, the functional is required to be at least differentiable.

Therefore, we consider another objective functional

where we choose a constant function that is used as a tracking term to reach the desired threshold. Moreover, to avoid the influence of a far-away region, where we cannot expect that any cell can be activated, we consider for the functional definition the previously defined domain of interest, i.e. the sub-domain around the micro-pipette.

Using this functional, the voltage does not exceed the critical value in the numerical simulations conducted for this paper. In fact, we have found that using a value slightly larger than the threshold is a good choice to get above to the threshold and to stay significantly below the critical value .

The application poses additional restrictions for the design parameters . Each hole has to lie above the tip of the micro-pipette and below the upper end of the bounding box and the holes should not overlap (otherwise the formulas derived in the appendix for the fluxes at the boundary are not valid anymore). In the numerical experiments conducted for this paper, however, these conditions were never violated. The addition of point-wise state constraints and/or control constraints , using an admissible set , can be done without significant changes in our approach using a penalty method and/or an active set strategy [21]. Hence to simplify the exposition, we do not incorporate state and control constraints in this work.

Finally, we add a regularization term with parameter to the objective functional as explained in Section 2. The optimization problem reads in variational formulation

Problem 4.1 (Optimization of the micro-pipette)

Find the pair and that minimizes the goal functional under the PDE constrain:


4.3 Karush-Kuhn-Tucker system

The Lagrange functional corresponding to problem 4.1 reads


with an adjoint variable . The first-order optimality conditions are given by:

Problem 4.2 (First-order optimality conditions)

Find , , , such that


4.4 Discretization and approximation of the flux boundary conditions

We use the space of standard finite elements on a quasi-uniform finite element mesh . Altogether, the discrete optimality system reads:

Problem 4.3 (Discrete first-order optimality conditions)

Find , , such that


Denoting by the vertical position on the micro-pipette (see Figure 3) and by the position of the tip, it holds for the flux function that

Note that is discontinuous in both the vertical coordinate and the parameter vector . This is a problem, since the derivative appears in the optimality system (4.3). Furthermore, numerical methods like Newton-type methods for solving (4.3) require at least the first derivative of the system which includes . To deal with this issue, we introduce a smooth approximation of . Let be the characteristic function on the interval . A smooth approximation to is given by . Based on this approximation, we define a regularized flux function by


for some , see Figure 4.

Figure 4: Smooth approximation with of the characteristic function .

Furthermore, we use a summed quadrature formula with sufficient integration points to evaluate the boundary integrals such that the decay of at the boundary of the openings is appropriately approximated.

4.5 Dual problem for the error estimation

As explained in Section 3.1 to estimate the discretization and iteration errors we need an approximation of the solution of an ad-hoc dual problem. In the specific case of the micro-pipette optimization the dual problem reads

Problem 4.4 (Dual micro-pipette problem)

Find and such that


This problem is discretized with finite elements to get the approximation . Furthermore, we observe that the system matrix of the dual problem is exactly the same matrix used in the Newton method to solve the primal problem, i.e. to solve the discrete KKT system (4.3). It is the Hessian of the Lagrange functional (21). It follows that the solution of the dual problem for the DWR method corresponds to one additional Newton step with a different right-hand side, which will be explicitly shown later in the algorithmic section 5.

4.6 A posteriori error estimators

We conclude this section by specifying the concrete error estimators for the KKT system (4.3). Following the derivation in Section 3, it holds that

with and specified in (16) and (17). An evaluation of the integrals over the cells leads in general to poor local error indicators due to the oscillatory nature of the residuals, see [11]. To avoid this behavior we integrate the residuals cell-wise by parts obtaining boundary terms (jump terms) to distribute the error on the inner cell-edges. To simplify the notation we use the symbols without tilde implicitly considering that all quantities and are perturbed. Furthermore, we separate in (16) the contribution from the primal and the dual residuals, i.e.  with


with the boundary residuals defined on by