Variational discretization of a control-constrained parabolic bang-bang optimal control problem
Abstract:
We consider a control-constrained parabolic optimal control problem
without Tikhonov term in the tracking functional.
For the numerical treatment, we use
variational discretization of its Tikhonov regularization:
For the state and the adjoint equation, we apply
Petrov–Galerkin schemes from [DanielsHinzeVierling] in time
and usual conforming finite elements in space.
We prove a priori estimates for the error between the
discretized regularized problem and the limit problem.
Since these estimates are not robust
as the regularization parameter tends to zero,
we establish robust estimates which,
depending on the problem's regularity,
improve upon the previous ones.
In the special case of bang-bang solutions, these estimates are
further improved.
A numerical example confirms our analytical findings.
Keywords: Optimal control, Heat equation, Control constraints, Finite elements, A priori error estimates, Bang-bang controls.
1 Introduction
In this article we are interested in the numerical solution of the optimal control problem
() 
Here, is basically the (weak) solution operator of the heat equation, the set of admissible controls is given by box constraints, and is a given function to be tracked.
Often, the solutions of () possess a special structure: they take values only on the bounds of the admissible set and are therefore called bang-bang solutions.
Theoretical and numerical questions related to this control problem attracted much interest in recent years, see, e.g., [deckelnickhinze], [wachsmuth1], [wachsmuth2], [wachsmuth3], [wachsmuth4], [wachsmuth5], [gongyan], [felgenhauer2003], [altbayeretal2], [altseydenschwanzreg1], and [seydenschwanzregkappa]. The last four papers are concerned with being the solution operator of an ordinary differential equation, the former papers with being a solution operator of an elliptic PDE or being a continuous linear operator. In [dissnvd], a brief survey of the content of these and some other related papers is given at the end of the bibliography.
Problem () is in general ill-posed, meaning that a solution does not depend continuously on the datum , see [wachsmuth2, p. 1130]. The numerical treatment of a discretized version of () is also challenging, e.g., due to the absence of formula (10) in the case , which corresponds to problem ().
Therefore we use Tikhonov regularization to overcome these difficulties. The regularized problem is given by
() 
where denotes the regularization parameter. Note that for , problem () reduces to problem ().
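Since the display did not survive extraction, we note that in generic notation (the symbols $S$, $y_d$, $U_{\mathrm{ad}}$, and $\alpha$ below are placeholders of ours, not taken from the source) a Tikhonov-regularized tracking problem of this type typically takes the form

```latex
\min_{u \in U_{\mathrm{ad}}} \; J_\alpha(u)
  := \frac12 \,\bigl\| S u - y_d \bigr\|_{L^2}^2
   + \frac{\alpha}{2}\, \| u \|_{L^2}^2 ,
\qquad \alpha \ge 0 ,
```

so that formally setting $\alpha = 0$ recovers the pure tracking problem.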
For the numerical treatment of the regularized problem, we then use variational discretization introduced by Hinze in [Hinze2005], see also [hpuu, Chapter 3.2.5]. The state equation is treated with a Petrov–Galerkin scheme in time using a piecewise constant ansatz for the state and piecewise linear, continuous test functions. This results in variants of the Crank–Nicolson scheme for the discretization of the state and the adjoint state, which were proposed recently in [DanielsHinzeVierling]. In space, usual conforming finite elements are used. See [dissnvd] for the fully discrete case and [SpringerVexler2013] for an alternative discontinuous Galerkin approach.
The purpose of this paper is to prove a priori bounds for the error between the discretized regularized problem and the limit problem, i.e., the continuous unregularized problem.
We first derive error estimates between the discretized regularized problem and its continuous counterpart. Together with Tikhonov error estimates recently obtained in [daniels], see also [dissnvd], one can establish estimates for the total error between the discretized regularized solution and the solution of the continuous limit problem, i.e. . Here, second order convergence in space is not achievable and (without coupling) the estimates are not robust if tends to zero. Using refined arguments, we overcome both drawbacks. In the special case of bang-bang controls, we further improve those estimates.
The obtained estimates suggest a coupling rule for the parameters (regularization parameter), , and (time and space discretization parameters, respectively) to obtain optimal convergence rates which we numerically observe.
The paper is organized as follows.
In the next section, we introduce the functional analytic description of the regularized problem. We recall several of its properties, such as existence of a unique solution for all (thus especially in the limit case we are interested in), an explicit characterization of the solution structure, and the function space regularity of the solution. We then introduce the Tikhonov regularization and recall some error estimates under suitable assumptions. In the special case of bang-bang controls, we recall a smoothness-decay lemma which later helps to improve the error estimates for the discretized problem.
The third section is devoted to the discretization of the optimal control problem. First, the discretization of the state and adjoint equations is introduced and several error estimates needed in the later analysis are recalled. Then, the analysis of the variational discretization of the optimal control problem is carried out.
The last section discusses a numerical example where we observe the predicted orders of convergence.
2 The continuous optimal control problem
2.1 Problem setting and basic properties
Let , , be a spatial domain which is assumed to be bounded and convex with a polygonal boundary . Furthermore, a fixed time interval , , a desired state , a nonnegative real constant , and an initial value are prescribed. With the Gelfand triple we consider the following optimal control problem
()  
where is the control space, the (closed and convex) set of admissible controls is defined by
(1) 
with fixed control bounds , fulfilling almost everywhere in ,
is the state space, and the control operator as well as the control region are defined below.
Note that we use the notation and for weak time derivatives and for “for almost all”.
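In generic notation (all symbols below are placeholders of ours, since the original display (1) was lost), a box-constrained admissible set of the kind described here has the form

```latex
U_{\mathrm{ad}} = \bigl\{\, u \in L^2(Q) \;:\;
  a(x,t) \le u(x,t) \le b(x,t)
  \ \text{for a.a.\ } (x,t) \in Q \,\bigr\} ,
```

where $Q$ stands for the space-time control region and $a$, $b$ for the control bounds.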
The operator
(2) 
denotes the weak solution operator associated with the heat equation, i.e., the linear parabolic problem
The weak solution is defined as follows: for , the function with satisfies the two equations
(3a)  
(3b) 
Note that by the embedding
, see, e.g., [evans, Theorem 5.9.3],
the first relation is meaningful.
In the preceding equation, the bilinear form
is given by
We show below that (3) yields an operator in the sense of (2).
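As the displays of (3) were lost in extraction, we record a generic weak formulation of a heat equation of this kind (all symbols here are placeholders of ours, not taken from the source): find $y$ with $y(0) = y_0$ such that for almost all $t \in (0,T)$ and all test functions $v \in H_0^1(\Omega)$

```latex
\langle y'(t), v \rangle_{H^{-1},\,H_0^1} + a\bigl(y(t), v\bigr)
  = \bigl( (Bu)(t),\, v \bigr)_{L^2(\Omega)} ,
\qquad
a(y, v) := \int_\Omega \nabla y \cdot \nabla v \,\mathrm{d}x ,
```

with $B$ denoting the control operator.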
For the control region and the control operator we consider two situations.

(Distributed controls) We set , and define the control operator by , i.e., the identity mapping induced by the standard Sobolev embedding .

(Located controls) We set the control region . With a fixed functional the linear and continuous control operator is given by
(4) The case of fixed functionals with controls and a control operator is a possible generalization. To streamline the presentation, we restrict ourselves to the case here and refer to [dissnvd] for the case .
For later use we observe that the adjoint operator is given by
If furthermore holds, we can consider as an operator and get the adjoint operator
Note that the adjoint operator (and also the operator itself) preserves time regularity, i.e., for where is a subspace of depending on the regularity of the (as noted just before), e.g., or .
Lemma 1 (Properties of the solution operator ).
Proof.
This can be derived using standard results, see [dissnvd, Lemma 1]. ∎
An advantage of the formulation (7) over (3) is that the weak time derivative of is not part of the equation. In later discretizations of this equation, this makes it possible to consider states that do not possess a weak time derivative.
Lemma 2 (Unique solution of the optimal control problem).
The optimal control problem () admits
for fixed
a unique solution
, which can be characterized
by the first order necessary and sufficient optimality condition
(8) 
where denotes the adjoint operator of , and the so-called optimal adjoint state is the weak solution uniquely determined by the equation
(9) 
Proof.
This follows from standard results, see, e.g., [dissnvd, Lemma 2]. ∎
As a consequence of the fact that is a closed and convex set in a Hilbert space we have the following lemma.
Lemma 3.
In the case the variational inequality (8) is equivalent to
(10) 
where is the orthogonal projection.
Proof.
See [hpuu, Corollary 1.2, p. 70] with . ∎
The orthogonal projection in (10) can be made explicit in our setting.
Lemma 4.
For with , consider the projection of a real number onto the interval , i.e., .
There holds for
Proof.
See [dissnvd, Lemma 4] for a proof of this standard result in our setting. ∎
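The pointwise interval projection of Lemma 4 can be sketched numerically as follows (function and variable names are our own, not taken from the source):

```python
import numpy as np

def proj_interval(v, a, b):
    """Pointwise projection of the value(s) v onto the interval [a, b]:
    values below a are raised to a, values above b are lowered to b."""
    return np.minimum(b, np.maximum(a, v))

# projecting a few real numbers onto [-1, 2]
print(proj_interval(np.array([-3.0, 0.5, 4.0]), -1.0, 2.0))  # -> [-1.   0.5  2. ]
```

The same formula applies unchanged to the pointwise a.e. projection of an admissible control when `v` holds the nodal values of a discretized function.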
We now derive an explicit characterization of the optimal control.
Lemma 5.
If , then for almost all there holds for the optimal control
(11) 
Suppose is given. Then the optimal control fulfills a.e.
(12) 
Proof.
We refer to [dissnvd, Lemma 5] for a proof of this standard result in our setting. ∎
Remark 6.
As a consequence of (12) we have: if vanishes only on a subset of with Lebesgue measure zero, then the optimal control takes values only on the bounds of the admissible set . In this case is called a bang-bang solution.
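Remark 6 can be illustrated with a small numerical sketch (names and the sign convention below are our assumptions, not taken from the source): wherever the adjoint quantity is positive, the control sits on one bound; wherever it is negative, on the other.

```python
import numpy as np

def bang_bang_control(p, a, b):
    """Bang-bang control determined by the sign of an adjoint quantity p.
    Assumed sign convention: lower bound a where p > 0, upper bound b where
    p < 0. On the set {p == 0} (measure zero for a bang-bang solution) the
    value is not determined by the sign; here it is set to b arbitrarily."""
    return np.where(np.asarray(p) > 0, a, b)
```

If `p` changes sign only on a null set, the resulting control attains only the two values `a` and `b`, i.e., it is bang-bang.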
Assuming more regularity of the data than stated above, we obtain the regularity of the optimal state and the adjoint state needed for the convergence rates in the numerical realization of the problem.
We use here and in what follows the notation
Assumption 7.
Let with and . Furthermore, we assume . In the case of distributed controls, we assume , . In the case of located controls, we assume , and .
Lemma 8 (Regularity of problem (), ).
Let Assumption 7 hold and let . For the unique solution of () and the corresponding adjoint state there holds



in the case of located controls or

in the case of distributed controls.
With some constant independent of , we have the a priori estimates
(13) 
Proof.
See [dissnvd, Lemma 12]. ∎
Remark 9 (Regularity in the case ).
In the case , we have less regularity:

, and

Since (10) does not hold, we cannot derive regularity for from that of as above. We only know from the definition of that , but might be discontinuous, as we will see later.
2.2 Tikhonov regularization
For this subsection, it is useful to rewrite problem () in the reduced form () with , fixed data and the linear and continuous control-to-state operator , . From now on we assume
(14) 
in a pointwise almost everywhere sense, where and are the bounds of the admissible set . For the limit problem (), which we finally want to solve, this assumption can always be met by a simple transformation of variables.
To prove rates of convergence with respect to , we rely on the following assumption.
Assumption 10.
There exist a set , a function with , and constants and such that the inclusion holds for the complement of and, in addition,

(source condition)
(15) 
(()measure condition)
(16) with the convention that if the left-hand side of (16) is zero for some .
For a discussion of this assumption we refer to the texts subsequent to [daniels, Assumption 7] or [dissnvd, Assumption 15].
A key ingredient in the analysis of the regularization error, and also of the discretization error considered later, is the following lemma; see [daniels, Lemma 8] or [dissnvd, Lemma 16] for a proof.
Lemma 11.
Let Assumption 10.2 hold. For the solution of (), there holds with some constant independent of and
(17) 
Using this lemma, we can now state regularization error estimates.
Theorem 12.
For the regularization error, there holds the following with positive constants and independent of .
For a proof of this recent result, we refer to [daniels, Theorem 11] and [dissnvd, Theorem 19], where a discussion can also be found. We only recall two points here for convenience:
The assumption of the first case of the above theorem implies
(27) 
which induces bang-bang controls; compare Remark 6.
2.3 Bang-bang controls
We now introduce a second measure condition which leads to an improved bound on the decay of smoothness of the derivative of the optimal control as tends to zero. This bound will later be useful for deriving improved convergence rates for the discretization errors.
Definition 13 (measure condition).
If for the set
(28) 
the condition
(29) 
holds true (with the convention that if the measure in (29) is zero for all ), we say that the measure condition is fulfilled.
Theorem 14.
Let us assume the condition
(30) 
Proof.
See [daniels, Theorem 15] or [dissnvd, Theorem 24]. ∎
If the limit problem possesses a certain regularity, both measure conditions coincide:
Corollary 15.
Proof.
See [daniels, Corollary 18] or [dissnvd, Corollary 27]. ∎
Let us now consider located controls. Since for by Lemma 8 and Remark 9, we conclude
with a constant independent of due to the definition of . Recall that , by Assumption 7. With this estimate, the projection formula (10) and the stability of the projection (see [dissnvd, Lemma 11]) we obtain the bound
(31) 
if is sufficiently small.
If the measure condition (29) is valid, this decay of smoothness in terms of can be relaxed in weaker norms, as the following lemma shows.
Lemma 16 (Smoothness decay in the derivative).
Let the measure condition (29) be fulfilled and located controls be given. Then for sufficiently small there holds
(32) 
with a constant independent of . Here, and . Note that in the case of constant control bounds and .
Proof.
See [daniels, Lemma 19] or [dissnvd, Lemma 28]. ∎
The question of the necessity of Assumption 10 and the measure condition (29) for obtaining the convergence rates of Theorem 12.1 is discussed in [daniels, Sections 4 and 5] and [dissnvd, Sections 1.4.3 and 1.4.4]. The results there show that in several cases the conditions are in fact necessary to obtain the convergence rates stated above.
3 The discretized problem
3.1 Discretization of the optimal control problem
Consider a partition of the time interval . With we have . Furthermore, let for denote the interval midpoints. By we obtain a second partition of , the so-called dual partition, namely , with . The grid width of the aforementioned (primal) partition is given by the parameters and
Here and in what follows we assume . We also denote by (in a slight abuse of notation) the grid itself.
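The construction of the primal grid and its dual counterpart via interval midpoints can be sketched as follows (a uniform grid and the function name are our assumptions; the text also allows non-uniform partitions):

```python
import numpy as np

def primal_and_dual_grids(T, M):
    """Uniform primal partition 0 = t_0 < ... < t_M = T, its interval
    midpoints, and the dual partition obtained by taking the midpoints
    together with the two endpoints of the time interval."""
    t = np.linspace(0.0, T, M + 1)                 # primal grid points
    mid = 0.5 * (t[:-1] + t[1:])                   # interval midpoints
    dual = np.concatenate(([t[0]], mid, [t[-1]]))  # dual partition
    k = np.diff(t).max()                           # grid width parameter
    return t, mid, dual, k

# for T = 1, M = 4: primal grid [0, .25, .5, .75, 1],
# dual grid [0, .125, .375, .625, .875, 1]
```

Note that the dual partition has one more interval than the primal one, since the first and last dual intervals have only half the primal step length.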
We need the following conditions on sequences of time grids.
Assumption 17.
There exist constants and independent of such that there holds
On these partitions of the time interval, we define the ansatz and test spaces of the Petrov–Galerkin schemes. These schemes will replace the continuous-in-time weak formulations of the state equation and the adjoint equation, i.e., (7) and (9), respectively. To this end, we first define for an arbitrary Banach space the semidiscrete function spaces
(33a)  
(33b)  
and  
(33d) 
Here, , , , is the set of polynomial functions in time of degree at most on the interval with values in . We note that functions in are uniquely determined by elements from . The same holds true for functions , but with uniquely determined only in by definition of the space. The reason for this is given in the discussion below [dissnvd, (2.16), p. 41]. Furthermore, for each function we have , where denotes the equivalence class with respect to the almost-everywhere relation.
In the sequel, we will frequently use the following interpolation operators.

(Orthogonal projection)
(34) 
(Piecewise linear interpolation on the dual grid)
(35)
The interpolation operators are obviously linear mappings. Furthermore, they are bounded, and error estimates are available, as [dissnvd, Lemma 31] shows.
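The two operators can be mimicked numerically as follows (a sketch under our own naming; the exact orthogonal projection would use exact integration): the L²-orthogonal projection onto piecewise constants replaces a function on each primal interval by its interval average, and the second operator interpolates linearly in the nodes of the dual grid.

```python
import numpy as np

def project_pw_constant(f, t):
    """Approximate L^2-orthogonal projection of f onto piecewise constants
    on the partition t: on each interval the value is the interval average,
    computed here with a composite trapezoidal rule."""
    vals = []
    for a, b in zip(t[:-1], t[1:]):
        x = np.linspace(a, b, 201)
        fx = f(x)
        integral = np.sum(0.5 * (fx[:-1] + fx[1:]) * np.diff(x))
        vals.append(integral / (b - a))
    return np.array(vals)

def interpolate_on_dual(f, dual):
    """Piecewise linear interpolant of f in the nodes of the dual grid."""
    fd = f(dual)
    return lambda x: np.interp(x, dual, fd)
```

For an affine function both constructions are exact, which gives a quick sanity check of an implementation.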
In addition to the notation introduced after Remark 6, adding a subscript to a norm will indicate an norm in the following. Inner products are treated in the same way.
Note that in all of the following results, denotes a generic, strictly positive real constant that does not depend on quantities appearing to the right of or below it.
Note that we can extend the bilinear form of (6) in its first argument to , and thus consider the operator
(36) 
Using continuous piecewise linear functions in space, we can formulate fully discretized variants of the state and adjoint equation.
We consider a regular triangulation of with mesh size
see, e.g., [brennerscott, Definition (4.4.13)], and triangles. We assume that . We also denote by (in a slight abuse of notation) the grid itself.
With the space
(37) 
we define to discretize .
For the spatial grid, we make use of a standard grid assumption, sometimes called quasi-uniformity, as we did for the time grid.
Assumption 18.
There exists a constant independent of such that
We fix fully discrete ansatz and test spaces, derived from their semidiscrete counterparts in (33), namely
(38) 
With these spaces, we introduce fully discrete state and adjoint equations as follows.
Definition 19 (Fully discrete adjoint equation).
For find such that
(39) 
Definition 20 (Fully discrete state equation).
For find , such that
(40) 
Existence and uniqueness of these two schemes follow as in the semidiscrete case discussed in [DanielsHinzeVierling] or [dissnvd, section 2.1.2].
Let us recall some stability results and error estimates for these schemes. The first result is [dissnvd, Lemma 56].
Lemma 21.
Let solve (39) with . Then there exists a constant independent of and such that
For stability of a fully discrete state and an error estimate, we recall [dissnvd, Lemma 59].
Lemma 22.
Let us now consider the error of the fully discrete adjoint state. We begin with an norm result, which is [dissnvd, Lemma 62].
Lemma 23.
Let solve (9) for some such that has the regularity . Let furthermore