Variational discretization of a control-constrained parabolic bang-bang optimal control problem


Nikolaus von Daniels and Michael Hinze
Schwerpunkt Optimierung und Approximation, Universität Hamburg, Bundesstraße 55, 20146 Hamburg, Germany
nvdmath@gmx.net, michael.hinze@uni-hamburg.de
July 5, 2019

Abstract: We consider a control-constrained parabolic optimal control problem without Tikhonov term in the tracking functional. For the numerical treatment, we use variational discretization of its Tikhonov regularization: For the state and the adjoint equation, we apply Petrov-Galerkin schemes from [DanielsHinzeVierling] in time and usual conforming finite elements in space. We prove a-priori estimates for the error between the discretized regularized problem and the limit problem. Since these estimates are not robust if the regularization parameter tends to zero, we establish robust estimates, which — depending on the problem’s regularity — enhance the previous ones. In the special case of bang-bang solutions, these estimates are further improved. A numerical example confirms our analytical findings.

Keywords: Optimal control, Heat equation, Control constraints, Finite elements, A-priori error estimates, Bang-bang controls.

1 Introduction

In this article we are interested in the numerical solution of the optimal control problem

 min_{u∈Uad} 1/2 ∥Su−yd∥2L2(I,L2(Ω)). (P)
Here, S is basically the (weak) solution operator of the heat equation, the set of admissible controls Uad is given by box constraints, and yd is a given function to be tracked.

Often, the solutions of (P) possess a special structure: They take values only on the bounds of the admissible set and are therefore called bang-bang solutions.

Theoretical and numerical questions related to this control problem attracted much interest in recent years, see, e.g., [deckelnick-hinze], [wachsmuth1], [wachsmuth2], [wachsmuth3], [wachsmuth4], [wachsmuth5], [gong-yan], [felgenhauer2003], [alt-bayer-etal2], [alt-seydenschwanz-reg1], and [seydenschwanz-regkappa]. The last four papers are concerned with being the solution operator of an ordinary differential equation, the former papers with being a solution operator of an elliptic PDE or being a continuous linear operator. In [dissnvd], a brief survey of the content of these and some other related papers is given at the end of the bibliography.

Problem (P) is in general ill-posed, meaning that a solution does not depend continuously on the datum yd, see [wachsmuth2, p. 1130]. The numerical treatment of a discretized version of (P) is also challenging, e.g., due to the absence of formula (10) in the case α = 0, which corresponds to problem (P).

Therefore we use Tikhonov regularization to overcome these difficulties. The regularized problem is given by

 min_{u∈Uad} 1/2 ∥Su−yd∥2L2(I,L2(Ω)) + α/2 ∥u∥2U, (Pα)

where α > 0 denotes the regularization parameter. Note that for α = 0, problem (Pα) reduces to problem (P).

For the numerical treatment of the regularized problem, we then use variational discretization introduced by Hinze in [Hinze2005], see also [hpuu, Chapter 3.2.5]. The state equation is treated with a Petrov-Galerkin scheme in time using a piecewise constant Ansatz for the state and piecewise linear, continuous test functions. This results in variants of the Crank-Nicolson scheme for the discretization of the state and the adjoint state, which were proposed recently in [DanielsHinzeVierling]. In space, usual conforming finite elements are taken. See [dissnvd] for the fully discrete case and [SpringerVexler2013] for an alternative discontinuous Galerkin approach.

The purpose of this paper is to prove a-priori bounds for the error between the discretized regularized problem and the limit problem, i.e. the continuous unregularized problem.

We first derive error estimates between the discretized regularized problem and its continuous counterpart. Together with Tikhonov error estimates recently obtained in [daniels], see also [dissnvd], one can establish estimates for the total error between the discretized regularized solution and the solution ¯u0 of the continuous limit problem. Here, second order convergence in space is not achievable and (without coupling) the estimates are not robust if α tends to zero. Using refined arguments, we overcome both drawbacks. In the special case of bang-bang controls, we further improve those estimates.

The obtained estimates suggest a coupling rule for the parameters α (regularization parameter), k, and h (time and space discretization parameters, respectively) to obtain optimal convergence rates, which we numerically observe.

The paper is organized as follows.

In the next section, we introduce the functional analytic description of the regularized problem. We recall several of its properties, such as existence of a unique solution for all α ≥ 0 (thus especially in the limit case α = 0 we are interested in), an explicit characterization of the solution structure, and the function space regularity of the solution. We then introduce the Tikhonov regularization and recall some error estimates under suitable assumptions. In the special case of bang-bang controls, we recall a smoothness-decay lemma which later helps to improve the error estimates for the discretized problem.

The third section is devoted to the discretization of the optimal control problem. At first, the discretization of the state and adjoint equation is introduced and several error estimates needed in the later analysis are recalled. Then, the analysis of variational discretization of the optimal control problem is conducted.

The last section discusses a numerical example where we observe the predicted orders of convergence.

2 The continuous optimal control problem

2.1 Problem setting and basic properties

Let Ω ⊂ Rd be a spatial domain which is assumed to be bounded and convex with a polygonal boundary ∂Ω. Furthermore, a fixed time interval I := (0,T) with T > 0, a desired state yd, a non-negative real constant α, and an initial value y0 are prescribed. With the Gelfand triple H10(Ω) ↪ L2(Ω) ↪ H−1(Ω) we consider the following optimal control problem

 min_{u∈Uad} J(u) := 1/2 ∥y−yd∥2L2(I,L2(Ω)) + α/2 ∥u∥2U subject to y = S(Bu,y0), (1)

where U is the control space, the (closed and convex) set of admissible controls Uad ⊂ U is defined by

with fixed control bounds a and b fulfilling a ≤ b almost everywhere in ΩU,

 Y := W(I) := {v ∈ L2(I,H10(Ω)) | vt ∈ L2(I,H−1(Ω))}

is the state space, and the control operator B as well as the control region ΩU are defined below.

Note that we use the notation vt and ∂tv for weak time derivatives, and ∀′ for “for almost all”.

The operator

 S:L2(I,H−1(Ω))×L2(Ω)→W(I),(f,g)↦y:=S(f,g), (2)

denotes the weak solution operator associated with the heat equation, i.e., the linear parabolic problem

 ∂ty − Δy = f in I×Ω, y = 0 on I×∂Ω, y(0) = g in Ω.

The weak solution is defined as follows. For (f,g) ∈ L2(I,H−1(Ω)) × L2(Ω) the function y := S(f,g) ∈ W(I) satisfies the two equations

 y(0) = g, (3a)
 ⟨∂ty(t),v⟩H−1(Ω)H10(Ω) + a(y(t),v) = ⟨f(t),v⟩H−1(Ω)H10(Ω) ∀ v ∈ H10(Ω), f.a.a. t ∈ I. (3b)

Note that by the embedding W(I) ↪ C([0,T],L2(Ω)), see, e.g., [evans, Theorem 5.9.3], the first relation is meaningful.
In the preceding equation, the bilinear form a(⋅,⋅) is given by

 a(f,g) := ∫Ω ∇f(x)⋅∇g(x) dx.

We show below that (3) yields an operator S in the sense of (2).

For the control region and the control operator we consider two situations.

1. (Distributed controls) We set ΩU := I × Ω, and define the control operator B by Bu := u, i.e., the identity mapping induced by the standard embedding L2(Ω) ↪ H−1(Ω).

2. (Located controls) We set the control region ΩU := I. With a fixed functional g1 ∈ H−1(Ω), the linear and continuous control operator is given by

 B:U=L2(I)→L2(I,H−1(Ω)),u↦(t↦u(t)g1). (4)

The case of finitely many fixed functionals g1,…,gm with controls u ∈ L2(I,Rm) and a corresponding control operator is a possible generalization. To streamline the presentation we restrict ourselves to the case m = 1 here and refer to [dissnvd] for the case m > 1.

For later use we observe that the adjoint operator is given by

 B∗:L2(I,H10(Ω))→U=L2(I),(B∗q)(t)=⟨g1,q(t)⟩H−1(Ω)H10(Ω).

If furthermore g1 ∈ L2(Ω) holds, we can consider B as an operator with values in L2(I,L2(Ω)) and get the adjoint operator

 B∗:L2(I,L2(Ω))→U=L2(I),(B∗q)(t)=(g1,q(t))L2(Ω).

Note that the adjoint operator B∗ (and also the operator B itself) preserves time regularity, i.e., B∗q has the same time regularity as q, where the attainable regularity depends on that of g1 (as noticed just before).

Lemma 1 (Properties of the solution operator S).

1. For every (f,g) ∈ L2(I,H−1(Ω)) × L2(Ω) a unique state y ∈ W(I) satisfying (3) exists. Thus the operator S from (2) exists. Furthermore, the state fulfills

 ∥y∥W(I)≤C(∥f∥L2(I,H−1(Ω))+∥g∥L2(Ω)). (5)
2. Consider the bilinear form A given by

 A(y,v) := −∫T0⟨∂tv,y⟩H−1(Ω)H10(Ω)dt + ∫T0 a(y,v)dt + (y(T),v(T))L2(Ω) (6)

with v ∈ W(I). Then for y ∈ W(I), equation (3) is equivalent to

 A(y,v)=∫T0⟨f,v⟩dt+(g,v(0))L2(Ω)∀ v∈W(I). (7)

Furthermore, y is the only function in L2(I,H10(Ω)) fulfilling equation (7).

Proof.

This can be derived using standard results, see [dissnvd, Lemma 1]. ∎

An advantage of the formulation (7) in comparison to (3) is the fact that the weak time derivative of y is not part of the equation. In later discretizations of this equation, this offers the possibility to consider states which do not possess a weak time derivative.

We can now establish the existence of a solution to the optimal control problem.

Lemma 2 (Unique solution of the o.c.p.).

The optimal control problem admits for every fixed α ≥ 0 a unique solution ¯uα ∈ Uad, which can be characterized by the first order necessary and sufficient optimality condition

 (α¯uα + B∗¯pα, u − ¯uα)U ≥ 0 ∀ u ∈ Uad, (8)

where B∗ denotes the adjoint operator of B, and the so-called optimal adjoint state ¯pα ∈ W(I) is, with h := ¯yα − yd for the optimal state ¯yα := S(B¯uα,y0), defined and uniquely determined by the equation

 A(v,¯pα)=∫T0⟨h,v⟩H−1(Ω)H10(Ω)dt∀ v∈W(I). (9)
Proof.

This follows from standard results, see, e.g., [dissnvd, Lemma 2]. ∎

As a consequence of the fact that Uad is a closed and convex subset of a Hilbert space, we have the following lemma.

Lemma 3.

In the case α > 0 the variational inequality (8) is equivalent to

 ¯uα = PUad(−α−1B∗¯pα), (10)

where PUad : U → Uad is the orthogonal projection.

Proof.

See [hpuu, Corollary 1.2, p. 70]. ∎

The orthogonal projection in (10) can be made explicit in our setting.

Lemma 4.

Let us for a, b ∈ R with a ≤ b consider the projection P[a,b] of a real number r into the interval [a,b], i.e., P[a,b](r) := min(b, max(a,r)).

There holds for α > 0

 ¯uα(x) = P[a(x),b(x)](−α−1(B∗¯pα)(x)) f.a.a. x ∈ ΩU.

Proof.

See [dissnvd, Lemma 4] for a proof of this standard result in our setting. ∎
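For illustration, the projection formulas of Lemmas 3 to 5 can be sketched numerically. In the following minimal Python sketch, the sample values for B∗¯pα and the constant bounds a = −1, b = 1 are made-up demonstration data and do not come from the paper:

```python
import numpy as np

def proj_box(v, a, b):
    """Pointwise projection P_[a,b](v) = min(b, max(a, v))."""
    return np.minimum(b, np.maximum(a, v))

# made-up sample values of B* p_alpha at four points of the control region
Bstar_p = np.array([0.8, -0.3, 0.05, -2.0])
a, b = -1.0, 1.0   # constant control bounds
alpha = 0.1

# projection formula (10): u_alpha = P_[a,b](-(1/alpha) * B* p_alpha)
u_alpha = proj_box(-Bstar_p / alpha, a, b)

# as alpha -> 0, the values tend to the bang-bang rule (12):
# a where B* p > 0 and b where B* p < 0
```

Shrinking `alpha` drives every point with a nonzero adjoint value onto one of the bounds, which is exactly the bang-bang structure discussed in Remark 6 below.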

We now derive an explicit characterization of the optimal control.

Lemma 5.

If α > 0, then for almost all x ∈ ΩU there holds for the optimal control

 ¯uα(x) = { a(x) if B∗¯pα(x)+αa(x) > 0;
  −α−1B∗¯pα(x) if B∗¯pα(x)+α¯uα(x) = 0;
  b(x) if B∗¯pα(x)+αb(x) < 0. (11)

Suppose α = 0 is given. Then the optimal control ¯u0 fulfills a.e.

 ¯u0(x) = { a(x) if B∗¯p0(x) > 0;
  b(x) if B∗¯p0(x) < 0. (12)
Proof.

We refer to [dissnvd, Lemma 5] for a proof of this standard result in our setting. ∎

Remark 6.

As a consequence of (12) we have: If B∗¯p0 vanishes only on a subset of ΩU with Lebesgue measure zero, the optimal control ¯u0 takes values only on the bounds of the admissible set Uad. In this case ¯u0 is called a bang-bang solution.

Assuming more regularity on the data than stated above, we get regularity for the optimal state and the adjoint state needed for the convergence rates in the numerical realization of the problem.

We use here and in what follows the notation

 ∥⋅∥ := ∥⋅∥L2(Ω), ∥⋅∥I := ∥⋅∥L2(I,L2(Ω)),
 (⋅,⋅) := (⋅,⋅)L2(Ω), and (⋅,⋅)I := (⋅,⋅)L2(I,L2(Ω)).
Assumption 7.

Let with and . Furthermore, we expect . In the case of distributed controls, we assume , . In the case of located controls, we assume , and .

Lemma 8 (Regularity of problem (P), α>0).

Let Assumption 7 hold and let α ≥ 0. For the unique solution ¯uα of the optimal control problem and the corresponding adjoint state ¯pα there holds

• in the case of located controls or

• in the case of distributed controls.

With some constant C independent of α, we have the a priori estimates

 ∥∂2t¯y∥I + ∥∂tΔ¯y∥I + maxt∈[0,T]∥∇∂t¯y(t)∥ ≤ d1(¯u) := C(∥B¯u∥H1(I,L2(Ω)) + ∥∇B¯u(0)∥ + ∥∇Δy0∥),
 ∥∂2t¯p∥I + ∥∂tΔ¯p∥I + maxt∈[0,T]∥∇∂t¯p(t)∥ ≤ d0(¯u) := C(∥yd∥H1(I,L2(Ω)) + ∥∇yd(T)∥ + ∥B¯u∥I + ∥∇y0∥), and
 ∥∂3t¯p∥I + ∥∂2tΔ¯p∥I + maxt∈[0,T]∥∇∂2t¯p(t)∥ ≤ d+1(¯u) := d1(¯u) + C(∥∂2tyd∥I + ∥∇∂tyd(T)∥ + ∥∇Δyd(T)∥ + ∥∇B¯u(T)∥). (13)
Proof.

See [dissnvd, Lemma 12]. ∎

Remark 9 (Regularity in the case α=0).

In the case α = 0, we have less regularity:

• , and

Since (10) does not hold for α = 0, we cannot derive regularity for ¯u0 from that of ¯p0 as above. From the definition of Uad we only know that ¯u0 lies between the bounds a and b, but ¯u0 might be discontinuous, as we will see later.

2.2 Tikhonov regularization

For this subsection, it is useful to rewrite the problem in reduced form, with fixed data and the linear and continuous control-to-state operator T : U → H, u ↦ Tu := S(Bu,y0), where H := L2(I,L2(Ω)). From now on we assume

 a≤0≤b (14)

in a pointwise almost everywhere sense, where a and b are the bounds of the admissible set Uad. For the limit problem with α = 0, which we finally want to solve, this assumption can always be met by a simple transformation of the variables.

To prove rates of convergence with respect to α, we rely on the following assumption.

Assumption 10.

There exist a set A ⊆ ΩU, a function w, and constants C > 0 and κ > 0, such that there holds an inclusion for the complement of A and in addition

1. (source condition)

2. (κ-measure condition)

 ∀ ϵ>0:meas({x∈A|0≤|B∗¯p0(x)|≤ϵ})≤Cϵκ (16)

with the convention that κ = ∞ if the left-hand side of (16) is zero for some ϵ > 0.

For a discussion of this assumption we refer to the texts subsequent to [daniels, Assumption 7] or [dissnvd, Assumption 15].

Key ingredient in the analysis of the regularization error and also of the discretization error considered later is the following lemma, see [daniels, Lemma 8] or [dissnvd, Lemma 16] for a proof.

Lemma 11.

Let Assumption 10.2 hold. For the solution ¯uα of the regularized problem, there holds with some constant C > 0 independent of α

Using this lemma, we can now state regularization error estimates.

Theorem 12.

For the regularization error there holds, with positive constants independent of α, the following.

1. Let Assumption 10.2 be satisfied with A = ΩU (measure condition holds a.e. on the domain). Then the estimates

 ∥¯uα−¯u0∥L1(ΩU) ≤ Cακ, (18)
 ∥¯uα−¯u0∥U ≤ Cακ/2, (19)
 ∥¯yα−¯y0∥H ≤ Cα(κ+1)/2 (20)

hold true. If κ ≥ 1 holds and in addition

 T∗ : range(T) → L∞(ΩU) exists and is continuous, (21)

we can improve (20) to

 ∥¯yα−¯y0∥H≤Cακ. (22)
2. Let Assumption 10 be satisfied with A ⊊ ΩU (source and measure condition on parts of the domain). Then the following estimates

 ∥¯uα−¯u0∥L1(A) ≤ Cαmin(κ, 2/(1+1/κ)), (23)
 ∥¯uα−¯u0∥U ≤ Cαmin(κ,1)/2, (24)
 ∥¯yα−¯y0∥H ≤ Cαmin((κ+1)/2,1) (25)

hold true.

If furthermore and (21) hold, we have the improved estimate

 ∥¯uα−¯u0∥L1(A)≤Cακ. (26)

For a proof of this recent result, we refer to [daniels, Theorem 11] and [dissnvd, Theorem 19], where also a discussion can be found. We only recall two points for convenience here:

The assumption of the first case of the above Theorem implies

 meas({x∈ΩU|B∗¯p0(x)=0})=0, (27)

which induces bang-bang controls, compare Remark 6.

By Lemma 8 and Remark 9 we can immediately see that the assumption (21) on T∗ is fulfilled for our parabolic problem.
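The rate (19) can be observed on a one-dimensional toy model. In the sketch below, all data are made up: we take ΩU = I = (0,1), a model adjoint B∗¯p0(t) = t − 1/2 (for which the measure condition (16) holds with κ = 1), constant bounds a = −1, b = 1, and we simply reuse ¯p0 in place of ¯pα in the projection formula; the L² error then behaves like α^(κ/2) = α^(1/2):

```python
import numpy as np

t = np.linspace(0.0, 1.0, 400001)
Bp0 = t - 0.5                            # made-up model for B* p_0
u0 = np.where(Bp0 > 0, -1.0, 1.0)        # bang-bang limit control, cf. (12)

def u_alpha(alpha):
    # projection formula with the model adjoint in place of B* p_alpha
    return np.clip(-Bp0 / alpha, -1.0, 1.0)

def l2_err(alpha):
    d = u_alpha(alpha) - u0
    return np.sqrt(np.sum(d**2) * (t[1] - t[0]))

# err(alpha) ~ alpha^(1/2): shrinking alpha by 10 shrinks the error
# by about sqrt(10), so this ratio should be close to 3.16
growth = l2_err(1e-3) / l2_err(1e-4)
```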

2.3 Bang-bang controls

We now introduce a second measure condition which leads to an improved bound on the decay of smoothness in the derivative of the optimal control when α tends to zero. This bound will be useful later to derive improved convergence rates for the discretization errors.

Definition 13 (¯pα-measure condition).

If for the set

 Iα:={x∈ΩU|αa<−B∗¯pα<αb} (28)

the condition

 ∃ ¯α>0 ∀ 0<α<¯α:meas(Iα)≤Cακ (29)

holds true (with the convention that κ = ∞ if the measure in (29) is zero for all 0 < α < ¯α), we say that the ¯pα-measure condition is fulfilled.

Theorem 14.

Let us assume the σ-condition

 ∃ σ>0 ∀′ x∈ΩU:a≤−σ<0<σ≤b. (30)

If the ¯pα-measure condition (29) is valid, Theorem 12.1 holds, omitting its first sentence (“Let Assumption…”).

Proof.

See [daniels, Theorem 15] or [dissnvd, Theorem 24]. ∎

If the limit problem is of certain regularity, both measure conditions coincide:

Corollary 15.

Let a bang-bang solution ¯u0 be given, i.e., (27) holds true. In the case of (21) and the σ-condition (30), both measure conditions are equivalent.

Proof.

See [daniels, Corollary 18] or [dissnvd, Corollary 27]. ∎

Let us now consider located controls. Since ∂t¯pα ∈ L∞(I,L2(Ω)) for all α ≥ 0 by Lemma 8 and Remark 9, we conclude

 ∥∂tB∗¯pα∥L∞(I)≤C∥∂t¯pα∥L∞(I,L2(Ω))≤C+C∥¯uα∥U≤C

with a constant C independent of α due to the definition of Uad. Recall that ∂ta, ∂tb ∈ L∞(I) by Assumption 7. With this estimate, the projection formula (10) and the stability of the projection (see [dissnvd, Lemma 11]) we obtain the bound

 ∥∂t¯uα∥L∞(I) ≤ α−1∥∂tB∗¯pα∥L∞(I) + ∥∂ta∥L∞(I) + ∥∂tb∥L∞(I) ≤ Cα−1, (31)

if α is sufficiently small.

If the ¯pα-measure condition (29) is valid, this decay of smoothness in terms of α can be relaxed in weaker norms, as the following lemma shows.

Lemma 16 (Smoothness decay in the derivative).

Let the ¯pα-measure condition (29) be fulfilled and located controls be given. Then for sufficiently small α there holds

 ∥∂t¯uα∥Lp(I)≤Cmax(Cab,ακ/p−1) (32)

with a constant C independent of α. Here, 1 ≤ p < ∞, and Cab is a constant depending on the time derivatives of the control bounds. Note that Cab = 0 in the case of constant control bounds a and b.

Proof.

See [daniels, Lemma 19] or [dissnvd, Lemma 28]. ∎
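The decay rate of Lemma 16 can be checked on a toy model. In the sketch below, the adjoint B∗¯pα(t) = t − 1/2 on I = (0,1) and the constant bounds a = −1, b = 1 are made-up data (so Cab = 0 and κ = 1): the projection formula yields a control whose derivative has size 1/α on a set of measure of order α, and ∥∂tūα∥Lp(I) scales like α^(κ/p−1):

```python
import numpy as np

def u_alpha(t, alpha):
    # projection formula with the made-up model adjoint B* p(t) = t - 1/2
    return np.clip(-(t - 0.5) / alpha, -1.0, 1.0)

def dt_norm_Lp(alpha, p):
    t = np.linspace(0.0, 1.0, 200001)
    du = np.gradient(u_alpha(t, alpha), t)   # finite-difference time derivative
    return (np.sum(np.abs(du) ** p) * (t[1] - t[0])) ** (1.0 / p)

p = 2.0
r = [dt_norm_Lp(a, p) for a in (1e-2, 1e-3)]
# expected scaling alpha^(kappa/p - 1) = alpha^(-1/2) for kappa = 1, p = 2:
# shrinking alpha by 10 grows the norm by about sqrt(10)
growth = r[1] / r[0]
```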

The question of necessity of Assumption 10 and the ¯pα-measure condition (29) to obtain the convergence rates of Theorem 12.1 is discussed in [daniels, sections 4 and 5] and [dissnvd, sections 1.4.3 and 1.4.4]. The results there show that in several cases these conditions are in fact necessary to obtain the convergence rates from above.

3 The discretized problem

3.1 Discretization of the optimal control problem

Consider a partition of the time interval ¯I = [0,T] with nodes 0 = t0 < t1 < ⋯ < tM = T and subintervals Im := (tm−1, tm], m = 1,…,M. With km := tm − tm−1 we denote the local step sizes. Furthermore, let t∗m := (tm−1 + tm)/2 for m = 1,…,M denote the interval midpoints. By setting t∗0 := 0 and t∗M+1 := T we get a second partition of ¯I, the so-called dual partition, namely the intervals I∗m := (t∗m−1, t∗m], m = 1,…,M+1. The grid width of the first mentioned (primal) partition is given by the parameters km and

 k=max1≤m≤Mkm.

Here and in what follows we assume the time grid width k to be sufficiently small. We also denote by k (in a slight abuse of notation) the grid itself.
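For illustration, the primal grid, its midpoints, and the dual grid can be set up as follows. This is a sketch under the assumption that the dual nodes consist of 0, the M interval midpoints, and T; the uniform step size is just example data:

```python
import numpy as np

T, M = 1.0, 8
t = np.linspace(0.0, T, M + 1)               # primal nodes t_0 < ... < t_M
k_m = np.diff(t)                             # step sizes k_m = t_m - t_{m-1}
k = k_m.max()                                # global time-grid width k
t_star = 0.5 * (t[:-1] + t[1:])              # interval midpoints t*_1, ..., t*_M
dual = np.concatenate(([0.0], t_star, [T]))  # dual nodes -> M + 1 dual intervals

# quantities appearing in the quasi-uniformity Assumption 17 below
ratios = k_m[:-1] / k_m[1:]                  # k_m / k_{m+1}, m = 1, ..., M-1
mu_bound = k / k_m.min()                     # smallest admissible mu
```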

We need the following conditions on sequences of time grids.

Assumption 17.

There exist constants κ1, κ2 > 0 and μ > 0 independent of k such that there holds

 ∀ m∈{1,2,…,M−1}: κ1 ≤ km/km+1 ≤ κ2 and k ≤ μ minm=1,2,…,M km.

On these partitions of the time interval, we define the Ansatz and test spaces of the Petrov–Galerkin schemes. These schemes will replace the continuous-in-time weak formulations of the state equation and the adjoint equation, i.e., (7) and (9), respectively. To this end, we define at first for an arbitrary Banach space the semidiscrete function spaces

 Pk(X) := {v ∈ C([0,T],X) | v|Im ∈ P1(Im,X)} ↪ H1(I,X), (33a)
 P∗k(X) := {v ∈ C([0,T],X) | v|I∗m ∈ P1(I∗m,X)} ↪ H1(I,X), (33b)
 and Yk(X) := {v : [0,T] → X | v|Im ∈ P0(Im,X)}. (33c)

Here, Pℓ(J,X), ℓ ∈ {0,1}, is the set of polynomial functions in time of degree at most ℓ on the interval J with values in X. We note that functions in Pk(X) are uniquely determined by their values at the nodes t0,…,tM. The same holds true for functions in P∗k(X), but with the value at T only uniquely determined by definition of the space. The reason for this is given in the discussion below [dissnvd, (2.16), p. 41]. Furthermore, for each function v ∈ Yk(X) we have [v] ∈ L2(I,X), where [v] denotes the equivalence class with respect to the almost-everywhere relation.

In the sequel, we will frequently use the following interpolation operators.

1. (Orthogonal projection)

 PYk(X)v|Im := (1/km) ∫tmtm−1 v dt, m = 1,…,M, PYk(X)v(T) := 0 (34)
2. (Piecewise linear interpolation on the dual grid)

 πP∗k(X)v|I∗1∪I∗2 := v(t∗1) + (t−t∗1)/(t∗2−t∗1) (v(t∗2)−v(t∗1)),
 πP∗k(X)v|I∗m := v(t∗m−1) + (t−t∗m−1)/(t∗m−t∗m−1) (v(t∗m)−v(t∗m−1)) for m = 3,…,M−1,
 πP∗k(X)v|I∗M∪I∗M+1 := v(t∗M−1) + (t−t∗M−1)/(t∗M−t∗M−1) (v(t∗M)−v(t∗M−1)). (35)

The interpolation operators are obviously linear mappings. Furthermore, they are bounded, and we have error estimates, as [dissnvd, Lemma 31] shows.
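A minimal sketch of the interval averaging behind the orthogonal projection (34), here for scalar-valued functions and with the integral mean approximated by sampling (grid and sample function are made up):

```python
import numpy as np

def project_Yk(f, t, n_quad=64):
    """Piecewise-constant projection: one interval mean of f per primal interval.

    t : primal nodes t_0 < ... < t_M; interval I_m = (t_{m-1}, t_m).
    """
    means = []
    for t0, t1 in zip(t[:-1], t[1:]):
        s = np.linspace(t0, t1, n_quad)
        means.append(np.mean(f(s)))          # ~ (1/k_m) * integral of f over I_m
    return np.array(means)

t = np.linspace(0.0, 1.0, 5)                 # 4 intervals of length 1/4
vals = project_Yk(lambda s: 2.0 * s, t)      # means of f(t) = 2 t
```

For the linear function f(t) = 2t the sampled mean is exact, giving the midpoint values 0.25, 0.75, 1.25, 1.75.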

In addition to the notation introduced after Remark 6, adding a subscript to a norm will indicate an L2 norm over the indicated set in the following. Inner products are treated in the same way.

Note that in all of the following results, C denotes a generic, strictly positive real constant that does not depend on quantities which appear to the right of or below it.

Note that we can extend the bilinear form A of (6) in its first argument to W(I) ∪ Yk(H10(Ω)), thus consider the operator

 A : (W(I) ∪ Yk(H10(Ω))) × W(I) → R, A given by (6). (36)

Using continuous piecewise linear functions in space, we can formulate fully discretized variants of the state and adjoint equation.

We consider a regular triangulation Th of ¯Ω with triangles T and mesh size

 h:=maxT∈Thdiam(T),

see, e.g., [brenner-scott, Definition (4.4.13)]. We assume that h is sufficiently small. We also denote by h (in a slight abuse of notation) the grid itself.

With the space

 Xh := {ϕh ∈ C0(¯Ω) | ϕh|T ∈ P1(T,R) ∀ T ∈ Th} (37)

we define Xh0 := Xh ∩ H10(Ω), which we use to discretize H10(Ω).

For the space grid we make use of a standard grid assumption, sometimes called quasi-uniformity, as we did for the time grid.

Assumption 18.

There exists a constant μ > 0 independent of h such that

 h≤μminT∈Thdiam(T).

We fix fully discrete ansatz and test spaces, derived from their semidiscrete counterparts from (33), namely

 Pkh := Pk(Xh0), P∗kh := P∗k(Xh0), and Ykh := Yk(Xh0). (38)

With these spaces, we introduce fully discrete state and adjoint equations as follows.

Definition 19 (Fully discrete adjoint equation).

For h ∈ L2(I,H−1(Ω)) find pkh ∈ P∗kh such that

 A(~y,pkh)=∫T0⟨h(t),~y(t)⟩H−1(Ω)H10(Ω)dt∀ ~y∈Ykh. (39)
Definition 20 (Fully discrete state equation).

For (f,g) ∈ L2(I,H−1(Ω)) × L2(Ω) find ykh ∈ Ykh such that

 A(ykh,vkh)=∫T0⟨f(t),vkh(t)⟩H−1(Ω)H10(Ω)dt+(g,vkh(0))∀ vkh∈Pkh. (40)

Existence and uniqueness of these two schemes follow as in the semidiscrete case discussed in [DanielsHinzeVierling] or [dissnvd, section 2.1.2].
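To give a flavor of the resulting time stepping, the following is a classical Crank–Nicolson finite-difference solver for the one-dimensional heat equation with homogeneous Dirichlet data. It is a simplified stand-in for illustration only, not the Petrov–Galerkin scheme (39)–(40) of the paper; grid sizes and initial data are made up:

```python
import numpy as np

def crank_nicolson_heat(g, N=64, M=64, T=0.1):
    """Solve y_t - y_xx = 0 on (0,1) with y(t,0) = y(t,1) = 0, y(0,.) = g."""
    h = 1.0 / N                  # mesh size in space
    k = T / M                    # time step size
    x = np.linspace(0.0, 1.0, N + 1)
    y = g(x).copy()
    # tridiagonal second-difference operator on the interior nodes
    main = -2.0 * np.ones(N - 1) / h**2
    off = np.ones(N - 2) / h**2
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    Id = np.eye(N - 1)
    L = Id - 0.5 * k * A         # implicit half step
    R = Id + 0.5 * k * A         # explicit half step
    for _ in range(M):
        y[1:-1] = np.linalg.solve(L, R @ y[1:-1])
    return x, y

x, y = crank_nicolson_heat(lambda x: np.sin(np.pi * x))
exact = np.exp(-np.pi**2 * 0.1) * np.sin(np.pi * x)
err = np.max(np.abs(y - exact))
```

Like the Crank–Nicolson variants recalled above, this sketch is unconditionally stable and second-order accurate in time.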

Let us recall some stability results and error estimates for these schemes. The first result is [dissnvd, Lemma 56].

Lemma 21.

Let pkh solve (39) with h ∈ L2(I,L2(Ω)). Then there exists a constant C independent of k and h such that

 ∥pkh∥H1(I,L2(Ω))+∥∇pkh∥C(¯I,L2(Ω))≤C∥h∥I.

For stability of a fully discrete state and an error estimate, we recall [dissnvd, Lemma 59].

Lemma 22.

Let y be the solution of (7) for some data (f,g) and let ykh be the solution of (40) for the same data. Then with a constant C independent of k and h, it holds

 ∥ykh∥I≤C(∥f∥L2(I,H−1(Ω))+∥g∥).

If furthermore the regularity f ∈ L2(I,L2(Ω)) as well as g ∈ H10(Ω) is fulfilled, we have the error estimate

 ∥y−ykh∥I≤C(h2+k)(∥f∥I+∥∇g∥). (41)

Let us now consider the error of the fully discrete adjoint state. We begin with an L2(I,L2(Ω))-norm result, which is [dissnvd, Lemma 62].

Lemma 23.

Let solve (9) for some such that has the regularity . Let furthermore