# Explicit Solution for Constrained Stochastic Linear-Quadratic Control with Multiplicative Noise

Weiping Wu, Jianjun Gao, Duan Li, Yun Shi

W. P. Wu is with the Automation Department, Shanghai Jiao Tong University, Shanghai, China (email: godream@sjtu.edu.cn). J. J. Gao is with the School of Information Management and Engineering, University of Finance and Economics, Shanghai, China (email: gao.jianjun@shufe.edu.cn). D. Li is with the Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong (email: dli@se.cuhk.edu.hk). Y. Shi is with the School of Management, Shanghai University, Shanghai, China (email: y_shi@shu.edu.cn). This research work was partially supported by the National Natural Science Foundation of China under grant 61573244, and by the Research Grants Council of Hong Kong under grant 14213716.

###### Abstract

We study in this paper a class of constrained linear-quadratic (LQ) optimal control problems for the scalar-state stochastic system with multiplicative noise, which has various applications, especially in financial risk management. The linear constraint on both the control and state variables considered in our model destroys the elegant structure of the conventional LQ formulation and has so far blocked the derivation of an explicit control policy in the literature. We derive in this paper the analytical control policy for this class of problems by utilizing the state separation property induced by its structure. We reveal that the optimal control policy is a piece-wise affine function of the state and can be computed off-line efficiently by solving two coupled Riccati equations. Under some mild conditions, we also obtain the stationary control policy for the infinite time horizon. We demonstrate the implementation of our method via some illustrative examples and show how to calibrate our model to solve dynamic constrained portfolio optimization problems.

Index Terms: Constrained linear quadratic control, stochastic control, dynamic mean-variance portfolio selection.

## I Introduction

We study in this paper the constrained linear-quadratic (LQ) control problem for the discrete-time stochastic scalar-state system with multiplicative noise. The past few years have witnessed intensified attention on this subject, due to its promising applications in different areas, including dynamic portfolio management, financial derivative pricing, population models, and nuclear heat transfer (see, e.g., [1][2][3]).

There exist in the literature various studies on the estimation and control problems of systems with multiplicative noise [4][5]. As for the LQ type of stochastic optimal control problems with multiplicative noise, investigations have focused on the LQ formulation with indefinite penalty matrices on control and state variables for both continuous-time and discrete-time models (see, e.g., [6][7][8][9][10]). One interesting finding is that even when the penalty matrices for both state and control are indefinite, this kind of model with multiplicative noise is still well-posed under some conditions. One important application of this kind of model arises in the dynamic mean-variance (MV) portfolio analysis [11][12], which generalizes Markowitz's classical work [13] on static portfolio selection. Please see, e.g., [1][14] for some detailed surveys on this subject, which has grown significantly in recent years.

One prominent attraction of the LQ type of optimal control models is its explicit control policy, which can be derived by solving the corresponding Riccati equation. However, in real applications, due to physical limits, risk considerations, or economic regulation restrictions, some constraints on the control variables have to be taken into consideration. Unfortunately, when control constraints are involved, except for a few special cases, there is hardly a closed-form control policy for the constrained LQ optimal control model. As for the deterministic LQ control problem, Gao et al. [15] investigate the LQ model with cardinality constraints on the control and derive a semi-analytical solution. For the model with inequality constraints on state and control, Bemporad et al. [16] propose a method that uses a parametric programming approach to compute an explicit control policy. However, this method may suffer a heavy computational burden as the size of the problem increases. When the problem only involves the positivity constraint for the control, some scholarly works [17][18] provide the optimality conditions and some numerical methods to characterize the optimal control policy. Due to the difficulties in characterizing the explicit optimal control, it is more tractable to develop an approximate control policy by using, for example, the Model Predictive Control (MPC) approach [19][20]. The main idea behind MPC is to solve a finite-horizon sub-problem at each time period for an open-loop control policy and implement such a control in a rolling-horizon fashion. As only a static optimization problem needs to be solved in each step, this kind of model can deal with general convex constraints. As for the stochastic MPC problem with multiplicative noise, Primbs et al. [3] propose a method that uses semi-definite programming and supply a condition for the system stability. Bernardini and Bemporad [21] study a similar problem with discrete random scenarios and propose some efficient computational methods by solving quadratically constrained quadratic programming problems off-line. Patrinos et al. [22] further extend such a method to solve the problem with Markovian jumps. Readers may refer to [23][24] for more complete surveys of MPC for stochastic systems.

The current literature lacks progress in obtaining an explicit solution for the constrained stochastic LQ type optimal control problems. However, some promising results have emerged recently for dynamic MV portfolio selection, a special class of such problems. Li et al. [25] characterize the analytical solution of the continuous-time MV portfolio selection problem with no shorting by using the viscosity solution of the partial differential equation. Hu and Zhou [26] solve the cone-constrained continuous-time LQ control problem with a scalar state by using the backward stochastic differential equation (BSDE) approach. Cui et al. [27][28] solve the discrete-time version of this type of problem with the no-shorting constraint and cone constraints, respectively. Note that the models studied in [27][28] are special cases of the model studied in this paper. Gao et al. [1] derive the solution for the dynamic portfolio optimization model with a cardinality constraint with respect to the active periods in time.

In this paper, we focus on the constrained LQ optimal control for the scalar-state stochastic system with multiplicative noise. The contributions of our work include several aspects. First, we derive the analytical control law of this type of problem with a general class of linear constraints, which goes beyond the cone constraints studied in [26][28]. This general constraint also includes positivity and negativity constraints, and state-dependent upper and lower bound constraints, as its special cases. We show that the control policy is a piece-wise affine function with respect to the state variable, which can be characterized by solving two coupled Riccati equations with two unknowns. Second, we extend such results to the problem with infinite horizon. We provide the condition on the existence of the solution for the corresponding algebraic Riccati equations and show that the closed-loop system is asymptotically stable under the stationary optimal control. Besides the theoretical study, we illustrate how to use this kind of model to solve the constrained dynamic mean-variance portfolio optimization.

The paper is organized as follows. Section II provides the formulations of the stochastic LQ control problem with control constraints for both finite and infinite horizons. Section III and Section IV develop the explicit solutions for these two problems, respectively. Section V illustrates how to apply our method to solve the constrained dynamic mean-variance portfolio selection problem. Section VI presents some numerical examples to demonstrate the effectiveness of the proposed solution schemes. Finally, Section VII concludes the paper with some possible further extensions.

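Notations: The notations $0_{n\times m}$ and $I_n$ denote the $n\times m$ zero matrix and the $n\times n$ identity matrix, respectively, $M \succeq 0$ ($M \succ 0$) denotes a positive semidefinite (positive definite) matrix $M$, and $\mathbb{R}$ ($\mathbb{R}_+$) denotes the set of real (nonnegative real) numbers. We denote by $1_{\mathcal{A}}$ the indicator function such that $1_{\mathcal{A}} = 1$ if the condition $\mathcal{A}$ holds true and $1_{\mathcal{A}} = 0$ otherwise. Let $E_t[\cdot]$ be the conditional expectation $E[\cdot \mid \mathcal{F}_t]$ with $\mathcal{F}_t$ being the filtration (information set) at time $t$. For any problem $(\mathcal{P})$, we use $v(\mathcal{P})$ to denote its optimal objective value.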

## II Problem Formulations

### II-A Problem with Finite Time Horizon

In this work, we consider the following scalar-state discrete-time linear stochastic dynamic system,

$$x_{t+1} = A_t x_t + B_t u_t, \quad t = 0, 1, \cdots, T-1, \tag{1}$$

where $T$ is a finite positive integer, $x_t \in \mathbb{R}$ is the state with $x_0$ being given, $u_t \in \mathbb{R}^n$ is the control vector, and $\{A_t \in \mathbb{R}\}|_{t=0}^{T-1}$ and $\{B_t \in \mathbb{R}^{1\times n}\}|_{t=0}^{T-1}$ are random system parameters. In the above system model, all the randomness is modeled by a completely filtrated probability space $(\Omega, \{\mathcal{F}_t\}|_{t=0}^{T}, \mathcal{P})$, where $\Omega$ is the event set, $\mathcal{F}_t$ is the filtration of the information available at time $t$ with $\mathcal{F}_0 = \{\emptyset, \Omega\}$, and $\mathcal{P}$ is the probability measure. More specifically, at any time $t$, the filtration $\mathcal{F}_t$ is the smallest sigma-algebra generated by the realizations of $\{A_s\}|_{s=0}^{t-1}$ and $\{B_s\}|_{s=0}^{t-1}$. That is to say, in our model, the random parameters $A_t$ and $B_t$ are $\mathcal{F}_{t+1}$-measurable for any $t = 0, \cdots, T-1$. (The random structure we adopt in this paper has been commonly used in the area of financial engineering [29]. This kind of model is very general, since it does not need to specify the particular stochastic processes which $A_t$ and $B_t$ follow.) To simplify the notation, we use $E_t[\cdot]$ to denote the conditional expectation with respect to filtration $\mathcal{F}_t$. To guarantee the well-posedness of the model, we assume all $A_t$ and $B_t$ are square integrable, i.e., $E_t[|A_t|^2] < \infty$ and $E_t[\|B_t\|^2] < \infty$ for all $t = 0, \cdots, T-1$.

Note that the above stochastic dynamic system model is very general. For example, it covers the traditional stochastic uncertain systems with a scalar state space and multiplicative noise [3]. (In [3] and much of the related literature, the stochastic dynamic system is modeled with deterministic system matrices plus state-dependent and control-dependent noise terms, where the driving random variables are i.i.d. over different time periods with zero mean and unit variance, and are mutually uncorrelated.) It also covers the cases in which $\{A_t, B_t\}$ are serially correlated stochastic processes such as Markov chain models [10][30] or the conventional time series models, which have important applications in financial decision making [1][31]. As for the control variables $u_t$, they are required to be $\mathcal{F}_t$-measurable, i.e., the control $u_t$ at time $t$ only depends on the information available up to time $t$. Furthermore, motivated by some real applications, we consider the following general control constraint set,

$$\mathcal{U}_t(x_t) = \{\, u_t \mid u_t \text{ is } \mathcal{F}_t\text{-measurable},\ H_t u_t \le d_t |x_t| \,\}, \tag{2}$$

for $t = 0, \cdots, T-1$, where $H_t \in \mathbb{R}^{m\times n}$ and $d_t \in \mathbb{R}^{m}$ are deterministic matrices and vectors. (Since both $u_t$ and $x_t$ are random variables (except $x_0$) for all $t > 0$, the inequalities given in (2) should hold almost surely, i.e., the inequalities hold for the cases with non-zero probability measure. To simplify the notation, we do not write out the term 'almost surely' explicitly in this paper.) Note that set (2) enables us to model various control constraints, as evidenced by the following:

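• the nonnegativity (or nonpositivity) constraint case, $u_t \ge 0$ (or $u_t \le 0$), by setting $H_t = -I_n$ (or $H_t = I_n$) and $d_t = 0_{n\times 1}$ in (2);

• the constrained case with state-dependent upper and lower bounds, $\underline{d}_t |x_t| \le u_t \le \bar{d}_t |x_t|$ for some $\underline{d}_t \in \mathbb{R}^n$ and $\bar{d}_t \in \mathbb{R}^n$, by setting $H_t = \begin{pmatrix} I_n \\ -I_n \end{pmatrix}$ and $d_t = \begin{pmatrix} \bar{d}_t \\ -\underline{d}_t \end{pmatrix}$;

• the general cone constraint case, $H_t u_t \le 0$, for some $H_t \in \mathbb{R}^{m\times n}$;

• unconstrained case, $u_t \in \mathbb{R}^n$, by setting $H_t = 0$ and $d_t = 0$.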
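The special cases listed above can be sketched in code. The following is a minimal illustration, assuming a scalar control ($n = 1$) so that $H_t$ and $d_t$ reduce to plain lists of numbers, one entry per constraint row. The helper names `bound_constraint`, `nonnegativity_constraint` and `is_feasible` are hypothetical and are not taken from the paper.

```python
def bound_constraint(lower, upper):
    """Return (H, d) encoding lower*|x| <= u <= upper*|x| for a scalar control.

    Row 1:  u <= upper*|x|      Row 2:  -u <= -lower*|x|
    """
    return [1.0, -1.0], [upper, -lower]


def nonnegativity_constraint():
    """Return (H, d) encoding u >= 0, written as -u <= 0*|x|."""
    return [-1.0], [0.0]


def is_feasible(u, x, H, d, tol=1e-12):
    """Check the rows of H*u <= d*|x| one by one (scalar control)."""
    return all(h * u <= di * abs(x) + tol for h, di in zip(H, d))


# Example: control bounded between -0.5|x| and 2|x|.
H, d = bound_constraint(-0.5, 2.0)
print(is_feasible(1.0, -1.0, H, d))   # u = 1 lies inside [-0.5, 2] when |x| = 1
print(is_feasible(3.0, 1.0, H, d))    # u = 3 exceeds the upper bound 2|x|
```

Because the right-hand side scales with $|x_t|$, the same pair $(H_t, d_t)$ describes a feasible set that widens or shrinks with the size of the state, which is what later allows the control to be written as a gain times the state.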

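To model the cost function, we introduce the following deterministic parameters, $R_t \in \mathbb{R}^{n\times n}$, $S_t \in \mathbb{R}^{n}$ and $q_t \in \mathbb{R}$, which can be further written in more compact forms, $C_t := \begin{pmatrix} R_t & S_t \\ S_t' & q_t \end{pmatrix}$ for $t = 0, \cdots, T-1$, and $C_T := \begin{pmatrix} 0_{n\times n} & 0_{n\times 1} \\ 0_{1\times n} & q_T \end{pmatrix}$. Overall, we are interested in the following class of inequality constrained stochastic LQ control problem (ICLQ),

$$(\mathcal{P}_{LQ}^T)\quad \min_{\{u_t\}|_{t=0}^{T-1}}\ E_0\left[\sum_{t=0}^{T-1} \begin{pmatrix} u_t \\ x_t \end{pmatrix}' C_t \begin{pmatrix} u_t \\ x_t \end{pmatrix} + q_T x_T^2\right] \tag{3}$$

$$\text{s.t. } \{x_t, u_t\} \text{ satisfies (1) and (2) for } t = 0, \cdots, T-1.$$

To solve problem $(\mathcal{P}_{LQ}^T)$, we need the following assumption.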

###### Assumption 1.

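$C_t \succeq 0$ for $t = 0, \cdots, T$, and $\mathrm{Cov}_t[B_t] := E_t[B_t'B_t] - E_t[B_t']E_t[B_t] \succ 0$ for all $t = 0, \cdots, T-1$.

Assumption 1 guarantees the convexity of problem $(\mathcal{P}_{LQ}^T)$. Assumption 1 can be regarded as a generalization of the one widely used in the mean-variance portfolio selection, which requires $\mathrm{Cov}_t[B_t] \succ 0$ (please see, for example, [1] for a detailed discussion). Also, Assumption 1 is looser than the one used in [32], which requires path-wise positiveness of the random matrix. Note that, since $E_t[B_t'B_t] = \mathrm{Cov}_t[B_t] + E_t[B_t']E_t[B_t]$, Assumption 1 implies $E_t[B_t'B_t] \succ 0$ for $t = 0, \cdots, T-1$.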

### II-B Problem with Infinite Time Horizon

We are also interested in a variant of problem $(\mathcal{P}_{LQ}^T)$ with infinite time horizon. More specifically, we want to investigate the stationary control policy and long-time performance for the infinite time horizon. In such an infinite time horizon, we assume that all the random parameters, $A_t$ and $B_t$, are independent and identically distributed (i.i.d.) over different time periods. Thus, we drop the index of time and simply use the random variable $A$ and random vector $B$ to denote the random parameters, which leads to the following simplified version of dynamic system (1),

$$x_{t+1} = A x_t + B u_t, \quad t = 0, 1, \cdots, T-1, \tag{4}$$

where the system parameters $A$ and $B$ are random with known joint distribution. As for the constraint (2), we also assume that all $H_t$ and $d_t$ are fixed at $H$ and $d$, respectively, which leads to the following simplified version of constraint (2),

$$\mathcal{U}_t(x_t) = \{\, u_t \mid u_t \in \mathbb{R}^n,\ H u_t \le d |x_t| \,\}, \tag{5}$$

for $t = 0, 1, \cdots, \infty$. To guarantee the feasibility of the constraint, we impose the following assumption.

###### Assumption 2.

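The set $\mathcal{K} := \{K \in \mathbb{R}^n \mid HK \le d\}$ is nonempty.

Note that $\mathcal{K}$ is independent of $x_t$ and Assumption 2 implies that the feasible set $\mathcal{U}_t(x_t)$ is nonempty for any $x_t$. We also set all penalty matrices $C_t$, $t = 0, 1, \cdots, \infty$, at $C$. We consider now the following ICLQ problem with infinite time horizon,

$$(\mathcal{P}_{LQ}^\infty)\quad \min_{\{u_t\}|_{t=0}^{\infty}}\ E\left[\sum_{t=0}^{\infty} \begin{pmatrix} u_t \\ x_t \end{pmatrix}' C \begin{pmatrix} u_t \\ x_t \end{pmatrix}\right]$$

$$\text{s.t. } \{x_t, u_t\} \text{ satisfies (4) and (5) for } t = 0, \cdots, \infty.$$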

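Note that the expectation in problem $(\mathcal{P}_{LQ}^\infty)$ is an unconditional expectation, since $\{A, B\}$ are independent over time. (For model $(\mathcal{P}_{LQ}^\infty)$, since $\{A_t, B_t\}$ are independent over time, we simplify the notation $E_t[\cdot]$ to $E[\cdot]$.) For problem $(\mathcal{P}_{LQ}^\infty)$, we need to strengthen Assumption 1 by requiring $C$ to be positive definite as follows,

###### Assumption 3.

$C \succ 0$ and $\mathrm{Cov}[B] \succ 0$.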

## III Solution Scheme for Problem $(\mathcal{P}_{LQ}^T)$

In this section, we first reveal an important result of state separation for our models and then develop the solution for problem $(\mathcal{P}_{LQ}^T)$.

### III-A State Separation Theorem

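To derive the explicit solution of problem $(\mathcal{P}_{LQ}^T)$, we first introduce the following sets associated with the control constraint set $\mathcal{U}_t(x_t)$, $\mathcal{K}_t := \{K \in \mathbb{R}^n \mid H_t K \le d_t\}$, for $t = 0, \cdots, T-1$. For problem $(\mathcal{P}_{LQ}^T)$, we further introduce three auxiliary optimization problems, $(\mathcal{P}_t)$, $(\hat{\mathcal{P}}_t)$ and $(\bar{\mathcal{P}}_t)$ for $t = 0, \cdots, T-1$ as follows,

$$(\mathcal{P}_t)\quad \min_{u \in \mathcal{U}_t}\ g_t(u, x, y, z), \tag{6}$$

$$(\hat{\mathcal{P}}_t)\quad \min_{K \in \mathcal{K}_t}\ \hat{g}_t(K, y, z),$$

$$(\bar{\mathcal{P}}_t)\quad \min_{K \in \mathcal{K}_t}\ \bar{g}_t(K, y, z),$$

where $x \in \mathbb{R}$ is an $\mathcal{F}_t$-measurable random variable, $y \in \mathbb{R}_+$ and $z \in \mathbb{R}_+$ are $\mathcal{F}_{t+1}$-measurable random variables, and $g_t(u,x,y,z)$, $\hat{g}_t(K,y,z)$ and $\bar{g}_t(K,y,z)$ are respectively defined as

$$g_t(u,x,y,z) := E_t\Big[\begin{pmatrix} u \\ x \end{pmatrix}' C_t \begin{pmatrix} u \\ x \end{pmatrix} + (A_t x + B_t u)^2 \times \big(y 1_{\{A_t x + B_t u \ge 0\}} + z 1_{\{A_t x + B_t u < 0\}}\big)\Big], \tag{7}$$

$$\hat{g}_t(K,y,z) := E_t\Big[\begin{pmatrix} K \\ 1 \end{pmatrix}' C_t \begin{pmatrix} K \\ 1 \end{pmatrix} + (A_t + B_t K)^2 \times \big(y 1_{\{A_t + B_t K \ge 0\}} + z 1_{\{A_t + B_t K < 0\}}\big)\Big], \tag{8}$$

$$\bar{g}_t(K,y,z) := E_t\Big[\begin{pmatrix} K \\ -1 \end{pmatrix}' C_t \begin{pmatrix} K \\ -1 \end{pmatrix} + (A_t - B_t K)^2 \times \big(y 1_{\{A_t - B_t K \le 0\}} + z 1_{\{A_t - B_t K > 0\}}\big)\Big]. \tag{9}$$

Since $C_t \succeq 0$, it always holds true that $g_t(u,x,y,z) \ge 0$, $\hat{g}_t(K,y,z) \ge 0$ and $\bar{g}_t(K,y,z) \ge 0$. Before we present the main result, we present the following lemma.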

###### Lemma 1.

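The function $g_t(u,x,y,z)$ is convex with respect to $u$, and both $\hat{g}_t(K,y,z)$ and $\bar{g}_t(K,y,z)$ are convex functions with respect to $K$.

###### Proof.

Checking the gradient and Hessian matrix of function $g_t(u,x,y,z)$ with respect to $u$ gives rise to

$$\nabla_u g_t(u,x,y,z) = 2E_t\big[(R_t u + S_t x) + B_t'(A_t x + B_t u) \times \big(y 1_{\{A_t x + B_t u \ge 0\}} + z 1_{\{A_t x + B_t u < 0\}}\big)\big] \tag{10}$$

and

$$\begin{aligned} \nabla_u^2 g_t(u,x,y,z) &= 2E_t\big[(R_t + y B_t'B_t) 1_{\{A_t x + B_t u \ge 0\}} + (R_t + z B_t'B_t) 1_{\{A_t x + B_t u < 0\}}\big] \\ &= 2E_t\big[R_t + (y 1_{\{y \le z\}} + y 1_{\{y > z\}}) B_t'B_t 1_{\{A_t x + B_t u \ge 0\}} \\ &\qquad\quad + (z 1_{\{y \le z\}} + z 1_{\{y > z\}}) B_t'B_t 1_{\{A_t x + B_t u < 0\}}\big]. \end{aligned} \tag{11}$$

(Here we need to compute the partial derivative of function $g_t(u,x,y,z)$ with respect to $u$. Under some mild conditions, we can always compute the derivative by taking the derivative inside the expectation first. The technical conditions that guarantee this exchangeability of differentiation and expectation can be found in Theorem 1.21 of [33].) Note that the following two inequalities always hold,

$$y 1_{\{y > z\}} \ge z 1_{\{y > z\}}, \qquad z 1_{\{y \le z\}} \ge y 1_{\{y \le z\}}. \tag{12}$$

Using the above inequalities and noticing that $B_t'B_t \succeq 0$ and $1_{\{A_t x + B_t u \ge 0\}} + 1_{\{A_t x + B_t u < 0\}} = 1$, we can rearrange the terms of (11) to reach the conclusion of $\nabla_u^2 g_t(u,x,y,z) \succeq 2E_t[M_t]$, where

$$M_t = R_t + (y 1_{\{y \le z\}} + z 1_{\{y > z\}}) B_t'B_t. \tag{13}$$

Assumption 1 implies $R_t \succeq 0$ and $E_t[B_t'B_t] \succ 0$. Moreover, the term $y 1_{\{y \le z\}} + z 1_{\{y > z\}}$ is a positive random variable. These conditions guarantee $E_t[M_t] \succ 0$, which further implies $\nabla_u^2 g_t(u,x,y,z) \succ 0$. That is to say, $g_t(u,x,y,z)$ is a strictly convex function of $u$. As we can also apply the same procedure to $\hat{g}_t(K,y,z)$ and $\bar{g}_t(K,y,z)$ to prove their convexity with respect to $K$, we omit the detailed proofs here. ∎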

One immediate implication of Lemma 1 is that all $(\mathcal{P}_t)$, $(\hat{\mathcal{P}}_t)$ and $(\bar{\mathcal{P}}_t)$ are convex optimization problems, as their objective functions are convex and their constraints are linear with respect to the decision variables. We can see that problem $(\mathcal{P}_t)$ depends on the random state $x$, while problems $(\hat{\mathcal{P}}_t)$ and $(\bar{\mathcal{P}}_t)$ do not. That is to say, problems $(\hat{\mathcal{P}}_t)$ and $(\bar{\mathcal{P}}_t)$ can be solved off-line once we are given the specific description of the stochastic processes of $\{A_t, B_t\}$. Furthermore, these two convex optimization problems can be solved efficiently by existing numerical methods. The following result illustrates the relationship among problems $(\mathcal{P}_t)$, $(\hat{\mathcal{P}}_t)$ and $(\bar{\mathcal{P}}_t)$, which plays an important role in developing the explicit solution for problem $(\mathcal{P}_{LQ}^T)$.
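As a minimal sketch of this off-line step, the code below evaluates the objectives in (8) and (9) and minimizes them numerically. It rests on several assumptions that are not part of the paper: the control is scalar ($n = 1$), the set $\mathcal{K}_t$ is an interval, the conditional distribution of $(A_t, B_t)$ is replaced by a finite list of hypothetical scenarios `(prob, A, B)`, and golden-section search is used as one possible convex solver. The function names are illustrative only.

```python
import math


def g_hat(K, y, z, scen, R, S, q):
    """Scenario average of (8) for scalar K; scen holds (prob, A, B) triples."""
    total = 0.0
    for p, A, B in scen:
        s = A + B * K                   # next state per unit of positive state
        w = y if s >= 0 else z          # y weights s >= 0, z weights s < 0
        total += p * (R * K * K + 2 * S * K + q + s * s * w)
    return total


def g_bar(K, y, z, scen, R, S, q):
    """Scenario average of (9) for scalar K."""
    total = 0.0
    for p, A, B in scen:
        s = A - B * K
        w = y if s <= 0 else z          # y weights A - B*K <= 0, z the rest
        total += p * (R * K * K - 2 * S * K + q + s * s * w)
    return total


def golden_min(f, lo, hi, tol=1e-10):
    """Golden-section search for the minimizer of a convex f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:                     # minimizer lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                           # minimizer lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


# Hypothetical data: two equally likely scenarios, K restricted to [-1, 1].
scen = [(0.5, 1.2, 0.5), (0.5, 0.9, -0.3)]
R, S, q = 1.0, 0.0, 1.0
y, z = 1.0, 1.0
K_hat = golden_min(lambda K: g_hat(K, y, z, scen, R, S, q), -1.0, 1.0)
K_bar = golden_min(lambda K: g_bar(K, y, z, scen, R, S, q), -1.0, 1.0)
print(K_hat, K_bar)
```

Neither minimization involves the state $x$, so both can be carried out before any state is observed. For a vector control, a general convex quadratic or nonlinear programming solver would take the place of the one-dimensional search.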

###### Theorem 1.

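For any $x \in \mathbb{R}$, the optimal solution for problem $(\mathcal{P}_t)$ is

$$u^*(x) = \hat{K} x 1_{\{x \ge 0\}} - \bar{K} x 1_{\{x < 0\}}, \tag{14}$$

where $\hat{K}$ and $\bar{K}$ are defined respectively as

$$\hat{K} = \arg\min_{K \in \mathcal{K}_t}\ \hat{g}_t(K, y, z), \qquad \bar{K} = \arg\min_{K \in \mathcal{K}_t}\ \bar{g}_t(K, y, z),$$

and the optimal objective value is

$$v(\mathcal{P}_t) = x^2\big(\hat{g}_t(\hat{K}, y, z) 1_{\{x \ge 0\}} + \bar{g}_t(\bar{K}, y, z) 1_{\{x < 0\}}\big). \tag{15}$$

###### Proof.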

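Since problem $(\mathcal{P}_t)$ is convex, the first-order optimality condition is sufficient to determine the optimal solution (see, e.g., Theorem 27.4 in [34]). If $u^*$ is the optimal solution, it should satisfy

$$\nabla_u g_t(u^*, x, y, z)'(u - u^*) \ge 0, \quad \text{for } \forall\, u \in \mathcal{U}_t, \tag{16}$$

where $\nabla_u g_t(u^*, x, y, z)$ is given in (10). Note that the condition (16) depends on state $x$. Thus, we consider the following three different cases.

(i) We first consider the case of $x > 0$. Let $\hat{K}$ be the optimal solution of problem $(\hat{\mathcal{P}}_t)$, which satisfies the following first-order optimality condition,

$$\nabla_K \hat{g}_t(\hat{K}, y, z)'(K - \hat{K}) \ge 0, \quad \text{for } \forall\, K \in \mathcal{K}_t, \tag{17}$$

where $\nabla_K \hat{g}_t(\hat{K}, y, z)$ is defined as

$$\nabla_K \hat{g}_t(\hat{K}, y, z) = 2E_t\big[(R_t \hat{K} + S_t) + B_t'(A_t + B_t \hat{K}) \times \big(y 1_{\{A_t + B_t \hat{K} \ge 0\}} + z 1_{\{A_t + B_t \hat{K} < 0\}}\big)\big]. \tag{18}$$

If we let $u^* = \hat{K} x$, it is not hard to verify that $u^*$ satisfies both the constraint $u^* \in \mathcal{U}_t$ and the first-order optimality condition of $(\mathcal{P}_t)$ by substituting $u^*$ back to (16) and using the condition (17). That is to say, $u^*$ solves problem $(\mathcal{P}_t)$ when $x > 0$. Substituting $u^*$ back to $g_t(u,x,y,z)$ gives the optimal objective value of $(\mathcal{P}_t)$ as $x^2 \hat{g}_t(\hat{K}, y, z)$.

(ii) For the case of $x < 0$, we consider the first-order optimality condition of problem $(\bar{\mathcal{P}}_t)$,

$$\nabla_K \bar{g}_t(\bar{K}, y, z)'(K - \bar{K}) \ge 0, \quad \text{for } \forall\, K \in \mathcal{K}_t,$$

where $\nabla_K \bar{g}_t(\bar{K}, y, z)$ is defined as

$$\nabla_K \bar{g}_t(\bar{K}, y, z) = 2E_t\big[(R_t \bar{K} - S_t) - B_t'(A_t - B_t \bar{K}) \times \big(y 1_{\{A_t - B_t \bar{K} \le 0\}} + z 1_{\{A_t - B_t \bar{K} > 0\}}\big)\big].$$

Similarly, let $u^* = -\bar{K} x$. We can verify that $u^*$ satisfies both the constraint $u^* \in \mathcal{U}_t$ and the optimality condition (16) of problem $(\mathcal{P}_t)$. Thus, $u^*$ solves problem $(\mathcal{P}_t)$ for $x < 0$ with the optimal objective value being $x^2 \bar{g}_t(\bar{K}, y, z)$.

(iii) When $x = 0$, the objective function (7) of $(\mathcal{P}_t)$ becomes

$$g_t(u, 0, y, z) = E_t\big[u'R_t u + u'B_t'B_t u \big(y 1_{\{B_t u \ge 0\}} + z 1_{\{B_t u < 0\}}\big)\big]. \tag{19}$$

From the inequalities in (12), we have

$$u'R_t u + u'B_t'B_t u \big(y 1_{\{B_t u \ge 0\}} + z 1_{\{B_t u < 0\}}\big) \ge u'M_t u,$$

where $M_t$ is defined in (13). Note that (19) is bounded from below by

$$g_t(u, x, y, z)\big|_{x=0} \ge u'E_t[M_t]u \ge 0.$$

As we have shown $E_t[M_t] \succ 0$, $g_t(u, 0, y, z) = 0$ only when $u = 0$. Clearly, $u^* = 0$ also satisfies the constraint $u^* \in \mathcal{U}_t$. That is to say, $u^* = 0$ solves problem $(\mathcal{P}_t)$ when $x = 0$. As a summary of the above three cases, the optimal solution of problem $(\mathcal{P}_t)$ can be expressed as (14) and the optimal objective value is given as (15). ∎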
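The scaling argument behind cases (i) and (ii) can be checked numerically. The sketch below assumes a scalar control and a finite list of hypothetical scenarios `(prob, A, B)` in place of the conditional distribution of $(A_t, B_t)$; the function names are illustrative and not from the paper. It relies on the observation that, for scalar control, (8) is (7) evaluated at $x = 1$, $u = K$, and (9) is (7) evaluated at $x = -1$, $u = K$.

```python
def g(u, x, y, z, scen, R, S, q):
    """Scenario average of (7) for scalar control u and scalar state x."""
    total = 0.0
    for p, A, B in scen:
        s = A * x + B * u               # next state
        w = y if s >= 0 else z          # weight depends on the sign of s
        total += p * (R * u * u + 2 * S * u * x + q * x * x + s * s * w)
    return total


def g_hat(K, y, z, scen, R, S, q):
    # (8): objective (7) at the unit positive state with u = K
    return g(K, 1.0, y, z, scen, R, S, q)


def g_bar(K, y, z, scen, R, S, q):
    # (9): objective (7) at the unit negative state with u = K
    return g(K, -1.0, y, z, scen, R, S, q)


def policy(x, K_hat, K_bar):
    """Piece-wise linear rule (14): K_hat*x for x >= 0, -K_bar*x for x < 0."""
    return K_hat * x if x >= 0 else -K_bar * x


# Hypothetical data; K_hat and K_bar are arbitrary gains, not optimized here.
scen = [(0.5, 1.2, 0.5), (0.5, 0.9, -0.3)]
R, S, q, y, z = 1.0, 0.2, 1.0, 0.5, 2.0
K_hat, K_bar = 0.3, 0.4

for x in (2.0, -3.0):
    lhs = g(policy(x, K_hat, K_bar), x, y, z, scen, R, S, q)
    rhs = x * x * (g_hat(K_hat, y, z, scen, R, S, q) if x >= 0
                   else g_bar(K_bar, y, z, scen, R, S, q))
    print(x, lhs, rhs)                  # the two values agree up to rounding
```

The identity holds for any gains, not only the optimal ones, because the objective (7) is positively homogeneous of degree two in $(u, x)$ and the constraint in (2) scales with $|x|$. This is what lets the state be factored out of the optimization.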

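### III-B Explicit Solution for Problem $(\mathcal{P}_{LQ}^T)$

With the help of Theorem 1, we can develop the explicit solution for problem $(\mathcal{P}_{LQ}^T)$. We first introduce the following two random sequences, $\hat{G}_0$, $\hat{G}_1$, $\cdots$, $\hat{G}_T$ and $\bar{G}_0$, $\bar{G}_1$, $\cdots$, $\bar{G}_T$, which are defined backward recursively as follows,

$$\hat{G}_t := \min_{K_t \in \mathcal{K}_t}\ \hat{g}_t(K_t, \hat{G}_{t+1}, \bar{G}_{t+1}), \tag{20}$$

$$\bar{G}_t := \min_{K_t \in \mathcal{K}_t}\ \bar{g}_t(K_t, \hat{G}_{t+1}, \bar{G}_{t+1}), \tag{21}$$

where $\hat{g}_t(\cdot)$ and $\bar{g}_t(\cdot)$ are defined respectively in (8) and (9) for $t = T-1, \cdots, 0$ with the boundary conditions of $\hat{G}_T = \bar{G}_T = q_T$. Clearly, $\hat{G}_t$ and $\bar{G}_t$ are $\mathcal{F}_t$-measurable random variables.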

###### Theorem 2.

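The optimal control policy of problem $(\mathcal{P}_{LQ}^T)$ at time $t$ is a piece-wise linear feedback policy,

$$u_t^*(x_t) = \hat{K}_t^* x_t 1_{\{x_t \ge 0\}} - \bar{K}_t^* x_t 1_{\{x_t < 0\}}, \tag{22}$$

where $\hat{K}_t^*$ and $\bar{K}_t^*$ are defined as

$$\hat{K}_t^* = \arg\min_{K_t \in \mathcal{K}_t}\ \hat{g}_t(K_t, \hat{G}_{t+1}, \bar{G}_{t+1}), \qquad \bar{K}_t^* = \arg\min_{K_t \in \mathcal{K}_t}\ \bar{g}_t(K_t, \hat{G}_{t+1}, \bar{G}_{t+1}),$$

where $\hat{G}_{t+1}$ and $\bar{G}_{t+1}$ are given in (20) and (21), respectively, and $\hat{G}_t = \hat{g}_t(\hat{K}_t^*, \hat{G}_{t+1}, \bar{G}_{t+1})$ and $\bar{G}_t = \bar{g}_t(\bar{K}_t^*, \hat{G}_{t+1}, \bar{G}_{t+1})$ for $t = T-1, \cdots, 0$. Furthermore, the optimal objective value of problem $(\mathcal{P}_{LQ}^T)$ is

$$v(\mathcal{P}_{LQ}^T) = x_0^2\big(\hat{G}_0 1_{\{x_0 \ge 0\}} + \bar{G}_0 1_{\{x_0 < 0\}}\big). \tag{23}$$

###### Proof.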

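We prove this theorem by invoking dynamic programming. At any time $t$, the value function of problem $(\mathcal{P}_{LQ}^T)$ is defined as

$$V_t(x_t) := \min_{\{u_k\}|_{k=t}^{T-1}}\ E_t\left[\sum_{k=t}^{T-1} \begin{pmatrix} u_k \\ x_k \end{pmatrix}' C_k \begin{pmatrix} u_k \\ x_k \end{pmatrix} + q_T x_T^2\right]$$

$$\text{s.t. } \{x_k, u_k\} \text{ satisfies (1) and (2) for } k = t, \cdots, T-1.$$

From Bellman's principle of optimality, the value function satisfies the recursion,

$$V_t(x_t) = \min_{u_t \in \mathcal{U}_t}\ \begin{pmatrix} u_t \\ x_t \end{pmatrix}' C_t \begin{pmatrix} u_t \\ x_t \end{pmatrix} + E_t[V_{t+1}(x_{t+1})]. \tag{24}$$

By using mathematical induction, we show that the following claim is true,

$$V_t(x_t) = x_t^2\big(\hat{G}_t 1_{\{x_t \ge 0\}} + \bar{G}_t 1_{\{x_t < 0\}}\big) \tag{25}$$

for $t = 0, \cdots, T$, where $\hat{G}_t$ and $\bar{G}_t$ satisfy the recursions, respectively, in (20) and (21). Clearly, at time $t = T$, the value function is

$$V_T(x_T) = q_T x_T^2.$$

As we have defined $\hat{G}_T = \bar{G}_T = q_T$, the claim (25) is true at time $T$. Now, we assume that the claim (25) is true at time $t+1$,

$$V_{t+1}(x_{t+1}) = x_{t+1}^2\big(\hat{G}_{t+1} 1_{\{x_{t+1} \ge 0\}} + \bar{G}_{t+1} 1_{\{x_{t+1} < 0\}}\big).$$

From (24) and the definition of $g_t(u,x,y,z)$ in (7) at time $t$, we have

$$V_t(x_t) = \min_{u_t \in \mathcal{U}_t}\ g_t(u_t, x_t, \hat{G}_{t+1}, \bar{G}_{t+1}),$$

which is just the same problem as $(\mathcal{P}_t)$ given in (6) by replacing $y$ and $z$ with $\hat{G}_{t+1}$ and $\bar{G}_{t+1}$, respectively. From Theorem 1, we have the result in (22) and the claim in (25) is true at time $t$, by defining $\hat{G}_t = \hat{g}_t(\hat{K}_t^*, \hat{G}_{t+1}, \bar{G}_{t+1})$ and $\bar{G}_t = \bar{g}_t(\bar{K}_t^*, \hat{G}_{t+1}, \bar{G}_{t+1})$, which completes the proof. ∎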

The above theorem indicates that the system of equations (20) and (21) plays the same role as the Riccati equation in the classical LQG optimal control problem. In the following part, we name the pair of (20) and (21) as the Extended Riccati Equations. Furthermore, when there is no control constraint, these two equations merge into the single equation in (26) presented later in Section III-C.

In Theorem 2, the key step in identifying $u_t^*$ is to solve (20) and (21) for $\hat{K}_t^*$ and $\bar{K}_t^*$, respectively, for each $t = T-1, \cdots, 0$. Generally speaking, once the stochastic process of $A_t$ and $B_t$ is specified, $\hat{K}_t^*$ and $\bar{K}_t^*$ can be computed by a backward recursion: start from the boundary conditions $\hat{G}_T = \bar{G}_T = q_T$ and, for $t = T-1, \cdots, 0$, solve the two convex problems in (20) and (21) to obtain $\hat{K}_t^*$, $\bar{K}_t^*$, $\hat{G}_t$ and $\bar{G}_t$.

Note that, if $\{A_t\}$ and $\{B_t\}$ are statistically independent over time, all the conditional expectations in $\hat{g}_t(\cdot)$ and $\bar{g}_t(\cdot)$ degenerate to unconditional expectations, which generates the deterministic pairs of $\{\hat{G}_t, \bar{G}_t\}$ and $\{\hat{K}_t^*, \bar{K}_t^*\}$. However, if $\{A_t\}$ or $\{B_t\}$ are serially correlated over time, all the conditional expectations depend on the filtration $\mathcal{F}_t$, or in other words, all these pairs are also random variables. In such a case, we usually need numerical methods to discretize the sample space and solve both problems, (20) and (21), for each sample path.
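For the time-independent case just described, the backward recursion (20)-(21) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the control is scalar, $\mathcal{K}_t$ is a fixed interval `[K_lo, K_hi]`, the cost parameters are constant in $t$, the distribution of $(A_t, B_t)$ is a hypothetical finite scenario list, and golden-section search stands in for a general convex solver. All names are illustrative.

```python
import math


def golden_min(f, lo, hi, tol=1e-10):
    """Golden-section search; valid here because the stage costs are convex."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:                     # minimizer lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                           # minimizer lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def stage_cost(K, sign, y, z, scen, R, S, q):
    """(8) for sign=+1 and (9) for sign=-1, scalar K, averaged over scenarios."""
    total = 0.0
    for p, A, B in scen:
        s = sign * A + B * K            # next state when x = sign and u = K
        w = y if s >= 0 else z
        total += p * (R * K * K + 2 * sign * S * K + q + s * s * w)
    return total


def extended_riccati(T, scen, R, S, q, qT, K_lo, K_hi):
    """Backward recursion (20)-(21) with boundary G_hat[T] = G_bar[T] = qT."""
    G_hat, G_bar = [0.0] * (T + 1), [0.0] * (T + 1)
    K_hat, K_bar = [0.0] * T, [0.0] * T
    G_hat[T] = G_bar[T] = qT
    for t in range(T - 1, -1, -1):
        y, z = G_hat[t + 1], G_bar[t + 1]
        f_hat = lambda K: stage_cost(K, 1.0, y, z, scen, R, S, q)
        f_bar = lambda K: stage_cost(K, -1.0, y, z, scen, R, S, q)
        K_hat[t] = golden_min(f_hat, K_lo, K_hi)
        K_bar[t] = golden_min(f_bar, K_lo, K_hi)
        G_hat[t], G_bar[t] = f_hat(K_hat[t]), f_bar(K_bar[t])
    return G_hat, G_bar, K_hat, K_bar


# Hypothetical example: gains restricted to [0, 1], horizon T = 5.
scen = [(0.5, 1.2, 0.5), (0.5, 0.9, -0.3)]
G_hat, G_bar, K_hat, K_bar = extended_riccati(5, scen, 1.0, 0.2, 1.0, 1.0, 0.0, 1.0)
print(G_hat[0], G_bar[0], K_hat[0], K_bar[0])
```

Given the stored gains, the control at time $t$ follows (22): `K_hat[t] * x` for a nonnegative state and `-K_bar[t] * x` for a negative one. With serially correlated parameters, the same loop would have to be run per node of a discretized scenario tree, since the expectations then depend on the history.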

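### III-C Solution for Problem $(\mathcal{P}_{LQ}^T)$ with No Control Constraints

If there is no control constraint in $(\mathcal{P}_{LQ}^T)$, i.e., $\mathcal{U}_t = \mathbb{R}^n$, Theorem 2 can be simplified: for all $t = 0, \cdots, T-1$, solving problems $(\hat{\mathcal{P}}_t)$ and $(\bar{\mathcal{P}}_t)$ yields $\hat{K}_t^* = -\bar{K}_t^*$ and $\hat{G}_t = \bar{G}_t$. Let $K_t^* := \hat{K}_t^*$ and $G_t := \hat{G}_t$, for $t = 0, \cdots, T-1$. Thus, we have the following explicit control policy.

###### Proposition 1.

When there is no control constraint in problem $(\mathcal{P}_{LQ}^T)$, the optimal control becomes

$$u_t^* = -\big(R_t + E_t[G_{t+1} B_t'B_t]\big)^{-1}\big(E_t[G_{t+1} A_t B_t'] + S_t\big) x_t,$$

for $t = 0, \cdots, T-1$, where $G_t$ is defined by

$$G_t = q_t + E_t[G_{t+1} A_t^2] - \big(S_t + E_t[G_{t+1} A_t B_t']\big)'\big(R_t + E_t[G_{t+1} B_t'B_t]\big)^{-1}\big(S_t + E_t[G_{t+1} A_t B_t']\big) \tag{26}$$

with $G_T = q_T$.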

Proposition 1 is a generalization of Corollary 5 in [1], which solves the dynamic mean-variance portfolio control problem with correlated returns.
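The unconstrained recursion in Proposition 1 can be sketched in a few lines. The code assumes a scalar control, time-invariant cost parameters, and parameters $(A_t, B_t)$ that are i.i.d. over time and described by a hypothetical scenario list, so that $G_t$ is deterministic and can be pulled out of the expectations. The function name `riccati_unconstrained` is illustrative.

```python
def riccati_unconstrained(T, scen, R, S, q, qT):
    """Scalar-control version of recursion (26) with G[T] = qT.

    Returns (G, gain), where the optimal control is u_t = gain[t] * x_t.
    """
    # Moments of the i.i.d. parameters under the scenario distribution.
    EA2 = sum(p * A * A for p, A, B in scen)
    EAB = sum(p * A * B for p, A, B in scen)
    EB2 = sum(p * B * B for p, A, B in scen)

    G = [0.0] * (T + 1)
    gain = [0.0] * T
    G[T] = qT
    for t in range(T - 1, -1, -1):
        M = R + G[t + 1] * EB2          # R_t + E[G_{t+1} B'B]
        b = S + G[t + 1] * EAB          # S_t + E[G_{t+1} A B']
        gain[t] = -b / M                # feedback gain of Proposition 1
        G[t] = q + G[t + 1] * EA2 - b * b / M
    return G, gain


# Hypothetical data: two equally likely scenarios, horizon T = 5.
scen = [(0.5, 1.2, 0.5), (0.5, 0.9, -0.3)]
G, gain = riccati_unconstrained(5, scen, 1.0, 0.2, 1.0, 1.0)
print(G[0], gain[0])
```

In this setting the gain does not depend on the sign of the state, which is the sense in which the two extended Riccati equations collapse into one when the constraint is removed.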

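## IV Optimal Solution to Problem $(\mathcal{P}_{LQ}^\infty)$

In this section, we develop the optimal solution for $(\mathcal{P}_{LQ}^\infty)$. The main idea is to study the asymptotic behavior of the solution of the corresponding finite-horizon problem by extending the time horizon to infinity. More specifically, we consider the following problem with finite horizon,

$$(\mathcal{A}_T)\quad \min_{\{u_t\}|_{t=0}^{T-1}}\ E\left[\sum_{t=0}^{T-1} \begin{pmatrix} u_t \\ x_t \end{pmatrix}' C \begin{pmatrix} u_t \\ x_t \end{pmatrix}\right]$$

$$\text{s.t. } \{x_t, u_t\} \text{ satisfies (4) and (5) for } t = 0, \cdots, T-1,$$

where $x_0$ is given. Obviously, $(\mathcal{A}_T)$ becomes $(\mathcal{P}_{LQ}^\infty)$ when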