Portfolio choice, portfolio liquidation, and portfolio transition under drift uncertainty^{†}
^{†} This research has been conducted with the support of the Research Initiative “Modélisation des marchés actions et dérivés” financed by HSBC France under the aegis of the Europlace Institute of Finance. The authors would like to thank Rama Cont (Imperial College), Nicolas Grandchamp des Raux (HSBC France), Charles-Albert Lehalle (CFM and Imperial College), Jean-Michel Lasry (Institut Louis Bachelier), and Christopher Ulph (HSBC London) for the conversations they had on the subject.
Abstract
This paper presents several models addressing optimal portfolio choice and optimal portfolio transition issues, in which the expected returns of risky assets are unknown. Our approach is based on a coupling between Bayesian learning and dynamic programming techniques. It makes it possible to recover the well-known results of Karatzas and Zhao in the case of conjugate (Gaussian) priors for the drift distribution, but also to go beyond the no-friction case, when martingale methods are no longer available. In particular, we address optimal portfolio choice in a framework à la Almgren-Chriss, and we thereby build a model in which the agent takes into account, in his/her allocation decision process, both the liquidity of assets and the uncertainty with respect to their expected returns. We also address optimal portfolio liquidation and optimal portfolio transition problems.
Key words: Optimal portfolio choice, Bayesian learning, Stochastic optimal control, Hamilton-Jacobi-Bellman equations, Optimal portfolio transition.
1 Introduction
The theory of portfolio selection started in 1952 with the seminal
paper [23] of Markowitz. (Footnote 1: Markowitz was awarded the Nobel Prize in 1990 for his work. For a
brief history of portfolio theory, see [24].) In this paper, Markowitz considered the problem of an agent who
wishes to build a portfolio with the maximum possible level of expected
return, given a limit level of variance. He then coined the concept
of efficient portfolio and described how to compute such portfolios.
Markowitz paved the way for the theoretical study of the optimal portfolio
choice of risk-averse agents. Indeed, a few years after Markowitz’s
paper, Tobin published his famous research work on agents’
liquidity preferences and the separation theorem (see [33]), which is based
on the ideas developed by Markowitz. A few years later, in the sixties,
Treynor, Sharpe, Lintner, and Mossin independently introduced the
Capital Asset Pricing Model (CAPM), which is also built on top of Markowitz’s
ideas. The ubiquitous notions of $\alpha$ and $\beta$ therefore owe a lot
to Markowitz’s modern portfolio theory.
Although initially written within a mean-variance optimization framework,
the so-called Markowitz problem can also be written within the Von
Neumann-Morgenstern expected utility framework. This was for instance
done by Samuelson and Merton (see [25, 26, 31]),
who, in addition, generalized Markowitz’s problem by extending the
initial one-period approach to a multi-period one. Samuelson did it
in discrete time, whereas Merton did it in continuous time. It is
noteworthy that they both embedded the intertemporal portfolio choice
problem into a more general optimal investment/consumption problem. (Footnote 2: This problem in continuous time is now referred to as Merton’s problem.)
In [25], Merton used PDE techniques in order
to characterize the optimal consumption process of an agent and the agent's
optimal portfolio choices. In particular, Merton managed to find closed-form
solutions in the constant absolute risk aversion case (i.e., for exponential
utility functions), and in the constant relative risk aversion case
(i.e., for power and log utility functions). Merton’s problem has
since been extended to incorporate several features such as transaction
costs (proportional and fixed), or bankruptcy considerations. Major
advances toward solving Merton’s problem in full generality were
made in the eighties by Karatzas et al. by using (dual) martingale
methods. In [18], Karatzas, Lehoczky, and Shreve
used a martingale method to solve Merton’s problem for almost any
smooth utility function (under the no-bankruptcy constraint) and showed
how to partially disentangle the consumption maximization problem
and the terminal wealth maximization problem. Constrained problems
and extensions to incomplete markets were then considered
(see for instance the paper [9] by Cvitanić
and Karatzas).
In the literature on portfolio selection or in the slightly more general
literature on Merton’s problem, input parameters (for instance the
expected returns of risky assets) are often considered known constants,
or stochastic processes with known initial values and dynamics. In
practice however, one cannot state for sure that price returns will
follow a given distribution. Uncertainty on model parameters is the
raison d’être of the celebrated Black-Litterman [6]
model, which is built on top of Markowitz’s model and the CAPM. However,
like Markowitz’s model, the Black-Litterman model is a static one. Consequently,
the agent of the Black-Litterman model does not use what he/she might
learn about the distribution of asset returns from their realizations.
Generalizations of optimal allocation models (or models addressing
Merton’s problem) involving filtering and learning techniques in a partial-information framework have been proposed.
In the optimal portfolio choice literature, the most important paper
mixing optimization and learning techniques is certainly the paper
of Karatzas and Zhao [20]. In a model where
the asset returns are Gaussian with unknown mean, they used martingale
methods under the filtration of observables to compute, for almost
any utility function, the optimal portfolio allocation (there is no
consumption in their model). They also showed that their martingale
method could be used for solving a Monge-Ampère-like parabolic PDE
which naturally arises in their model. The same martingale (or dual) method has since been used to solve similar optimization problems with partial information (see for instance [10, 21, 22, 30]). (Footnote 3: A similar change-of-measure type of argument is also used in [5].)
More general models have also been proposed where the dynamics of the drift is related to a hidden Markov chain / regime-switching model. Rieder and Bäuerle proposed for instance in [28] a model with one risky asset where the drift is modeled by a hidden
Markov chain (see [17] and [32] for other papers on a similar topic). An important point related to [28] is that the authors used HJB equations and not the martingale approach. By solving the PDEs, they obtained closed-form solutions in the case of power and log utility functions, and recovered the results of [20] in the pure Bayesian case.
Indeed, only a few models in the partial-information literature are solved by using PDE techniques. An important instance is Brendle [7]. He considered the optimal portfolio choice of an agent who does not know the drifts of risky assets but knows that these drifts follow Ornstein-Uhlenbeck processes with known parameters. The Hamilton-Jacobi-Bellman (HJB) equation associated
with the control problem is reduced to a set of nonlinear ODEs that are solved in closed form for CRRA and CARA utility functions, but only in the case of one risky asset (see also Rishel [29]). (Footnote 4: While publishing this paper, we noticed another very recent paper, by Casgrain and Jaimungal, dealing with optimal execution in a partial-information framework and using PDEs to solve the optimization/learning problem (see [8]).)
In this paper, we consider several problems of portfolio choice in
continuous time in which the (constant) expected returns of the risky
assets are unknown. We first consider a problem similar to the one
tackled by Karatzas and Zhao in the specific case where the Bayesian
prior for the expected returns is a conjugate prior (Gaussian in our
case). Our approach is based on the fact that conjugate priors are
associated with simple Markovian updates of the distribution parameters.
By adding into the state space the parameter(s) of the prior distribution,
we show that classical ideas from dynamic programming / stochastic
optimal control can be used and that the HJB equation associated with
the problem can be solved in closed form in the case of CARA (i.e.,
exponential) and CRRA (i.e., power and log) utility functions, in the general case of several risky assets (unlike [7], which only provides closed-form solutions in the case of one risky asset). In
particular, solving the HJB equation boils down to solving a simple
linear differential equation of order in the CARA case, and a
Riccati equation (for which we have solutions in closed form) in the
CRRA case. Furthermore, unlike most of the papers in the literature on portfolio choice with partial information, we provide verification theorems. This is particularly important as uncertainty sometimes leads to explosion in the value function when the utility function is not concave enough.
The PDE approach avoids the tedious computations needed
to simplify the general expressions of Karatzas and Zhao in the specific
case of conjugate priors, but our message is, of course, not limited to that.
The PDE approach
can indeed be used in situations where the (dual) martingale approach
cannot be used. For instance, we use our approach to solve the optimal
allocation problem in a trading framework à la Almgren-Chriss
with quadratic execution costs. The Almgren-Chriss framework was initially
built for solving optimal execution problems [1, 2],
but it is also very useful outside of the cash equity world. For instance,
Almgren and Li [3], and Guéant and Pu [14]
used it for the pricing and hedging of vanilla options when liquidity
matters. (Footnote 5: Guéant et al. also used the Almgren-Chriss framework to tackle
the pricing, hedging and execution issues related to Accelerated Share
Repurchase contracts; see [12, 15].) The model we propose is one of the first models using the Almgren-Chriss
framework to address an asset management problem, and definitely the
first in this area in which the Almgren-Chriss framework is
used in combination with Bayesian learning techniques. (Footnote 6: Almgren and Lorenz used Bayesian techniques in optimal execution (see
[4]), but they considered myopic agents with
respect to learning.) We also show how our framework can be slightly modified in order
to address optimal portfolio transition issues.
Conjugate priors lead to very powerful Bayesian learning techniques,
and this paper aims at showing that Bayesian ideas combined with stochastic
optimal control can be very effective in addressing many financial
problems. It is important to understand that Bayesian learning is
a forward process whereas dynamic programming classically relies on
backward induction reasoning. By using these two classical tools
at the same time, we not only benefit from the power of Bayesian
techniques to learn continuously the value of unknown parameters,
but we also develop a framework in which agents learn and make decisions
knowing that they will go on learning in the future in the same manner
as they have learnt in the past. The same ideas are for instance at
play in the case of Bayesian (Bernoulli) multi-armed bandits, where the unknown
parameters are the parameters of Bernoulli distributions
(with Beta prior distributions), but the dimensionality
of the problem often makes computations based on PDEs too computer-intensive. (Footnote 7: Upper confidence bound methods or Thompson sampling are often preferred
to dynamic programming for computing (in fact approximating) the optimal
strategies in this area.) Another example is in the domain of media buying: in the paper by
Fernandez-Tapia et al. [11] the unknown parameters
of exponential and Bernoulli distributions have respectively Gamma
and Beta prior distributions.
In Section 2, we consider the allocation problem of an agent in a context with only one risky asset (in addition to the risk-free asset) and we solve it in the case of a CARA utility function and in the case of a CRRA utility function. We analyze in particular the role of learning (and the influence of the knowledge that one will go on learning in the future) on the allocation process of the agent. In Section 3, we generalize the model to the case of several risky assets. In Section 4, we introduce liquidity costs through a modelling framework à la Almgren-Chriss and we use Bayesian learning for portfolio choice, optimal portfolio liquidation, and optimal portfolio transition problems.
2 Optimal portfolio choice with one risky asset
2.1 Introduction: price dynamics and Bayesian learning
2.1.1 Price dynamics
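In this section, we consider an agent facing a portfolio allocation problem in a simplified financial context with one risk-free asset and one risky asset.
Let $\left(\Omega, (\mathcal{F}^W_t)_{t \in \mathbb{R}_+}, \mathbb{P}\right)$
be a filtered probability space, with $(\mathcal{F}^W_t)_{t \in \mathbb{R}_+}$
satisfying the usual conditions. Let $(W_t)_{t \in \mathbb{R}_+}$
be a Wiener process adapted to $(\mathcal{F}^W_t)_{t \in \mathbb{R}_+}$.
The risk-free interest rate is denoted by $r$. The risky asset has the following classical log-normal dynamics
\[ dS_t = \mu S_t\, dt + \sigma S_t\, dW_t, \tag{1} \]
but we assume that the drift $\mu$ is unknown.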
Remark 1.
Both $\mu$ and $(W_t)_{t \in \mathbb{R}_+}$ are unobserved by the agent, but $S_t$ is observed at time $t$.
2.1.2 Bayesian updates
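At time $t = 0$, the agent’s belief about the value of $\mu$ is modeled by a Gaussian prior distribution (Footnote 8: We assume that $\mu$ is independent of $W$.)
\[ \mu \sim \mathcal{N}\left(\beta_0, \nu_0^2\right). \]
The evolution of the risky asset price reveals information to the agent about the true value of the drift $\mu$. In what follows we denote by $(\mathcal{F}^S_t)_{t \in \mathbb{R}_+}$ the filtration generated by $(S_t)_{t \in \mathbb{R}_+}$.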
Remark 2.
$W$ is not an $\mathcal{F}^S$-Brownian motion, because it is not $\mathcal{F}^S$-adapted.
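A classical result of the literature on filtering methods states that the conditional distribution of $\mu$ given $\mathcal{F}^S_t$ (for any $t \in \mathbb{R}_+$) is Gaussian. More precisely, we have:
Proposition 1.
Let $t \in \mathbb{R}_+$. Given $\mathcal{F}^S_t$, $\mu$ is conditionally normally distributed, with mean $\beta_t$ and variance $\nu_t^2$, where
\[ \beta_t = \frac{\sigma^2 \beta_0 + \nu_0^2 \left( \ln\frac{S_t}{S_0} + \frac{1}{2}\sigma^2 t \right)}{\sigma^2 + \nu_0^2 t} \]
and
\[ \nu_t^2 = \frac{\sigma^2 \nu_0^2}{\sigma^2 + \nu_0^2 t}. \]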
Proof.
For , is a Gaussian vector with variance matrix
where
and
The distribution of given is the distribution of given . It is Gaussian with
and
But
Therefore,
and
By a monotone class argument, we have therefore that, for , the distribution of given is Gaussian with mean
and variance
∎
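To make the update rule concrete, the following Python sketch evaluates the posterior mean and variance of the drift from an observed log-return. It assumes the log-normal dynamics (1) and a Gaussian prior with mean `beta0` and variance `nu0_sq`; the function name `posterior` and all numerical values are illustrative assumptions, not taken from the paper. The sketch also illustrates the Markovian nature of conjugate updates: a single update over $[0,2]$ coincides with an update over $[0,1]$ followed by a second update over $[1,2]$ that uses the first posterior as its prior.

```python
import math

def posterior(beta0, nu0_sq, sigma, t, log_return):
    """Posterior mean and variance of the drift mu after observing the price
    over a period of length t, where log_return = ln(S_t / S_0).
    Assumes a Gaussian prior N(beta0, nu0_sq) and log-normal dynamics."""
    # y = mu * t + sigma * W_t is the informative part of the log-return
    y = log_return + 0.5 * sigma**2 * t
    denom = sigma**2 + nu0_sq * t
    beta_t = (sigma**2 * beta0 + nu0_sq * y) / denom
    nu_t_sq = sigma**2 * nu0_sq / denom
    return beta_t, nu_t_sq

# Illustrative (hypothetical) inputs
beta0, nu0_sq, sigma = 0.05, 0.04, 0.2
lr_01 = 0.10    # ln(S_1 / S_0)
lr_12 = -0.03   # ln(S_2 / S_1)

# One-shot update over [0, 2]
direct = posterior(beta0, nu0_sq, sigma, 2.0, lr_01 + lr_12)

# Two successive updates: the posterior at t = 1 becomes the prior on [1, 2]
b1, v1 = posterior(beta0, nu0_sq, sigma, 1.0, lr_01)
two_step = posterior(b1, v1, sigma, 1.0, lr_12)

print(direct, two_step)
```

The agreement of the two results is what allows the pair (posterior mean, time) to be used as a state variable in a dynamic programming argument.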
2.1.3 Introduction of a new Brownian motion
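Proposition 1 defines two processes $(\beta_t)_{t \in \mathbb{R}_+}$ and $(\nu_t)_{t \in \mathbb{R}_+}$. The latter is a deterministic process (Footnote 9: This process is decreasing because the longer we observe the less uncertainty remains on the value of $\mu$.) whereas the former is a stochastic process with the following dynamics:
\[ d\beta_t = \sigma g(t)\, d\widehat{W}_t, \]
where the function $g$ is defined by
\[ g(t) = \frac{\nu_0^2}{\sigma^2 + \nu_0^2 t} \]
and the process $(\widehat{W}_t)_{t \in \mathbb{R}_+}$ is defined by
\[ \widehat{W}_t = W_t + \int_0^t \frac{\mu - \beta_s}{\sigma}\, ds. \]
The following proposition states that $\widehat{W}$ is a Brownian motion adapted to the filtration associated with the price process.
Proposition 2.
$(\widehat{W}_t)_{t \in \mathbb{R}_+}$ is a Wiener process adapted to $(\mathcal{F}^S_t)_{t \in \mathbb{R}_+}$.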
Proof.
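To prove this result, we use Lévy’s characterization of Brownian motion.
Let $t \in \mathbb{R}_+$. By definition, we have
\[ \widehat{W}_t = \frac{1}{\sigma}\left[ \ln\frac{S_t}{S_0} + \frac{1}{2}\sigma^2 t - \int_0^t \beta_s\, ds \right], \]
hence the $\mathcal{F}^S_t$-measurability of $\widehat{W}_t$.
Let $s, t \in \mathbb{R}_+$, with $s < t$. We have
\[ \mathbb{E}\left[\widehat{W}_t - \widehat{W}_s \,\middle|\, \mathcal{F}^S_s\right] = \mathbb{E}\left[W_t - W_s \,\middle|\, \mathcal{F}^S_s\right] + \mathbb{E}\left[\int_s^t \frac{\mu - \beta_u}{\sigma}\, du \,\middle|\, \mathcal{F}^S_s\right]. \]
For the first term, the increment $W_t - W_s$ is independent of $\mathcal{F}^W_s$ and independent of $\mu$. Therefore, it is independent of $\mathcal{F}^S_s$ and we have
\[ \mathbb{E}\left[W_t - W_s \,\middle|\, \mathcal{F}^S_s\right] = 0. \]
Regarding the second term, we have
\[ \mathbb{E}\left[\int_s^t \frac{\mu - \beta_u}{\sigma}\, du \,\middle|\, \mathcal{F}^S_s\right] = \frac{1}{\sigma}\int_s^t \mathbb{E}\left[\mathbb{E}\left[\mu \,\middle|\, \mathcal{F}^S_u\right] - \beta_u \,\middle|\, \mathcal{F}^S_s\right] du = 0, \]
by definition of $\beta_u$.
We obtain that $\widehat{W}$ is an $\mathcal{F}^S$-martingale.
Since $\widehat{W}$ has continuous paths and $\langle \widehat{W} \rangle_t = t$, we conclude that $\widehat{W}$ is an $\mathcal{F}^S$-Brownian motion. ∎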
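The role of the new Brownian motion can be illustrated numerically. The sketch below is a minimal illustration under stated assumptions: a Gaussian prior with mean `beta0` and variance `nu0_sq`, a learning-rate function $g(t) = \nu_0^2 / (\sigma^2 + \nu_0^2 t)$, and an innovation increment computed only from observable quantities as $(dy - \beta\, dt)/\sigma$, where $y_t = \ln(S_t/S_0) + \sigma^2 t / 2$. The names `simulate`, `beta_rec` and `beta_cf` are hypothetical. On a time grid, the recursion driven by the innovations (with $g$ evaluated at the right endpoint of each step) reproduces the closed-form posterior mean up to rounding errors.

```python
import math
import random

def g(t, sigma, nu0_sq):
    """Assumed learning-rate function g(t) = nu0^2 / (sigma^2 + nu0^2 t)."""
    return nu0_sq / (sigma**2 + nu0_sq * t)

def simulate(mu, beta0, nu0_sq, sigma, T, n, seed=0):
    """Simulate y_t = mu t + sigma W_t on a grid and update the posterior mean
    recursively from the innovations. The true drift mu is only used to
    generate the data; the update itself uses observable increments dy."""
    rng = random.Random(seed)
    dt = T / n
    y = 0.0        # y_t = ln(S_t / S_0) + sigma^2 t / 2
    beta = beta0   # recursive estimate of the drift
    w_hat = 0.0    # innovation process
    for k in range(n):
        dW = rng.gauss(0.0, math.sqrt(dt))
        dy = mu * dt + sigma * dW
        d_w_hat = (dy - beta * dt) / sigma   # observable innovation increment
        beta += sigma * g((k + 1) * dt, sigma, nu0_sq) * d_w_hat
        y += dy
        w_hat += d_w_hat
    beta_closed = (sigma**2 * beta0 + nu0_sq * y) / (sigma**2 + nu0_sq * T)
    return beta, beta_closed, w_hat

beta_rec, beta_cf, w_hat_T = simulate(mu=0.08, beta0=0.02, nu0_sq=0.04,
                                      sigma=0.2, T=1.0, n=1000, seed=1)
print(beta_rec, beta_cf)
```

The point of the design is that the agent never needs $\mu$ or $W$ separately: the observable increments of the log-price are enough to drive the estimate.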
2.2 Optimal portfolio choice in the CARA case
2.2.1 Portfolio dynamics and HJB equation
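The strategy of the agent is described by a process $(M_t)_t$, modelling the amount invested in the risky asset. The resulting value of the agent’s portfolio is modeled by a process $(V_t)_t$ with initial value $V_0$ and the following dynamics:
\[ dV_t = \left(r V_t + (\mu - r) M_t\right) dt + \sigma M_t\, dW_t = \left(r V_t + (\beta_t - r) M_t\right) dt + \sigma M_t\, d\widehat{W}_t. \]
For defining the set of admissible processes $(M_t)_t$, let us first set a time horizon $T > 0$. Then, let us introduce the notion of “linear growth” for a process in our context.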
Definition 1.
A measurable and adapted process is said to satisfy the linear growth condition if, for all ,
where is deterministic and depends only on .
We define the set of admissible strategies as where
For , given and , we define:
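Let us now come to the optimization problem. We assume, in the CARA case considered here, that the agent maximizes
\[ \mathbb{E}\left[-\exp\left(-\gamma V_T\right)\right], \]
where $\gamma > 0$ is the absolute risk aversion parameter of the agent.
The value function $u$ associated with this problem is then defined, for a portfolio value $V$ and a posterior mean $\beta$ at time $t$, as the supremum over admissible strategies on $[t,T]$ of the conditional expectation of $-\exp(-\gamma V_T)$.
The HJB equation associated with this problem writes
\[ 0 = \partial_t u + \frac{1}{2}\sigma^2 g(t)^2\, \partial^2_{\beta\beta} u + \sup_{M} \left\{ \left(rV + (\beta - r)M\right) \partial_V u + \frac{1}{2}\sigma^2 M^2\, \partial^2_{VV} u + \sigma^2 g(t) M\, \partial^2_{V\beta} u \right\} \tag{2} \]
with terminal condition
\[ u(T, V, \beta) = -\exp(-\gamma V). \tag{3} \]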
2.2.2 Solving the HJB equation
In this paper, we do not use viscosity techniques and we rather use verification
arguments. In particular, we solve in closed form the HJB equation (2) with terminal condition (3).
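To solve the HJB equation, we use the following ansatz:
\[ u(t, V, \beta) = -\exp\left[-\gamma\left(e^{r(T-t)} V + \phi(t, \beta)\right)\right]. \tag{4} \]
Proposition 3.
Suppose there exists a smooth function $\phi$ satisfying
\[ 0 = \partial_t \phi + \frac{1}{2}\sigma^2 g(t)^2\, \partial^2_{\beta\beta}\phi - (\beta - r) g(t)\, \partial_\beta \phi + \frac{(\beta - r)^2}{2\gamma\sigma^2} \tag{5} \]
with terminal condition
\[ \phi(T, \beta) = 0. \tag{6} \]
Then $u$ defined by (4) is solution of the HJB equation (2) with terminal condition (3).
Moreover, the supremum in (2) is achieved at:
\[ M^* = e^{-r(T-t)} \left( \frac{\beta - r}{\gamma \sigma^2} - g(t)\, \partial_\beta \phi(t, \beta) \right). \tag{7} \]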
Proof.
Now,
The supremum is reached at
and
By using this expression, we obtain:
by definition of .
As far as the terminal condition is concerned, it is straightforward to verify that satisfies the terminal condition (3). ∎
We have transformed the initial three-variable nonlinear PDE into a two-variable linear one. In the following proposition, we show that solving (5) with terminal condition (6) boils down to solving a triangular system of first-order linear ODEs.
Proposition 4.
Proof.
Let us consider a couple solution of (8) with terminal condition (9). If is defined by (10), then
by definition of and .
As far as the terminal condition is concerned, it is straightforward to verify that satisfies the terminal condition (6). ∎
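The reduction to a triangular system of ODEs can be checked numerically. The Python sketch below is only an illustration under explicit assumptions that are ours: a Gaussian prior of variance `nu0_sq`, the function $g(t) = \nu_0^2/(\sigma^2 + \nu_0^2 t)$, a CARA parameter `gamma`, and a quadratic ansatz $\phi(t,\beta) = a(t) + \frac{1}{2} b(t) (\beta - r)^2$, for which the system reads $a' = -\frac{1}{2}\sigma^2 g^2 b$ and $b' = 2 g b - 1/(\gamma\sigma^2)$ with $a(T) = b(T) = 0$. The closed-form expressions in `closed_form` were derived by hand under these assumptions and are compared with a Runge-Kutta integration; all function names are hypothetical.

```python
import math

def g(t, sigma, nu0_sq):
    """Assumed learning-rate function g(t) = nu0^2 / (sigma^2 + nu0^2 t)."""
    return nu0_sq / (sigma**2 + nu0_sq * t)

def rhs(t, a, b, gamma, sigma, nu0_sq):
    """Right-hand side (a', b') of the assumed triangular system."""
    gt = g(t, sigma, nu0_sq)
    da = -0.5 * sigma**2 * gt**2 * b
    db = 2.0 * gt * b - 1.0 / (gamma * sigma**2)
    return da, db

def solve_backward(t0, T, gamma, sigma, nu0_sq, n=2000):
    """Integrate from a(T) = b(T) = 0 back to t0 with the classical RK4 scheme."""
    h = (t0 - T) / n  # negative step: we move backward in time
    t, a, b = T, 0.0, 0.0
    p = (gamma, sigma, nu0_sq)
    for _ in range(n):
        k1 = rhs(t, a, b, *p)
        k2 = rhs(t + h / 2, a + h / 2 * k1[0], b + h / 2 * k1[1], *p)
        k3 = rhs(t + h / 2, a + h / 2 * k2[0], b + h / 2 * k2[1], *p)
        k4 = rhs(t + h, a + h * k3[0], b + h * k3[1], *p)
        a += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        b += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += h
    return a, b

def closed_form(t, T, gamma, sigma, nu0_sq):
    """Hand-derived solution of the assumed system (not quoted from the paper)."""
    d_t = sigma**2 + nu0_sq * t
    d_T = sigma**2 + nu0_sq * T
    b = d_t * (T - t) / (gamma * sigma**2 * d_T)
    a = (math.log(d_T / d_t) - nu0_sq * (T - t) / d_T) / (2.0 * gamma)
    return a, b

def optimal_amount(t, T, beta, r, gamma, sigma, nu0_sq):
    """Amount in the risky asset implied by the ansatz: the myopic term
    (beta - r) / (gamma sigma^2) corrected by the learning term g * d(phi)/d(beta)."""
    _, b = closed_form(t, T, gamma, sigma, nu0_sq)
    myopic = (beta - r) / (gamma * sigma**2)
    return math.exp(-r * (T - t)) * (myopic - g(t, sigma, nu0_sq) * b * (beta - r))

a_num, b_num = solve_backward(0.25, 1.0, 2.0, 0.2, 0.04)
a_cf, b_cf = closed_form(0.25, 1.0, 2.0, 0.2, 0.04)
print(a_num, a_cf, b_num, b_cf)
```

Under these assumptions, the learning term scales the myopic allocation by the factor $g(T)/g(t) \le 1$: the agent invests less aggressively than a myopic investor when much remains to be learnt before the horizon.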
The triangular system of ODEs (8) with terminal condition (9) can be solved very easily. The next proposition states the expression of the unique solution .
Proposition 5.
From these propositions, we deduce:
2.2.3 Verification theorem
We now need to prove that the function exhibited in Corollary 1 is indeed the value function associated with the stochastic optimal control of Section 2.2.1. To obtain a verification theorem, we need first a very simple lemma.
Lemma 1.
The process satisfies the linear growth condition.
Proof.
We have
∎
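We will also use in what follows the classical result of Beneš (see [19]), which we now recall for the sake of completeness.
Theorem (Beneš’s theorem).
If $(\zeta_t)_{t \in [0,T]}$ is an $\mathcal{F}^S$-adapted process that satisfies the linear growth condition on $[0,T]$, then the Doléans-Dade exponential $\mathcal{E}\left(\int_0^{\cdot} \zeta_s\, d\widehat{W}_s\right)$ is an $\mathcal{F}^S$-martingale on $[0,T]$.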
We can now turn to the verification theorem stating that and giving the optimal investment strategy.
Theorem 1.

For all and , we have:
(12) 
Equality in (12) is obtained by taking the optimal control given by:
Proof.

Let . We apply Itō’s lemma to the process :
(13) where the linear operator and the function are respectively given by:
and
We define
We compute^{10}^{10}10For readability’s sake, we will shorten the notations, remembering that all the functions are evaluated at . the increment of :
Then we compute the increment of :
By definition of , we have , and is therefore nonincreasing. In particular, we have for ,
Now, satisfies the linear growth condition by definition of the set of admissible strategies. Moreover, by Lemma 1, and also satisfy the linear growth condition. Therefore, we can apply Beneš’s theorem and obtain that is a true martingale. In particular we have for all .
By taking the conditional expectation given and setting , we obtain:

By taking , we have, by definition of and ,