Mean-Field Games with Differing Beliefs for Algorithmic Trading

Work in Progress

SJ would like to acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC) [funding reference numbers RGPIN-2018-05705 and RGPAS-2018-522715].


Philippe Casgrain, Department of Statistical Sciences, University of Toronto, Canada. Sebastian Jaimungal, Department of Statistical Sciences, University of Toronto, Canada.

Even when confronted with the same data, agents often disagree on a model of the real world. Here, we address the question of how interacting heterogeneous agents, who disagree on what model the real world follows, optimize their trading actions. The market has latent factors that drive prices, and agents account for the permanent impact they have on prices. This leads to a large stochastic game, where each agent’s performance criterion is computed under a different probability measure. We analyse the mean-field game (MFG) limit of the stochastic game and show that the Nash equilibrium is given by the solution to a non-standard vector-valued forward-backward stochastic differential equation. Under some mild assumptions, we construct the solution in terms of expectations of the filtered states. We prove the MFG strategy forms an ε-Nash equilibrium for the finite player game. Lastly, we present a least-squares Monte Carlo based algorithm for computing the optimal control and illustrate the results through simulation in a market where agents disagree on the model.



1 Introduction

Financial markets are immensely complicated dynamic systems which incorporate the interactions of millions of individuals on a daily basis. Market participants vary immensely, both in terms of their trading objectives and in their beliefs on the assets they are trading. All of these participants compete with one another in an attempt to achieve their own personal objectives in the most efficient way possible. Traded assets may also be driven by latent factors, and agents must dynamically incorporate data into their trading decisions.

In this paper, we propose a game theoretic model in which a large population of heterogeneous agents all trade the same asset. This model considers heterogeneity not only from the point of view of an individual’s trading objectives and risk appetite, but also from the point of view of each agent’s beliefs regarding the performance of the asset they are trading. We pay particular attention to the information each agent is privy to, in an attempt to render the framework as realistic as possible, while maintaining some semblance of tractability.

We study the equilibrium of these markets by using the theory of mean-field games (MFGs), which serves to describe game-theoretic models as the number of participating agents becomes extremely large. The general theory of mean-field games already has a large body of research associated with it. The original works stem from [16], [15], and [19]. Among the many extensions and generalizations which explore the broad theory of MFGs as well as their applications, we highlight the following works: [14], [13], [21], and [3]. This theory has seen applications in various financial contexts, such as [4] and [17], who use it to model systemic risk, [18], who use it for algorithmic trading in the presence of a major agent and a population of minor agents, [2], who investigate MFGs in the context of optimal execution, and [9], who look at mean-field games in algorithmic trading with partial information on states.

In contrast to other work on MFGs, as well as its specific application to algorithmic trading, here, motivated by [6], we include latent states so that agents do not have full information about the system dynamics. In contrast to [7], who also study a stochastic game with latent factors, we study how varying beliefs among the agents affect the optimal trading behaviour. In our model, we express the belief of agents as a probability measure on the dynamics of the asset price process and of any latent processes that may be driving it. As far as the authors are aware, this is the first time that MFGs with varying beliefs have been treated in the literature. Although this generalization is non-trivial, we succeed in characterizing the model equilibrium as the solution to a non-standard forward-backward stochastic differential equation (FBSDE) defined across the collection of belief measures. We present a closed-form representation for the solution of the MFG, which incorporates all of the market’s differing beliefs into the decisions of the individual agents.

We structure the remainder of the paper as follows. Section 2 introduces the market model and the stochastic game that agents participate in. Section 3 begins by introducing the MFG limit of the stochastic game and then proves that the collection of optimal strategies in the MFG may be represented as the solution to a system of coupled FBSDEs. Next, the system of FBSDEs is solved, and the solution to the mean field as well as each individual’s strategy is provided. Section 4 provides a specific example of a model where the assumptions in the key results are satisfied. In Section 5 we show that the solution to the MFG satisfies the ε-Nash equilibrium property in the finite population game. Lastly, Section 7 provides a least-squares Monte Carlo approach to computing certain expectations, as well as simulated examples of a market model with agents having differing beliefs.

2 The Model

In this section, we provide the market model and the participating agents’ performance criteria. The model presented in the remainder of this section closely resembles the model for the stochastic game used in [7]. The stochastic game presented here aims to characterize a population of agents with several sources of heterogeneity. As in [7], agents have varying trading objectives. In addition, however, agents are also characterized by their beliefs regarding the model driving the asset price process. In the remainder of this section, we present the trading mechanics which each of the agents uses to interact with the market, as well as the objectives each of the agents seeks to achieve with their actions.

2.1 The Population of Agents

The market consists of a population of rational heterogeneous agents trading a single asset. Agents are indexed with an integer . The total population of agents is divided into disjoint sub-populations, which are indexed by . is assumed to be constant and independent of . All agents within a fixed sub-population behave in a homogeneous manner. The set


denotes the set of agents within sub-population , and the superscript indicates the explicit dependence on the total number of agents. We also define to be the total number of agents within sub-population . We further assume the number of agents contained in each of the sub-populations remains stable as we take the population limit to infinity. More specifically, we require that the proportion of agents contained within population satisfies


2.2 The Agent’s State Processes

We work on the filtered probability space completed by the null sets of and where is some fixed time horizon. All processes defined in the remainder of this section are -adapted, unless otherwise specified, and the notation represents expectation with respect to the measure .

All agents have the ability to buy and sell the asset over the fixed trading period , after which all trading activity comes to a halt. Each agent controls the amount they wish to purchase or sell at a continuous rate denoted , where () indicates the rate of buy (sell) orders the agent sends to the market. At the start of the trading period, each agent is assumed to hold a random amount of the asset. Agents keep track of their holdings in the traded asset with the inventory process , where the superscript indicates the explicit dependence on the agent’s controlled rate of trade. The relationship between agent-’s trading rate and their inventory process is


This can be interpreted as each agent buying or selling an amount in each small time interval .

Assumption

We make the technical assumption that the initial inventory holdings of all agents have bounded variance, so that for which . Moreover, we assume that the mean of the starting inventory levels is the same within a given sub-population, so that for each .
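The inventory relationship described above can be illustrated with a short discretisation. The sketch below assumes the standard form in which inventory equals the initial holding plus the time integral of the trading rate; the function name `inventory_path`, the step size, and the sample rates are hypothetical and chosen only for illustration.

```python
import numpy as np

def inventory_path(q0, nu, dt):
    """Euler discretisation of dq_t = nu_t dt with q_0 = q0 (assumed form)."""
    # q[i] is the inventory at time i*dt; nu[i] is the rate held on [i*dt, (i+1)*dt).
    return q0 + np.concatenate(([0.0], np.cumsum(nu) * dt))

T, n_steps = 1.0, 4
dt = T / n_steps
nu = np.array([2.0, 2.0, -1.0, -1.0])  # hypothetical rates: buy first, then sell
q = inventory_path(q0=1.0, nu=nu, dt=dt)
print(q)
```

A positive rate raises the inventory by the rate times the step length, and a negative rate lowers it, which matches the interpretation of buying or selling a small amount over each small time interval.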

Buying and selling actions of agents impact the price of the traded asset in a manner to be specified below. As well, agents believe the asset midprice follows (potentially) different models. We incorporate differing beliefs into our model by assigning a probability measure to each sub-population . The various measures correspond to the model that agents in a particular sub-population believe to represent the true dynamics of the asset price.

We define the asset price process , where the superscript indicates the dependence of the price on the actions of all agents in the market. It is useful to define the average trading rate of all agents within sub-population as


Each agent in sub-population then believes the asset price process follows the dynamics


where for each , is a -predictable process, is a -adapted -martingale, and are constants. We also assume here that the initial inventory holdings of each agent are all independent of both and in each measure .

The measure effectively specifies the sub-population-’s asset price model through the processes and , as well as the scale of the market impact of each sub-population, through the set of constants .

Assumption

Here, we make the technical assumptions that and , where


for each and where represents the Euclidean norm.

Assumption

We also make the assumption that for all and the law under each measure is the same as that under the measure . Lastly, we assume that for each , and are uncontrolled, i.e., are unaffected by the agents’ actions.
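To make the price model above concrete, the following sketch simulates one path under a single sub-population's belief. It assumes that the drift is the sum of a belief-specific drift process and a linear permanent-impact term in the sub-population average trading rates, and that the martingale part is a scaled Brownian motion. The Brownian choice, the constant drift, the name `simulate_price`, and all numerical values are assumptions made for illustration and are not taken from the model specification.

```python
import numpy as np

def simulate_price(s0, drift, lam, nu_bar, sigma, dt, rng):
    """Euler scheme for dS = (A_t + sum_k lam_k * nu_bar_k(t)) dt + sigma dW_t.

    drift  : shape (n_steps,), the belief-specific drift A_t (assumed given)
    lam    : shape (K,), permanent impact constant of each sub-population
    nu_bar : shape (n_steps, K), average trading rate of each sub-population
    """
    n_steps = len(drift)
    impact = nu_bar @ lam  # aggregate permanent impact at each step
    noise = sigma * np.sqrt(dt) * rng.standard_normal(n_steps)
    increments = (drift + impact) * dt + noise
    return s0 + np.concatenate(([0.0], np.cumsum(increments)))

rng = np.random.default_rng(0)
n_steps, dt = 250, 1.0 / 250
drift = np.full(n_steps, 0.1)        # hypothetical constant drift
lam = np.array([0.01, 0.02])         # hypothetical impact constants, K = 2
nu_bar = np.column_stack([np.full(n_steps, 5.0), np.full(n_steps, -2.0)])
path = simulate_price(100.0, drift, lam, nu_bar, sigma=0.5, dt=dt, rng=rng)
```

Two sub-populations holding different beliefs would run the same scheme with different `drift` inputs, while the impact term is common to both.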

Each agent tracks their total accumulated cash process throughout the trading period. When buying and selling the asset, each agent pays an instantaneous cost that is linearly proportional to the amount of shares transacted. This cost is expressed through the controlled dynamics of the cash process. For an agent , their corresponding cash process is


where is a parameter that is unique to a sub-population and sets the scale of the instantaneous cost.
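The cash dynamics can be sketched in the same discretised fashion. The block below assumes the commonly used form in which the agent pays the mid-price plus a per-share premium proportional to the trading rate, so that cash changes at rate minus (price plus cost parameter times rate) times rate. The function name `cash_path`, the symbol `a` for the cost parameter, and the numbers are hypothetical.

```python
import numpy as np

def cash_path(x0, price, nu, a, dt):
    """Euler scheme for dX_t = -(S_t + a * nu_t) * nu_t dt (assumed form)."""
    # price[i] and nu[i] are the mid-price and trading rate on [i*dt, (i+1)*dt).
    flows = -(price + a * nu) * nu * dt
    return x0 + np.concatenate(([0.0], np.cumsum(flows)))

price = np.array([100.0, 100.0, 101.0])
nu = np.array([1.0, -2.0, 0.0])      # buy, then sell, then do nothing
x = cash_path(0.0, price, nu, a=0.5, dt=0.1)
print(x)
```

Buying costs slightly more than the mid-price and selling earns slightly less, with the gap growing linearly in the trading rate.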

2.3 Information Restriction

In this market model, agents have restricted information over the course of the trading period. More specifically, agents have access only to the information generated by the paths of the asset price process , their own inventory process , and the average order flow of each sub-population, . We express this information restriction in our model by restricting the sigma-algebra to which an agent’s strategy may be adapted. For each , we only allow agent- to choose strategies contained within the set of admissible strategies,


where we define , and


which is the sigma-algebra generated by the paths of the asset price process, the total order-flow process, and the starting inventory level for agent . In definition (9), we deliberately restrict ourselves to processes in , to guarantee that for all .

2.4 The Agent’s Optimization Problem

Each agent chooses their trading strategy to maximize an objective functional that measures their performance over the course of the trading period . For each let . Each agent- within a sub-population chooses a control to maximize a functional defined as follows


where and are parameters that vary across, but are constant within, sub-populations. In definition (11), we use the notation to indicate the dependence of the objective functional on the actions of all other agents in the population.

The objective functional corresponds to the agent trying to maximize a weighted average of three separate quantities. The first term corresponds to the total amount of cash the agent has accumulated up until time . The second term corresponds to the cost of liquidating all of the agent’s leftover inventory at time , minus a liquidation penalty controlled by the parameter . The last term is a running risk-aversion penalty, controlled by the parameter , which incentivizes the agent to keep their market exposure low during the trading period. It may also be interpreted as stemming from model uncertainty, as shown in [5].
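The three terms described above can be evaluated along a single simulated path as follows. This is a sketch under the assumption that the criterion takes the usual form of terminal cash, plus terminal inventory valued at the terminal price less a quadratic liquidation penalty, minus a running quadratic inventory penalty. The names `pathwise_objective`, `psi` and `phi` are hypothetical labels for the two penalty parameters. The agent's actual objective would be the expectation of this quantity under that agent's own belief measure, approximated for instance by averaging over simulated paths.

```python
import numpy as np

def pathwise_objective(x_T, q, s_T, psi, phi, dt):
    """Terminal cash + liquidation value - running inventory penalty, on one path."""
    q_T = q[-1]
    terminal = x_T + q_T * (s_T - psi * q_T)
    running = phi * np.sum(q[:-1] ** 2) * dt  # left-point Riemann sum of q_t^2
    return terminal - running

q = np.array([2.0, 1.0, 0.5])  # inventory on a grid with step 0.5
value = pathwise_objective(x_T=150.0, q=q, s_T=100.0, psi=1.0, phi=0.1, dt=0.5)
print(value)
```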

Each agent within sub-population has an objective functional that is computed by taking expectations under the measure . Hence, agents incorporate their own beliefs on the asset price dynamics. Furthermore, each functional depends on the actions of all other players () through the dynamics of the asset price , which implicitly appear in the definition (11).

By expanding the dynamics of each of the state processes present in (11), and by using integration by parts, we may re-write the agent’s objective functional as


where is a term that is constant with respect to and . Each agent’s behaviour is characterized entirely by the objective functional they are trying to maximize. From (12), it is clear that the objective functional is parametric so that the agent’s preferences can be entirely described by the tuple and their starting inventory .

The market model defined above forms a stochastic game in which all participating agents are competing to maximize each of their own objectives. We wish to find and study this market at its Nash equilibrium – i.e., where agents are simultaneously at their optimum. This equilibrium can be described more formally as the collection of admissible strategies which satisfies the condition


Obtaining this collection of strategies for the stochastic game with a finite number of players proves to be a difficult task. One of the main obstacles in finding a solution to this problem is that each agent’s strategy is adapted to a different filtration . Furthermore, each of the objective functionals defined in equation (12) is expressed under one of different measures from the collection of measures , each representing the beliefs of a particular individual. These two features make the finite-population stochastic game difficult to solve directly. It is, however, possible to solve the stochastic game in the infinite population limit, and use the result as an approximation for the finite population game.

3 Solving the Mean-Field Stochastic Game

As the stochastic game presented in Section 2.4 presents obstacles when aiming to solve it directly, we now take a different avenue. In this section, we study the stochastic game as the population size tends towards infinity. The resulting limit is that of a stochastic Mean Field Game (MFG) that we can solve. Although we do not explicitly solve the finite player game presented in Section 2, by establishing an ε-Nash equilibrium property in Section 5, we show that the equilibrium solution obtained for the MFG provides an approximation to the finite population game, provided that the population size is large enough.

This section begins by taking the population limit as , to obtain new objective functionals for the agents resulting in a stochastic MFG. Next, using convex analysis methods, we characterize the Nash-equilibrium as the solution to a coupled system of FBSDEs. We then conclude by presenting a solution to this FBSDE problem, and thus an exact representation of each agent’s optimal control at the Nash-equilibrium.

3.1 The Limiting Mean-Field Game

Agent-’s objective functional (11) only depends on the population size through the dynamics of the mid-price process , which is given by the dynamics in equation (5).

Assumption

To proceed, we assume that the limiting trading rate exists; in particular, there exist processes for such that and


where is the Lebesgue measure on the Borel sigma-algebra , and where is the canonical product measure of and . As each individual is -predictable, must be -predictable. Moreover, by our assumption that for each , the limit (14) also holds almost everywhere. From now on, we refer to each of the processes as the mean-field trading rate for sub-population-.

Using the assumption that for all along with (14), we find that in the infinite population limit, from the perspective of agent- from sub-population , the dynamics of the asset price process is


In this limit, a single individual’s impact on the price becomes negligible, thus the resulting mean-field trading rate is unaffected by a single agent’s trading rate . Therefore, in the limit, each agent’s objective no longer depends on the whole collection of trading rates , but instead only depends on the collection of mean-field processes . By using the objective functional representation in (12), expanding from (15), and noticing that the martingale components vanish under expectation, we may write the agent’s objective functional in the infinite population limit as


where for each we define as and where we define as . In the expression for in (16) we suppress the argument as, in this infinite population limit, their effect is felt through the mean-fields for each subpopulation. We use the superscript in the notation for to indicate the dependence on the set of mean-fields.

Our new objective is to obtain the Nash-equilibrium in this newly defined mean-field game. The Nash equilibrium for the MFG consists of finding the infinite collection of controls that satisfies the optimality condition


as well as the consistency condition


for all .

In the limit, the explicit dependence of an agent’s actions in another agent’s objective functional is replaced with an implicit dependence through the consistency condition.

3.2 The Agent’s Optimality Condition

To solve the optimization problem described in Section 3.1, we must determine which strategy maximizes the right-hand side of equation (17) for all agents. This can be achieved in our particular case by using tools from infinite-dimensional convex analysis or variational calculus. First, we demonstrate that each function is a strictly concave functional of . Next, as is a functional with an infinite-dimensional argument, we show that each functional is Gâteaux differentiable within the space and compute the Gâteaux derivative explicitly. General results in convex optimization then state that if the derivative vanishes at a point within the space , it must be the point at which attains its supremum. The lemmas that follow give us the required properties for .

Lemma 1

The functional defined in equation (16) is strictly concave in up to null sets.


See A.1.

Lemma 2

For an agent- in sub-population , the functional defined in equation (16) is everywhere Gâteaux differentiable in . The Gâteaux derivative at a point in a direction can be expressed as



See A.2.

Therefore, since is concave, the supremum of is attained at a point if and only if the expression (19) vanishes for all . Moreover, the strict concavity of guarantees that such a point is unique up to null sets. Indeed, as the following theorem shows, the collection of points that ensures (19) vanishes for all , and for all , coincides with the solution of an infinite-dimensional system of FBSDEs.
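The first-order condition above has a simple finite-dimensional analogue that can be checked numerically. The sketch below is a deliberately simplified stand-in for the functional in (16): time is discretised, the drift is deterministic, and there is no filtering, so no conditional expectations appear. All names (`objective`, `optimal_rate`, `gateaux`) and parameter values are hypothetical. Because the discretised objective is a strictly concave quadratic, its maximiser solves a linear system, and a central finite difference confirms that the directional derivative vanishes there in an arbitrary direction.

```python
import numpy as np

def objective(nu, A, q0, a, phi, psi, dt):
    """Discretised concave quadratic criterion (simplified, deterministic drift)."""
    q = q0 + dt * np.cumsum(nu)  # inventory at the end of each step
    running = dt * np.sum(A * q - a * nu**2 - phi * q**2)
    return running - psi * q[-1] ** 2

def optimal_rate(A, q0, a, phi, psi, dt):
    """Solve the linear first-order condition grad J(nu) = 0."""
    n = len(A)
    L = dt * np.tril(np.ones((n, n)))  # q = q0 + L @ nu
    last = L[-1]                        # row mapping nu to terminal inventory
    H = 2 * a * dt * np.eye(n) + 2 * phi * dt * (L.T @ L) + 2 * psi * np.outer(last, last)
    g = dt * (L.T @ (A - 2 * phi * q0)) - 2 * psi * q0 * last
    return np.linalg.solve(H, g)

def gateaux(J, nu, omega, eps=1e-4):
    """Central-difference directional derivative of J at nu in direction omega."""
    return (J(nu + eps * omega) - J(nu - eps * omega)) / (2 * eps)

n, T = 50, 1.0
dt = T / n
t = dt * np.arange(1, n + 1)
A = 0.5 * np.cos(2 * np.pi * t)  # hypothetical deterministic drift
params = dict(A=A, q0=1.0, a=0.1, phi=0.05, psi=1.0, dt=dt)

nu_star = optimal_rate(**params)
J = lambda nu: objective(nu, **params)
omega = np.random.default_rng(0).standard_normal(n)

slope_at_optimum = gateaux(J, nu_star, omega)
slope_elsewhere = gateaux(J, np.zeros(n), omega)
print(slope_at_optimum, slope_elsewhere)
```

In the continuous-time game the same logic applies, with the linear system replaced by the FBSDE characterisation stated in the theorem.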

Theorem 1

We have that


for all if and only if for each agent- in sub-population , and is the unique strong solution to the FBSDE


where is an -adapted -martingale and where


for all .


See A.3.

Theorem 1 reduces the convex optimization problem (16), (17), and (18) to an infinite system of FBSDEs. The forward component comes from the latent drift processes and inventory processes , while the backward component comes from the trading rates . The coupling in this system appears through the mean-field processes , which averages out all of the actions of other agents within the game. A few difficulties are immediately apparent in the FBSDE (21). Firstly, each individual FBSDE, corresponding to a particular agent’s trading rate, is written in terms of a martingale that is specific to the agent’s sub-population, and the measure under which the process is a martingale corresponds to the agent’s belief about the drift process . Secondly, the conditional expected value appears in the driver of the FBSDE. This is a projection of the mean-fields onto the agent’s filtration, and appears because the agent cannot directly observe the strategies of other individuals. This projection of the mean-fields adds another layer of difficulty.

Recall that a solution to the FBSDE (21) for agent- consists of a pair of processes that satisfies the SDE and terminal condition in (21) almost everywhere. For the requirements of Theorem 1 to be met, a solution must simultaneously meet the consistency condition (22) almost everywhere. If we can find a set of solutions, we can guarantee it is unique up to null sets due to the strict concavity of the objective functional and the ‘if and only if’ nature of the statement.

3.3 Solving the Optimality FBSDE

In this section, we solve the FBSDE (21), and hence provide an exact form for the Nash-equilibrium for the infinite population mean-field game. We begin by postulating a structure for the solution of (21). This form then suggests a vector-valued FBSDE that the mean-field processes must satisfy, which is independent of any individual agent’s strategy. The resulting non-standard FBSDE system is defined across the set of measures , which introduces an obstacle in solving it directly. The key step in obtaining a solution lies in representing the FBSDE in terms of a single measure, and solving it there.

Due to the linear form of the FBSDE (21), it is natural to assume that the solution is affine. As such, for an agent- within a sub-population , we seek optimal controls of the form


where is an unknown deterministic, continuously differentiable, function of time, and where we define the mean-field inventory process for sub-population as

Plugging this ansatz into (21) and simplifying, we find that


along with the boundary condition that


which must both hold almost everywhere. Therefore, to solve the FBSDE (21), it is sufficient for us to make the terms in the curly brackets of equation (24) and in the boundary condition above vanish independently of one another. Collecting these equations, we obtain a first-order Riccati-type ODE for ,


as well as a linear FBSDE for the mean-field process


where is an -adapted -martingale.
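A scalar Riccati equation of this kind is straightforward to integrate numerically. Since equation (26) is not reproduced here, the sketch below uses the Riccati equation that arises in the standard single-agent linear-quadratic liquidation problem, h'(t) = 2*phi - h(t)^2 / (2*a) with terminal condition h(T) = -2*psi. Both this specific form and the names `a`, `phi`, `psi` for the cost, risk-aversion and liquidation-penalty parameters are assumptions made for illustration. The equation is integrated backward from the terminal condition with a classical Runge-Kutta scheme and compared against its known closed-form solution.

```python
import numpy as np

def riccati_rhs(h, a, phi):
    """Right-hand side of the assumed Riccati ODE h' = 2*phi - h^2 / (2*a)."""
    return 2.0 * phi - h**2 / (2.0 * a)

def solve_riccati(a, phi, psi, T, n_steps):
    """Integrate backward from h(T) = -2*psi with classical RK4."""
    dt = T / n_steps
    h = np.empty(n_steps + 1)
    h[-1] = -2.0 * psi
    for i in range(n_steps, 0, -1):
        y = h[i]
        k1 = riccati_rhs(y, a, phi)
        k2 = riccati_rhs(y - 0.5 * dt * k1, a, phi)
        k3 = riccati_rhs(y - 0.5 * dt * k2, a, phi)
        k4 = riccati_rhs(y - dt * k3, a, phi)
        h[i - 1] = y - dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return h

def closed_form(t, a, phi, psi, T):
    """Closed-form solution of the assumed ODE, valid when psi != sqrt(a*phi)."""
    s, gamma = np.sqrt(a * phi), np.sqrt(phi / a)
    zeta = (psi + s) / (psi - s)
    up, down = zeta * np.exp(gamma * (T - t)), np.exp(-gamma * (T - t))
    return -2.0 * s * (up + down) / (up - down)

a, phi, psi, T, n_steps = 1.0, 0.5, 2.0, 1.0, 200  # hypothetical parameters
h = solve_riccati(a, phi, psi, T, n_steps)
t_grid = np.linspace(0.0, T, n_steps + 1)
max_error = np.max(np.abs(h - closed_form(t_grid, a, phi, psi, T)))
print(max_error)
```

The numerical route is useful mainly as a check, or when the coefficients are time-dependent and no closed form is available.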

Let us point out here that the ansatz for found in equation (23) satisfies the consistency condition as long as there exist solutions to the equations (26) and (27). This can be most easily seen by taking the average of (23) over and taking the limit as .

The FBSDE (27) implies that the solution should be an -adapted process. However, Equation (27) holds for any agent- for which , i.e., any agent within the same sub-population, therefore for each , should be -adapted for any . Consequently, letting , each is in fact an -adapted process which solves the FBSDE


where is an -adapted, -martingale, and the expectation appearing in the drift is conditional on not .

By stacking the FBSDEs (28) over all values of , we may obtain a vector-valued FBSDE for the process . To this end, define the column vector of filtered drift processes where . Next, as is -measurable, stacking the FBSDEs (28) over all values of , we have


where and are all real-valued matrices defined as

where , and is a column vector of the -adapted processes, where as a reminder, , and the -th element is a -martingale.

From the linear structure of the FBSDE (29), we can further simplify the problem by seeking affine solutions of the form


where is a deterministic and continuously differentiable function of time, and is an -valued stochastic process. Plugging the ansatz into (29), and following through with the same logical steps as before, we find that the ansatz holds true so long as is the solution to the Riccati-type matrix-ODE


and when solves the BSDE,


where is the same vector of processes present in FBSDE (29).

At this point, we have succeeded in reducing the search for a Nash-equilibrium to solving (i) two deterministic ordinary differential equations (ODEs) (26) and (31), and (ii) a non-standard linear BSDE (32). The ODEs are straightforward to solve; the BSDE, however, poses some further challenges.

One of the primary obstacles in solving the BSDE (32) is that each component of incorporates a process that is a martingale under a different probability measure. Recall that the components of are required to be martingales with respect to the different measures . Each measure is what agents within sub-population use to compute expectations, and agents within that sub-population assume the asset has drift in excess of the order-flow from all agents. The key step in solving the BSDE is to re-cast it in terms of martingales under a single probability measure. This introduces non-trivial drift adjustments, but we find that it is indeed possible to solve the modified BSDE explicitly.
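The idea of re-casting expectations under a single measure can be seen in a small discrete example. The sketch below takes two belief measures on a five-point sample space, uses their equal-weight mixture as a common dominating measure, and verifies that an expectation under either belief equals the expectation under the mixture of the random variable multiplied by the corresponding Radon-Nikodym derivative. The probabilities, the mixture choice, and all variable names are hypothetical and serve only to illustrate the change-of-measure identity.

```python
import numpy as np

# Two belief measures on a five-point sample space (hypothetical numbers).
p1 = np.array([0.10, 0.20, 0.40, 0.20, 0.10])
p2 = np.array([0.30, 0.30, 0.20, 0.10, 0.10])
p_bar = 0.5 * (p1 + p2)  # equal-weight mixture, dominates both beliefs

z1 = p1 / p_bar  # Radon-Nikodym derivative dP1/dP_bar
z2 = p2 / p_bar  # Radon-Nikodym derivative dP2/dP_bar

y = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])  # a random variable on the sample space

def expect(prob, values):
    """Expectation of `values` under the discrete measure `prob`."""
    return float(np.sum(prob * values))

direct_1, reweighted_1 = expect(p1, y), expect(p_bar, z1 * y)
direct_2, reweighted_2 = expect(p2, y), expect(p_bar, z2 * y)
print(direct_1, reweighted_1, direct_2, reweighted_2)
```

In the dynamic setting the same reweighting is applied conditionally on the available information, which is the source of the additional drift terms.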

Consider the th dimension of the BSDE (32)


where is a -martingale, and where is defined as the -th row of the deterministic matrix-valued function . The solution of BSDE (33) can be expressed implicitly as follows


Next, we aim to represent (34) in terms of expectation under another measure such that for all . By the assumption that for all , there always exists such a measure. For example, for some . Given this measure, define the -adapted Radon-Nikodym derivative processes


Using this process, we find that we may write equation (34) as an expected value under the measure as,


Defining the diagonal valued process , where , allows us to write a linear BSDE for using a single measure . More specifically, from (36), we have that


where is an -valued -martingale. The BSDE (37) is linear and its solution can be expressed in closed form. The following theorem provides a representation for the solution of as well as , and .

Theorem 2 (Solutions to the Mean-Field BSDEs)

  1. Let be any probability measure such that . Then the BSDE (32) admits a closed form solution,


    where is the solution to the forward matrix-valued SDE