Learning Agents in Black-Scholes Financial Markets
Abstract
Black-Scholes (BS) is the standard mathematical model for European option pricing in financial markets. Option prices are calculated using an analytical formula whose main inputs are the strike (the price at which the option can be exercised) and the volatility. The BS framework assumes that volatility is constant across all strikes; in practice, however, it varies. How do traders come to learn these parameters?
We introduce natural models of learning agents, in which agents update their beliefs about the true implied volatility based on the opinions of other traders. We prove exponentially fast convergence of these opinion dynamics using techniques from control theory and leader-follower models, thus reconciling theory with market practice. We consider two different models, one with feedback and one with an unknown leader.
1 Introduction
Derivative contracts are actively traded across the world’s financial markets, with an estimated total worth in the trillions of dollars. To get an intuitive understanding of the setting and the issues at hand, let’s consider the prototypical example of European options.
A European option is the right to buy or sell an underlying asset at some point in the future at a fixed price, also known as the strike. A call option gives the right to buy an asset and a put option gives the right to sell an asset at the agreed price. On the opposite side of the buyer is the seller, who has relinquished control of exercise. Buyers of puts and calls can exercise the right to buy or sell; sellers of options have to fulfill their obligations when exercised against. The payoff to the buyer of a call option with stock price $S_T$ at expiry time $T$ and exercise price $K$ is $\max(S_T - K, 0)$, whereas for a put option it is $\max(K - S_T, 0)$.
To get a price, we input the current stock price (e.g. $101), the exercise price (e.g. $90), the expiry (e.g. three months from today) and the volatility into the Black-Scholes (BS) formula, and out comes the answer: the quoted price of the instrument [5].
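To make the pricing step concrete, here is a minimal Python sketch of the Black-Scholes formula for a European call and put on a non-dividend-paying stock. The 20% volatility and 1% interest rate are assumptions chosen only for illustration (the text does not fix them), and the function name `bs_price` is hypothetical.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_price(S: float, K: float, T: float, sigma: float, r: float, call: bool = True) -> float:
    """Black-Scholes price of a European option on a non-dividend-paying stock.

    S: spot price, K: strike, T: time to expiry in years,
    sigma: annualised volatility, r: continuously compounded interest rate.
    """
    sqrt_t = math.sqrt(T)
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t
    discount = math.exp(-r * T)
    if call:
        return S * norm_cdf(d1) - K * discount * norm_cdf(d2)
    return K * discount * norm_cdf(-d2) - S * norm_cdf(-d1)


# Inputs from the text: spot $101, strike $90, three months to expiry.
# sigma and r are illustrative assumptions.
S, K, T = 101.0, 90.0, 0.25
sigma, r = 0.20, 0.01
call_price = bs_price(S, K, T, sigma, r, call=True)
put_price = bs_price(S, K, T, sigma, r, call=False)
print(f"call = {call_price:.4f}, put = {put_price:.4f}")
```

Put-call parity, $C - P = S - Ke^{-rT}$, is a quick sanity check on any such implementation.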
Volatility, which captures the beliefs about how turbulent the stock price will be, is left up to the market. This parameter is so important that in practice the market trades European calls and puts by quoting volatilities. (Using the Black-Scholes formula with a particular implied volatility, traders obtain a dollar price.) How the market decides what the quoted volatility should be (e.g. for a stock index three months from now) is a critical but poorly understood question. This is exactly what we aim to study by introducing models of learning agents who update their beliefs about the volatility.
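Because the Black-Scholes call price is strictly increasing in the volatility, a quoted dollar price can be turned back into an implied volatility by a one-dimensional root search. The sketch below uses plain bisection; the bracket $[10^{-6}, 5]$, the tolerance and the helper names are assumptions, and production code would typically use a faster method such as Newton's method on vega.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(S: float, K: float, T: float, sigma: float, r: float) -> float:
    """Black-Scholes European call price (no dividends)."""
    sqrt_t = math.sqrt(T)
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)


def implied_vol(price: float, S: float, K: float, T: float, r: float,
                lo: float = 1e-6, hi: float = 5.0, tol: float = 1e-10) -> float:
    """Invert bs_call for sigma by bisection on the bracket [lo, hi]."""
    if not bs_call(S, K, T, lo, r) <= price <= bs_call(S, K, T, hi, r):
        raise ValueError("price is not attainable for sigma in [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, T, mid, r) < price:
            lo = mid  # model price too low: the volatility must be higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Round trip: price an option at an assumed 20% vol, then recover that vol.
quote = bs_call(101.0, 90.0, 0.25, 0.20, 0.01)
iv = implied_vol(quote, 101.0, 90.0, 0.25, 0.01)
print(f"quoted price {quote:.4f} -> implied vol {iv:.6f}")
```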
Our contribution. We introduce two different classes of learning models that converge to a consensus. The first introduces a feedback mechanism (Section 3.1, Theorem 3.1) in which agents whose estimates are off the true “hidden” volatility parameter feel a slight (even infinitesimally small) pull towards it, along with all the other “random” chatter of the market. This model captures the setting where traders have access to an alternative trading venue or to information sources provided by brokers and private message boards. The second model incorporates a market leader (e.g. Goldman Sachs) that is confident in its own internal metrics or is privy to client flow (private information) and does not give any weight to outside opinions (Section 3.2, Theorem 3.3). Proving the convergence results (as well as establishing exponentially fast convergence rates) requires tools from discrete dynamical systems. We showcase and complement our theoretical results with experiments (e.g. Figures 2(a) to 2(d)), which show, for example, that if we move away from our models, convergence is no longer guaranteed.
Options on the same asset can be struck at different strike prices. If the underlying asset and the time to exercise (e.g. 3 months) are the same, one would expect the volatility to be the same at different strikes. In practice, however, since the 1987 crash the market has exhibited different volatilities across strikes. This rather strange phenomenon is referred to as the smile, or smirk (see Figure 1). Depending on the market, these smirks can be more or less pronounced. For instance, equity markets display a strong skew or smirk, whereas a symmetric smile is more common in foreign exchange options markets. An excellent introduction to volatility smiles is given in [8].
We formalize the multidimensional analogues of our two models above using Kronecker products (Section 4, Theorems 4.1 and 4.3). Thus our models show how a volatility curve could function as a global attractor given adaptive agents. We conclude the paper by discussing future work on identifying necessary structural conditions on the shape of arbitrage free volatility curves.
2 Model description
In mathematical models of opinion dynamics, agents take the views of other agents into account before arriving at their own updated estimate. Agents can observe other agents’ previous signals.
DeGroot [6] was one of the early developers of such observational learning dynamics. While simple, these models allow us to examine convergence to consensus. Models of this type are sometimes called naive models, as agents can recall perfectly what the other players submitted in the previous round.
2.1 Volatility Basics
Investors have an initial opinion of the implied volatility, which subsequently gets updated after taking into account volatilities of other agents. A feedback mechanism aids the agents in arriving at the true volatility parameter.
At all times the focus is on a static picture of the volatility smile. Within this static framework, agents update their opinion of the true implied volatility. This updating occurs in a high-frequency sense: in an exchange setting, one can think of all bids and offers as visible to agents. The agents are initially unsure of the true value of the implied volatility, but through learning and feedback they arrive at the true parameter. Our first attempt is a naive learning model common in social networks. Learning occurs between trading times; our implicit assumption is therefore that no transactions occur while traders are adjusting to and learning each other’s quotes.
This rather peculiar feature is market practice: trading happens at longer intervals than quote updating. This is as true for high-frequency trading of stocks as it is for options markets. Quotes and prices (or rather vols) change more frequently than actual transactions occur.
Each dollar value of an option corresponds to an implied volatility parameter that depends on strike and expiry. Implied volatility is quoted in percentage terms.
Assumption 2.1.
We have three types of players: agents/traders, brokers and leaders. Brokers give feedback to the traders, and the ability of agents to determine this feedback is their learning ability. Leaders are unknown to the other agents and do not give feedback, but their quotes are visible.
Each agent takes a weighted average of all the agents’ estimates of volatility at a particular strike and expiry.
2.2 Naive Opinion Dynamics
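A first approach towards opinion dynamics is to assume that each agent takes a weighted average of other agents’ opinions and updates his own estimate of the volatility parameter for the next period, i.e., at time $k \in \mathbb{N}$, the opinion of the $i$-th agent is given by
$$x_i(k+1) = \sum_{j=1}^{n} w_{ij}\,x_j(k), \tag{1}$$
where $x_j(k) \in \mathbb{R}$ is the opinion of agent $j$ at time $k$ and $w_{ij}$ denotes the opinion weights for the investors, with $w_{ij} \ge 0$ and $\sum_{j=1}^{n} w_{ij} = 1$ for all $i \in \{1,\dots,n\}$. Define $x(k) := (x_1(k),\dots,x_n(k))^\top$; then, the opinion dynamics of the agents can be written in matrix form as follows
$$x(k+1) = W x(k), \tag{2}$$
where $W := [w_{ij}] \in \mathbb{R}^{n \times n}$ is a row-stochastic matrix.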
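The DeGroot update (2) is just repeated multiplication by the row-stochastic matrix $W$. The pure-Python sketch below iterates it for a hypothetical three-agent weight matrix with strictly positive entries (hence irreducible and aperiodic); the matrix and the initial opinions are illustrative assumptions, not data from the paper.

```python
def degroot_step(W, x):
    """One DeGroot update x(k+1) = W x(k): each agent averages everyone's opinion."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def simulate(W, x0, steps):
    """Iterate the DeGroot update `steps` times starting from x0."""
    x = list(x0)
    for _ in range(steps):
        x = degroot_step(W, x)
    return x


# Hypothetical row-stochastic weights: row i holds agent i's trust in agents 1..3.
W = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.4, 0.5]]
x0 = [0.15, 0.25, 0.30]  # initial implied-vol opinions (illustrative)

x_final = simulate(W, x0, 200)
spread = max(x_final) - min(x_final)
print(x_final, spread)  # entries agree to machine precision: consensus to a point
```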
Definition 2.2 (consensus).
The agents (2) are said to reach consensus if for any initial condition $x(0) \in \mathbb{R}^n$, $|x_i(k) - x_j(k)| \to 0$ as $k \to \infty$ for all $i, j \in \{1,\dots,n\}$.
Definition 2.3 (consensus to a point).
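The agents (2) are said to reach consensus to a point if for any initial condition $x(0) \in \mathbb{R}^n$, $\lim_{k \to \infty} x(k) = \mathbf{1}_n c$, where $\mathbf{1}_n$ denotes the vector composed of only ones and $c \in \mathbb{R}$. The constant $c$ is often referred to as the consensus value.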
For the opinion dynamics (2), we introduce the following result by [6] (see also [14] for definitions).
Proposition 2.4.
Consider the opinion dynamics in equation (2). If $W$ is aperiodic and irreducible, then for any initial condition $x(0) \in \mathbb{R}^n$ consensus to a point is reached. The consensus value $c$ depends on both the matrix $W$ and the initial condition $x(0)$.
Remark 2.5.
Proposition 2.4 implies that if the row-stochastic opinion matrix $W$ is aperiodic and irreducible, then all the agents converge to some consensus value $c$. However, since $c$ depends on the unknown initial opinion $x(0)$, the consensus value is unknown and, in general, different from the true volatility $\sigma^*$. We wish to alleviate this and thus introduce two novel models.
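A standard fact behind this remark (not spelled out in the text) is that, for an irreducible and aperiodic row-stochastic $W$, $W^k \to \mathbf{1}_n \pi^\top$, where $\pi$ is the stationary distribution satisfying $\pi^\top W = \pi^\top$. The consensus value is then $c = \pi^\top x(0)$, a weighted average of the initial opinions with no reason to equal $\sigma^*$. The sketch checks this numerically; $W$, $x(0)$ and the "true" volatility 0.20 are hypothetical. For this $W$ one can verify by hand that $\pi = (12, 23, 14)/49$, so $c = 11.75/49 \approx 0.2398$.

```python
def left_power_iteration(W, iters=500):
    """Approximate the stationary distribution pi (pi^T W = pi^T) by iterating pi <- pi W."""
    n = len(W)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * W[i][j] for i in range(n)) for j in range(n)]
    return pi


def simulate(W, x0, steps):
    """Iterate the DeGroot update x(k+1) = W x(k)."""
    x = list(x0)
    for _ in range(steps):
        x = [sum(w * xj for w, xj in zip(row, x)) for row in W]
    return x


W = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.4, 0.5]]
x0 = [0.15, 0.25, 0.30]
sigma_star = 0.20  # hypothetical true implied volatility

pi = left_power_iteration(W)
c_predicted = sum(p * x for p, x in zip(pi, x0))  # c = pi^T x(0)
c_simulated = simulate(W, x0, 200)[0]
print(pi, c_predicted, c_simulated, sigma_star)
```

The agents agree, but on a value determined by the network and the starting point rather than by $\sigma^*$.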
3 Consensus (scalar agent dynamics)
In this section, we assume that the agents are able to learn how far off they are from the true volatility by informational channels in the marketplace. There are many avenues, platforms and private online chat rooms that provide quotes for option prices; some of these are stale and some are fresh. The agents’ learning ability determines the quality of the feedback from all these sources. We aggregate all of this information in the form of a feedback controller. If they are fast learners, they adjust their volatility estimates quickly.
3.1 Consensus with Feedback
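We model this feedback by introducing an extra driving term into the opinion dynamics (1). In particular, we feed back the difference between each agent’s opinion and the true volatility $\sigma^*$, scaled by a learning coefficient $\delta_i$. We assume that $\sigma^*$ is invariant, i.e., for some fixed strike $K$ and maturity $T$, $\sigma^*(K,T) = \sigma^*$ for some fixed $\sigma^* > 0$. Then, the new model is written as follows
$$x_i(k+1) = \sum_{j=1}^{n} w_{ij}\,x_j(k) - \delta_i\big(x_i(k) - \sigma^*\big), \tag{3}$$
or in matrix form
$$x(k+1) = (W - \Delta)\,x(k) + \Delta\mathbf{1}_n\sigma^*, \tag{4}$$
where $\Delta := \operatorname{diag}(\delta_1,\dots,\delta_n)$. Then, we have the following result.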
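The feedback model can be simulated directly from (3). The sketch assumes the sign convention $x_i(k+1) = \sum_j w_{ij}x_j(k) - \delta_i(x_i(k) - \sigma^*)$, so that a positive $\delta_i$ pulls agent $i$ towards $\sigma^*$; the weights, learning coefficients, initial opinions and $\sigma^* = 0.20$ are hypothetical.

```python
def feedback_step(W, delta, x, sigma_star):
    """x_i(k+1) = sum_j w_ij x_j(k) - delta_i * (x_i(k) - sigma_star)."""
    return [sum(w * xj for w, xj in zip(row, x)) - d * (xi - sigma_star)
            for row, d, xi in zip(W, delta, x)]


W = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.4, 0.5]]
delta = [0.2, 0.3, 0.1]      # hypothetical learning coefficients, each in (0, 2 w_ii)
sigma_star = 0.20            # hypothetical true implied volatility
x = [0.15, 0.25, 0.30]       # initial opinions

for _ in range(300):
    x = feedback_step(W, delta, x, sigma_star)
print(x)  # every agent is (numerically) at sigma_star
```

Unlike the pure DeGroot dynamics (2), the limit here no longer depends on the initial opinions.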
Theorem 3.1.
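Consider the opinion dynamics (4) and assume that $\delta_i \in (0, 2w_{ii})$, $i \in \{1,\dots,n\}$; then, consensus to $\sigma^*$ is reached, i.e., $\lim_{k\to\infty} x(k) = \mathbf{1}_n\sigma^*$.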
Proof.
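It is easy to verify that the solution of the difference equation (4) is given by
$$x(k) = (W - \Delta)^k x(0) + \sum_{j=0}^{k-1}(W - \Delta)^j\,\Delta\mathbf{1}_n\sigma^*. \tag{5}$$
By the Gershgorin circle theorem, every eigenvalue of $W - \Delta$ lies in the union of the discs centred at $w_{ii} - \delta_i$ with radii $\sum_{j \ne i} w_{ij} = 1 - w_{ii}$; each such disc lies strictly inside the unit circle whenever $|w_{ii} - \delta_i| < w_{ii}$, i.e., $\delta_i \in (0, 2w_{ii})$. Hence, the spectral radius $\rho(W - \Delta) < 1$ for all $\delta_i \in (0, 2w_{ii})$, $i \in \{1,\dots,n\}$. It follows that $\lim_{k\to\infty}(W - \Delta)^k = 0$ and $\sum_{j=0}^{\infty}(W - \Delta)^j = (I_n - W + \Delta)^{-1}$, where $I_n$ denotes the identity matrix of dimension $n$, see [9]. The matrix $W$ is row stochastic; then, $(I_n - W)\mathbf{1}_n = \mathbf{0}_n$, where $\mathbf{0}_n$ denotes the vector composed of only zeros. Hence, we can write $\Delta\mathbf{1}_n = (I_n - W + \Delta)\mathbf{1}_n$, and consequently $(I_n - W + \Delta)^{-1}\Delta\mathbf{1}_n = \mathbf{1}_n$. It follows that
$$\lim_{k\to\infty} x(k) = (I_n - W + \Delta)^{-1}\Delta\mathbf{1}_n\sigma^* = \mathbf{1}_n\sigma^*,$$
and the assertion follows. ∎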
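The Gershgorin step can be checked mechanically: for a row-stochastic $W$, row $i$ of $W - \Delta$ gives a disc with centre $w_{ii} - \delta_i$ and radius $1 - w_{ii}$, and that disc lies strictly inside the unit circle exactly when $0 < \delta_i < 2w_{ii}$. The helper names below are hypothetical; the sketch also confirms that $\mathbf{1}_n\sigma^*$ is a fixed point of (4) for an illustrative $W$ and $\sigma^* = 0.20$.

```python
def gershgorin_inside_unit_circle(W, delta):
    """True if every Gershgorin disc of W - diag(delta) lies strictly inside |z| < 1."""
    for i, (row, d) in enumerate(zip(W, delta)):
        centre = row[i] - d
        radius = sum(abs(w) for j, w in enumerate(row) if j != i)
        if abs(centre) + radius >= 1.0:
            return False
    return True


def admissible(W, delta):
    """The sufficient condition delta_i in (0, 2 w_ii) for every agent i."""
    return all(0.0 < d < 2.0 * W[i][i] for i, d in enumerate(delta))


def feedback_step(W, delta, x, sigma_star):
    """x_i(k+1) = sum_j w_ij x_j(k) - delta_i * (x_i(k) - sigma_star)."""
    return [sum(w * xj for w, xj in zip(row, x)) - d * (xi - sigma_star)
            for row, d, xi in zip(W, delta, x)]


W = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.4, 0.5]]
candidates = [[0.2, 0.3, 0.1], [1.5, 0.3, 0.1], [-0.1, 0.3, 0.1]]
for delta in candidates:
    print(delta, admissible(W, delta), gershgorin_inside_unit_circle(W, delta))

# 1_n * sigma_star is a fixed point of the feedback dynamics (4).
sigma_star = 0.20
fixed = feedback_step(W, [0.2, 0.3, 0.1], [sigma_star] * 3, sigma_star)
```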
Corollary 3.2.
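Consensus to $\sigma^*$ is reached exponentially with convergence rate $\|W - \Delta\|_\infty$, i.e., $\|x(k) - \mathbf{1}_n\sigma^*\|_\infty \le \|W - \Delta\|_\infty^k\,\|x(0) - \mathbf{1}_n\sigma^*\|_\infty$, $k \in \mathbb{N}$, where $\|\cdot\|_\infty$ denotes the matrix norm induced by the vector infinity norm.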
Proof.
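Define the error sequence $e(k) := x(k) - \mathbf{1}_n\sigma^*$. Then, from (4), the following is satisfied:
$$e(k+1) = (W - \Delta)x(k) + \Delta\mathbf{1}_n\sigma^* - \mathbf{1}_n\sigma^* = (W - \Delta)x(k) - (W - \Delta)\mathbf{1}_n\sigma^* = (W - \Delta)e(k).$$
The second equality in the above expression follows from the fact that $W\mathbf{1}_n = \mathbf{1}_n$, because $W$ is a stochastic matrix. The solution of the above difference equation is given by $e(k) = (W - \Delta)^k e(0)$, where $e(0)$ denotes the initial error. Let $e(k) = (e_1(k),\dots,e_n(k))^\top$, where $e_i(k) = x_i(k) - \sigma^*$. Note that exponential convergence of $\|e(k)\|_\infty$ implies exponential convergence of $e(k)$ itself. Using the solution $e(k) = (W - \Delta)^k e(0)$, the following can be written:
$$\|e(k)\|_\infty \le \|(W - \Delta)^k\|_\infty\,\|e(0)\|_\infty \le \|W - \Delta\|_\infty^k\,\|e(0)\|_\infty,$$
where $\|W - \Delta\|_\infty$ denotes the matrix norm of $W - \Delta$ induced by the vector infinity norm [9]. The inequality implies exponential convergence if $\|W - \Delta\|_\infty < 1$. Because $\|\cdot\|_\infty$ is the maximum absolute row sum and $w_{ij} \ge 0$, we can compute $\|W - \Delta\|_\infty$ as $\max_i\big(|w_{ii} - \delta_i| + \sum_{j \ne i} w_{ij}\big)$, $i \in \{1,\dots,n\}$. The matrix $W$ is stochastic, which implies $\sum_{j \ne i} w_{ij} = 1 - w_{ii}$; therefore, under the conditions of Theorem 3.1 (i.e., $\delta_i \in (0, 2w_{ii})$), we have $|w_{ii} - \delta_i| < w_{ii}$ for every $i$, so $\|W - \Delta\|_\infty < 1$, and hence exponential convergence of the consensus error can be concluded with convergence rate given by $\|W - \Delta\|_\infty$. ∎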
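The bound in Corollary 3.2 is easy to check along a simulated trajectory: compute $c = \|W - \Delta\|_\infty$ as the maximum absolute row sum and compare $\max_i |x_i(k) - \sigma^*|$ with $c^k \max_i |x_i(0) - \sigma^*|$. The sketch uses hypothetical data and the sign convention $x_i(k+1) = \sum_j w_{ij}x_j(k) - \delta_i(x_i(k) - \sigma^*)$.

```python
def inf_norm(M):
    """Matrix norm induced by the vector infinity norm: maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in M)


def feedback_step(W, delta, x, sigma_star):
    """x_i(k+1) = sum_j w_ij x_j(k) - delta_i * (x_i(k) - sigma_star)."""
    return [sum(w * xj for w, xj in zip(row, x)) - d * (xi - sigma_star)
            for row, d, xi in zip(W, delta, x)]


W = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.4, 0.5]]
delta = [0.2, 0.3, 0.1]
sigma_star = 0.20
x = [0.15, 0.25, 0.30]
n = len(W)

W_minus_delta = [[W[i][j] - (delta[i] if i == j else 0.0) for j in range(n)] for i in range(n)]
c = inf_norm(W_minus_delta)          # convergence rate ||W - Delta||_inf
e0 = max(abs(xi - sigma_star) for xi in x)

bound_holds = True
for k in range(1, 61):
    x = feedback_step(W, delta, x, sigma_star)
    err = max(abs(xi - sigma_star) for xi in x)
    # A small slack absorbs floating-point rounding.
    bound_holds = bound_holds and err <= c ** k * e0 + 1e-15
print(c, bound_holds)
```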
3.2 Consensus with an unknown leader
One criticism of model (4) is that feedback, even if it is not perfect, has to be learned. In practice, there might not be a helpful mechanism that provides feedback. An alternative is to have an unknown leader embedded in the set of traders. The agents are unsure who the leader is, but by taking averages of other traders’ opinions, they all arrive at the opinion of the leader. In Markov chain terms, the leader corresponds to an absorbing state. The leader guides the system to the true value. We assume that the identity of the leader is unknown to all agents.
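Without loss of generality, we assume that the first agent (with corresponding opinion $x_1(k)$) is the leader; it follows that $x_1(0) = \sigma^*$, $w_{11} = 1$, $w_{1j} = 0$ for $j \in \{2,\dots,n\}$, and $x_1(k) = \sigma^*$ for all $k \in \mathbb{N}$. Then, in this configuration, the opinion dynamics is given by
$$x(k+1) = \begin{pmatrix} 1 & \mathbf{0}_{n-1}^\top \\ b & A \end{pmatrix} x(k), \tag{6}$$
with $A := [w_{ij}]_{i,j=2}^{n} \in \mathbb{R}^{(n-1)\times(n-1)}$, $b := (w_{21},\dots,w_{n1})^\top$, $\sum_{j=1}^{n} w_{ij} = 1$ for all $i \in \{2,\dots,n\}$, and $w_{i1} > 0$ for at least one $i \in \{2,\dots,n\}$.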
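The leader model (6) only changes the first row of the weight matrix to $(1, 0, \dots, 0)$, so agent 1 never moves. In the hypothetical three-agent example below, $A = \begin{pmatrix}0.6 & 0.2\\0.4 & 0.5\end{pmatrix}$ is substochastic and irreducible and $b = (0.2, 0.1)^\top$; the followers never learn who the leader is, they simply keep averaging. The value $\sigma^* = 0.20$ is an assumption.

```python
def degroot_step(W, x):
    """x(k+1) = W x(k)."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


sigma_star = 0.20  # hypothetical true volatility, held by the leader (agent 1)
W_leader = [[1.0, 0.0, 0.0],   # the leader ignores everyone else
            [0.2, 0.6, 0.2],   # followers put some weight on the leader ...
            [0.1, 0.4, 0.5]]   # ... without knowing who it is
x = [sigma_star, 0.25, 0.30]   # x_1(0) = sigma_star

for _ in range(300):
    x = degroot_step(W_leader, x)
print(x)
```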
Theorem 3.3.
Consider the opinion dynamics (6) and assume that the matrix $A$ is substochastic and irreducible. Then $\lim_{k\to\infty} x(k) = \mathbf{1}_n\sigma^*$, i.e., consensus to $\sigma^*$ is reached.
Proof.
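Define the invertible matrix
$$T := \begin{pmatrix} 1 & \mathbf{0}_{n-1}^\top \\ -\mathbf{1}_{n-1} & I_{n-1} \end{pmatrix}.$$
Introduce the set of coordinates $z(k) := T x(k)$. Note that $z_1(k) = x_1(k) = \sigma^*$ and $z_i(k) = x_i(k) - \sigma^*$, $i \in \{2,\dots,n\}$. Hence, if the error vector $\varepsilon(k) := (z_2(k),\dots,z_n(k))^\top \to \mathbf{0}_{n-1}$, then consensus to $\sigma^*$ is reached. Note that
$$T\begin{pmatrix} 1 & \mathbf{0}_{n-1}^\top \\ b & A \end{pmatrix}T^{-1} = \begin{pmatrix} 1 & \mathbf{0}_{n-1}^\top \\ b + A\mathbf{1}_{n-1} - \mathbf{1}_{n-1} & A \end{pmatrix},$$
where $\mathbf{0}_{n-1}$ denotes the zero vector of appropriate dimensions and $A$, $b$ are as defined in (6). By construction, $b + A\mathbf{1}_{n-1} = \mathbf{1}_{n-1}$; hence, the consensus error satisfies the following difference equation
$$\varepsilon(k+1) = A\,\varepsilon(k), \tag{7}$$
and the solution of (7) is then given by $\varepsilon(k) = A^k\varepsilon(0)$.
Because $w_{i1} > 0$ for at least one $i \in \{2,\dots,n\}$, and $A$ is substochastic and irreducible, the spectral radius $\rho(A) < 1$, see Lemma 6.28 in [14]; it follows that $\lim_{k\to\infty} A^k = 0$. Therefore, $\lim_{k\to\infty}\varepsilon(k) = \mathbf{0}_{n-1}$ and the assertion follows. ∎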
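Two ingredients of this proof can be checked numerically for a concrete leader matrix: the identity $b + A\mathbf{1}_{n-1} = \mathbf{1}_{n-1}$, which makes the error $\varepsilon(k)$ decouple as in (7), and $\rho(A) < 1$. The sketch estimates $\rho(A)$ by power iteration, which is valid here because the example $A$ has strictly positive entries (an assumption of this illustration, not of the theorem); the matrix and $\sigma^* = 0.20$ are hypothetical.

```python
def matvec(M, v):
    """Dense matrix-vector product."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


def perron_root(A, iters=200):
    """Power iteration for the spectral radius of a matrix with positive entries."""
    v = [1.0] * len(A)
    rho = 0.0
    for _ in range(iters):
        w = matvec(A, v)
        rho = max(abs(wi) for wi in w)   # ||A v||_inf with ||v||_inf = 1
        v = [wi / rho for wi in w]
    return rho


M = [[1.0, 0.0, 0.0],
     [0.2, 0.6, 0.2],
     [0.1, 0.4, 0.5]]
b = [row[0] for row in M[1:]]           # weights the followers put on the leader
A = [row[1:] for row in M[1:]]          # follower-to-follower block
decoupled = all(abs(bi + sum(row) - 1.0) < 1e-12 for bi, row in zip(b, A))
rho = perron_root(A)

# eps(k) = (x_2(k) - sigma*, ..., x_n(k) - sigma*) from the full dynamics equals A^k eps(0).
sigma_star = 0.20
x = [sigma_star, 0.25, 0.30]
eps = [xi - sigma_star for xi in x[1:]]
for _ in range(20):
    x = matvec(M, x)
    eps = matvec(A, eps)
max_gap = max(abs((xi - sigma_star) - ei) for xi, ei in zip(x[1:], eps))
print(decoupled, rho, max_gap)
```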
Corollary 3.4.
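Let $\|\cdot\|_\star$ denote some matrix norm such that $\|A\|_\star < 1$ (such a norm always exists because $\rho(A) < 1$ under the conditions of Theorem 3.3). Then, consensus to $\sigma^*$ is reached exponentially with the convergence rate given by $\|A\|_\star$, i.e., $\|\varepsilon(k)\|_\infty \le c\,\|A\|_\star^k\,\|\varepsilon(0)\|_\star$ for $k \in \mathbb{N}$ and some positive constant $c$, where $\varepsilon(k)$ is the consensus error defined in the proof of Theorem 3.3.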
Proof.
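See Lemma 5.6.10 in [9] on how to construct such a norm $\|\cdot\|_\star$. Now consider the consensus error $\varepsilon(k)$ defined in the proof of Theorem 3.3, which evolves according to the difference equation (7). It follows that $\varepsilon(k) = A^k\varepsilon(0)$, where $\varepsilon(0)$ denotes the initial consensus error. Under the assumptions of Theorem 3.3, $\rho(A) < 1$. By Lemma 5.6.10 in [9], $\rho(A) < 1$ implies that there exists some matrix norm, say $\|\cdot\|_\star$, such that $\|A\|_\star < 1$; we use the same symbol for a vector norm compatible with it. We restate the error with norms and obtain $\|\varepsilon(k)\|_\star \le \|A\|_\star^k\,\|\varepsilon(0)\|_\star$. Because all norms are equivalent in finite-dimensional vector spaces (see Chapter 5 in [9]), $\|\varepsilon(k)\|_\infty \le c\,\|\varepsilon(k)\|_\star$ for some positive constant $c$. As $\|A\|_\star < 1$, the norm of the consensus error converges to zero exponentially with rate $\|A\|_\star$. ∎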
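Lemma 5.6.10 in [9] guarantees norms with $\|A\|_\star$ arbitrarily close to $\rho(A)$, so the rate in Corollary 3.4 can be pushed towards $\rho(A)$ even when $\|A\|_\infty$ is noticeably larger. One way to see this numerically is Gelfand’s formula, $\|A^k\|^{1/k} \to \rho(A)$; this is a swapped-in illustration, not the lemma’s construction. The sketch uses a hypothetical $A = \begin{pmatrix}0.6 & 0.2\\0.4 & 0.5\end{pmatrix}$, for which $\|A\|_\infty = 0.9$ while $\rho(A) \approx 0.837$.

```python
import math


def matmul(X, Y):
    """Dense matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def inf_norm(X):
    """Maximum absolute row sum (norm induced by the vector infinity norm)."""
    return max(sum(abs(v) for v in row) for row in X)


def gelfand_estimate(A, k):
    """||A^k||_inf ** (1/k), which tends to rho(A) as k grows."""
    P = A
    for _ in range(k - 1):
        P = matmul(P, A)
    return inf_norm(P) ** (1.0 / k)


A = [[0.6, 0.2],
     [0.4, 0.5]]
tr = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
rho = (tr + math.sqrt(tr * tr - 4.0 * det)) / 2.0   # larger (real) eigenvalue of a 2x2 matrix

for k in (1, 10, 50, 200):
    print(k, gelfand_estimate(A, k))
print("rho(A) =", rho)
```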
4 Consensus (vectored agent dynamics)
In this section, we suppose that agents have beliefs over a range of strikes. Thus, each agent’s opinion of the volatility curve is a vector, with each entry corresponding to a particular strike. Typically, in markets, options are quoted at-the-money (ATM) and at two further strikes to the left of and to the right of the ATM level. Here, we examine the case of $m$ strikes and $n$ agents, i.e., each agent now has quotes for $m$ different moneyness levels. In this configuration, the true volatility is $\sigma^* = (\sigma^*_1,\dots,\sigma^*_m)^\top \in \mathbb{R}^m$. See Figure 1(b).
4.1 Consensus with Feedback
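Again, we assume that each agent takes a weighted average of other agents’ opinions and updates its volatility estimate vector for the next period, i.e., at time $k \in \mathbb{N}$, the opinion $x_i(k) \in \mathbb{R}^m$ of the $i$-th agent is given by
$$x_i(k+1) = \sum_{j=1}^{n} w_{ij}\,x_j(k) - \delta_i\big(x_i(k) - \sigma^*\big), \tag{8}$$
where $\delta_i$ denotes the learning coefficient of agent $i$, $x_j(k) \in \mathbb{R}^m$ is the opinion of agent $j$ at time $k$, and $w_{ij}$ denotes the opinion weights for the investors, with $w_{ij} \ge 0$ and $\sum_{j=1}^{n} w_{ij} = 1$ for all $i \in \{1,\dots,n\}$. In this case, the stacked vector of opinions is $x(k) := (x_1(k)^\top,\dots,x_n(k)^\top)^\top$, $x(k) \in \mathbb{R}^{nm}$. The opinion dynamics of the agents can then be written in matrix form as follows
$$x(k+1) = \big((W - \Delta)\otimes I_m\big)\,x(k) + (\Delta\mathbf{1}_n)\otimes\sigma^*, \tag{9}$$
where $W := [w_{ij}]$ is a row-stochastic matrix, $\Delta := \operatorname{diag}(\delta_1,\dots,\delta_n)$, and $\otimes$ denotes the Kronecker product. We have the following result.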
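The stacked form (9) can be built literally with a Kronecker product. The sketch uses a small hand-written `kron` on lists (a hypothetical helper; with NumPy one would use `numpy.kron`) for $n = 3$ agents and $m = 3$ strikes, an assumed smile $\sigma^* = (0.22, 0.20, 0.21)$, and learning coefficients satisfying $\delta_i \in (0, 2w_{ii})$. Opinions are stacked agent by agent, as in $x(k) = (x_1(k)^\top,\dots,x_n(k)^\top)^\top$.

```python
def kron(A, B):
    """Kronecker product of two dense matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def matvec(M, v):
    """Dense matrix-vector product."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


n, m = 3, 3
W = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.4, 0.5]]
delta = [0.2, 0.3, 0.1]               # hypothetical learning coefficients
sigma_star = [0.22, 0.20, 0.21]       # hypothetical smile at three strikes
I_m = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]

W_minus_delta = [[W[i][j] - (delta[i] if i == j else 0.0) for j in range(n)] for i in range(n)]
F = kron(W_minus_delta, I_m)                        # (W - Delta) kron I_m, size nm x nm
u = [d * s for d in delta for s in sigma_star]      # (Delta 1_n) kron sigma*

# Agent-major stacking: x = (x_1^T, x_2^T, x_3^T)^T, each x_i in R^m.
x = [0.15, 0.18, 0.16,
     0.25, 0.24, 0.26,
     0.30, 0.28, 0.27]
for _ in range(300):
    x = [fx + ux for fx, ux in zip(matvec(F, x), u)]

target = [s for _ in range(n) for s in sigma_star]  # 1_n kron sigma*
print(max(abs(a - b) for a, b in zip(x, target)))
```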
Theorem 4.1.
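Consider the opinion dynamics in (9) and assume that $\delta_i \in (0, 2w_{ii})$, $i \in \{1,\dots,n\}$; then, consensus to $\sigma^*$ (with $\sigma^* \in \mathbb{R}^m$) is reached, i.e., $\lim_{k\to\infty} x(k) = \mathbf{1}_n\otimes\sigma^*$.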
Proof.
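Define the error sequence $e(k) := x(k) - \mathbf{1}_n\otimes\sigma^*$. Note that $\lim_{k\to\infty} e(k) = \mathbf{0}_{nm}$ implies that consensus to $\sigma^*$ is reached. Given the opinion dynamics (9), the evolution of the error satisfies the following difference equation
$$e(k+1) = \big((W - \Delta)\otimes I_m\big)x(k) + (\Delta\mathbf{1}_n)\otimes\sigma^* - \mathbf{1}_n\otimes\sigma^*.$$
It is easy to verify that, because $W$ is stochastic, $(W\otimes I_m)(\mathbf{1}_n\otimes\sigma^*) = (W\mathbf{1}_n)\otimes\sigma^* = \mathbf{1}_n\otimes\sigma^*$, so that $(\Delta\mathbf{1}_n)\otimes\sigma^* - \mathbf{1}_n\otimes\sigma^* = -\big((W - \Delta)\otimes I_m\big)(\mathbf{1}_n\otimes\sigma^*)$. Then, the error dynamics simplifies to
$$e(k+1) = \big((W - \Delta)\otimes I_m\big)e(k), \tag{10}$$
and consequently, the solution of (10) is given by $e(k) = \big((W - \Delta)\otimes I_m\big)^k e(0)$. By properties of the Kronecker product, the eigenvalues of $(W - \Delta)\otimes I_m$ are those of $W - \Delta$, each repeated $m$ times; hence, by Gershgorin’s circle theorem, the spectral radius $\rho\big((W - \Delta)\otimes I_m\big) = \rho(W - \Delta) < 1$ for $\delta_i \in (0, 2w_{ii})$. It follows that $\lim_{k\to\infty}\big((W - \Delta)\otimes I_m\big)^k = 0$, see [9]. Therefore, $\lim_{k\to\infty} e(k) = \mathbf{0}_{nm}$ and the assertion follows. ∎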
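The proof relies on Kronecker-product facts that are easy to sanity-check numerically: $(W\otimes I_m)(\mathbf{1}_n\otimes\sigma^*) = \mathbf{1}_n\otimes\sigma^*$ for row-stochastic $W$, and the norm identity $\|(W - \Delta)\otimes I_m\|_\infty = \|W - \Delta\|_\infty$, which is why the rate in the vector case matches the scalar one. The example values and helper names are hypothetical.

```python
def kron(A, B):
    """Kronecker product of two dense matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def matvec(M, v):
    """Dense matrix-vector product."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


def inf_norm(M):
    """Maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in M)


n, m = 3, 3
W = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.4, 0.5]]
delta = [0.2, 0.3, 0.1]
sigma_star = [0.22, 0.20, 0.21]
I_m = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]

ones_kron_sigma = [s for _ in range(n) for s in sigma_star]   # 1_n kron sigma*
image = matvec(kron(W, I_m), ones_kron_sigma)                 # (W kron I_m)(1_n kron sigma*)
fixed_point_ok = max(abs(a - b) for a, b in zip(image, ones_kron_sigma)) < 1e-12

W_minus_delta = [[W[i][j] - (delta[i] if i == j else 0.0) for j in range(n)] for i in range(n)]
norm_small = inf_norm(W_minus_delta)
norm_big = inf_norm(kron(W_minus_delta, I_m))
print(fixed_point_ok, norm_small, norm_big)
```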
Corollary 4.2.
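Consensus to $\sigma^*$ is reached exponentially with the convergence rate given by $\|W - \Delta\|_\infty$, i.e., $\|x(k) - \mathbf{1}_n\otimes\sigma^*\|_\infty \le \|W - \Delta\|_\infty^k\,\|x(0) - \mathbf{1}_n\otimes\sigma^*\|_\infty$.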
The proof of the above result is very similar to previous corollaries and is omitted.
4.2 Consensus with an unknown leader
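As in the scalar case, we assume that there is a leader driving all the other agents through the opinion matrix $W$. Again, without loss of generality, we assume that the first agent (with corresponding opinion $x_1(k) \in \mathbb{R}^m$) is the leader, with $x_1(0) = \sigma^*$, $w_{11} = 1$, $w_{1j} = 0$ for $j \in \{2,\dots,n\}$, and hence $x_1(k) = \sigma^*$ for all $k \in \mathbb{N}$. Then, in this configuration, the opinion dynamics is given by
$$x(k+1) = \left(\begin{pmatrix} 1 & \mathbf{0}_{n-1}^\top \\ b & A \end{pmatrix}\otimes I_m\right)x(k), \tag{11}$$
with $A := [w_{ij}]_{i,j=2}^{n}$, $b := (w_{21},\dots,w_{n1})^\top$, $\sum_{j=1}^{n} w_{ij} = 1$ for all $i \in \{2,\dots,n\}$, and $w_{i1} > 0$ for at least one $i \in \{2,\dots,n\}$.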
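In the vector leader model (11), every strike evolves under the same leader matrix. The sketch uses a hypothetical three-agent leader matrix with $m = 3$ strikes and an assumed smile $\sigma^*$; the leader starts at $\sigma^*$ and the followers converge to it strike by strike.

```python
def kron(A, B):
    """Kronecker product of two dense matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def matvec(M, v):
    """Dense matrix-vector product."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


n, m = 3, 3
M = [[1.0, 0.0, 0.0],     # leader row: ignores everyone else
     [0.2, 0.6, 0.2],
     [0.1, 0.4, 0.5]]
sigma_star = [0.22, 0.20, 0.21]          # hypothetical true smile, held by the leader
I_m = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
F = kron(M, I_m)

x = sigma_star + [0.25, 0.24, 0.26] + [0.30, 0.28, 0.27]   # x_1(0) = sigma*
for _ in range(300):
    x = matvec(F, x)

target = sigma_star * n                   # list repetition gives 1_n kron sigma*
print(max(abs(a - b) for a, b in zip(x, target)))
```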
Theorem 4.3.
Consider the opinion dynamics (11) and assume that the matrix $A$ is substochastic and irreducible; then, consensus to $\sigma^*$ is reached, i.e., $\lim_{k\to\infty} x(k) = \mathbf{1}_n\otimes\sigma^*$.
Corollary 4.4.
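Let $\|\cdot\|_\star$ denote some matrix norm such that $\|A\|_\star < 1$; then consensus to $\sigma^*$ is reached exponentially with convergence rate $\|A\|_\star$, i.e., $\|x(k) - \mathbf{1}_n\otimes\sigma^*\|_\infty \le c\,\|A\|_\star^k$, for some positive constant $c$ depending on the initial condition.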
5 Numerical Simulations
Consider the opinion dynamics with feedback (4) with ten agents (i.e., $n = 10$), matrix $W$ as defined in the supplementary material, a fixed true volatility $\sigma^*$, and a fixed initial condition $x(0)$. Figure 2 depicts the obtained simulation results for different values of the learning parameters $\delta_i$, $i \in \{1,\dots,10\}$. Specifically, Figure 2(a) shows results without learning, i.e., $\delta_i = 0$ for all $i$ (here there is no consensus to $\sigma^*$), and Figure 2(b) depicts the results for learning parameters that satisfy the condition of Theorem 3.1; as stated in the theorem, consensus to $\sigma^*$ is reached. Figure 2(c) shows results in which the learning parameter of one agent violates the condition of Theorem 3.1 (i.e., $\delta_i \in (0, 2w_{ii})$) and, in this case, consensus is not reached. Next, consider the opinion dynamics with leader (6), with an initial condition in which the leader starts at the true volatility, $x_1(0) = \sigma^*$.
For the leader case, the opinion weights matrix is constructed by replacing the first row of $W$ by $(1, 0, \dots, 0)$. The corresponding matrix $A$ (defined in (6)) is substochastic and irreducible, and $w_{i1} > 0$ for at least one $i \in \{2,\dots,10\}$. Hence, all the conditions of Theorem 3.3 are satisfied and consensus to $\sigma^*$ is expected. Figure 2(d) shows the corresponding simulation results. Finally, Figure 3 shows the evolution of the vectored opinion dynamics (9) with $n = 10$ and $m = 3$ (i.e., ten three-dimensional agents), matrix $W$ as in the case with feedback, a (vectored) volatility $\sigma^* \in \mathbb{R}^3$, learning parameters $\delta_i$ chosen as in one of the scalar experiments, and an initial condition constructed from that of the first experiment above.
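The qualitative contrast between a run that satisfies the condition of Theorem 3.1 and one that violates it can be reproduced on a toy example. The sketch uses a hypothetical three-agent $W$ with a small self-weight $w_{11} = 0.1$ (the matrix from the supplementary material is not reproduced here) and the sign convention $x_i(k+1) = \sum_j w_{ij}x_j(k) - \delta_i(x_i(k) - \sigma^*)$. With $\delta_1 = 0.1 < 2w_{11}$ the agents converge to $\sigma^*$; with $\delta_1 = 2.5 > 2w_{11}$ agent 1 overreacts to its own error and the iteration diverges. Violating a sufficient condition does not always cause divergence; this is simply one example where it does.

```python
def feedback_step(W, delta, x, sigma_star):
    """x_i(k+1) = sum_j w_ij x_j(k) - delta_i * (x_i(k) - sigma_star)."""
    return [sum(w * xj for w, xj in zip(row, x)) - d * (xi - sigma_star)
            for row, d, xi in zip(W, delta, x)]


def final_error(W, delta, x0, sigma_star, steps):
    """Largest deviation from sigma_star after `steps` iterations."""
    x = list(x0)
    for _ in range(steps):
        x = feedback_step(W, delta, x, sigma_star)
    return max(abs(xi - sigma_star) for xi in x)


W = [[0.1, 0.45, 0.45],
     [0.3, 0.4, 0.3],
     [0.3, 0.3, 0.4]]
sigma_star = 0.20
x0 = [0.15, 0.25, 0.30]

admissible = [0.1, 0.2, 0.2]   # each delta_i in (0, 2 w_ii)
violating = [2.5, 0.2, 0.2]    # delta_1 = 2.5 > 2 w_11 = 0.2

err_ok = final_error(W, admissible, x0, sigma_star, 300)
err_bad = final_error(W, violating, x0, sigma_star, 50)
print(err_ok, err_bad)  # tiny versus huge
```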
6 Arbitrage Bounds
We have taken the true volatility parameter as exogenous to our models. Our only requirement is that there is no static arbitrage, by which we mean that all the quotes in volatility, once translated into option prices, are such that one cannot trade in the different strikes to create a profit. Checking whether a volatility surface is indeed arbitrage-free is non-trivial; nevertheless, some sufficient conditions are well known [4]. As long as the volatility surface satisfies them, our analysis implies global stability towards an arbitrage-free smile.
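We parameterize the volatility as a function $\sigma(K)$ of the strike (assuming the expiry is fixed) and denote the call option price as $C(K) := C_{\mathrm{BS}}\big(K, \sigma(K)\big)$, where $C_{\mathrm{BS}}$ is the Black-Scholes call price with all other inputs held fixed.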
Our attention is on varying $K$ to ensure no static arbitrage. We assume that $\sigma(K)$ translates into unique call option dollar prices, which follows from the strictly positive first derivative of the option price with respect to $\sigma$ (the vega).

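Condition 1: (Call Spread) For $K_1 < K_2$, we have $0 \le C(K_1) - C(K_2) \le e^{-rT}(K_2 - K_1)$, where $r$ is the interest rate and $T$ the expiry.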

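Condition 2: (Butterfly Spread) For $K_1 < K_2 < K_3$, $\dfrac{C(K_1) - C(K_2)}{K_2 - K_1} \ge \dfrac{C(K_2) - C(K_3)}{K_3 - K_2}$, i.e., the call price is convex in the strike.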
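On a discrete grid of strikes, call-spread and butterfly conditions reduce to inequalities between neighbouring call prices. The sketch below is one way to screen a quoted strip for static arbitrage, assuming call prices with a common expiry $T$, a constant interest rate $r$, the bounds $0 \le C(K_1) - C(K_2) \le e^{-rT}(K_2 - K_1)$ and convexity in $K$; the helper names and the $10^{-12}$ tolerance are hypothetical. Prices from a flat-volatility Black-Scholes curve pass both checks, while an artificial bump in the at-the-money price fails the butterfly check.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(S, K, T, sigma, r):
    """Black-Scholes European call price (no dividends)."""
    sqrt_t = math.sqrt(T)
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)


def call_spread_ok(strikes, calls, r, T, tol=1e-12):
    """Neighbouring strikes: 0 <= C(K1) - C(K2) <= exp(-rT) (K2 - K1)."""
    disc = math.exp(-r * T)
    for k1, k2, c1, c2 in zip(strikes, strikes[1:], calls, calls[1:]):
        spread = c1 - c2
        if spread < -tol or spread > disc * (k2 - k1) + tol:
            return False
    return True


def butterfly_ok(strikes, calls, tol=1e-12):
    """Neighbouring strikes: call prices are convex in the strike."""
    for i in range(1, len(strikes) - 1):
        left = (calls[i - 1] - calls[i]) / (strikes[i] - strikes[i - 1])
        right = (calls[i] - calls[i + 1]) / (strikes[i + 1] - strikes[i])
        if left < right - tol:
            return False
    return True


S, T, r = 100.0, 0.25, 0.01
strikes = [80.0, 90.0, 100.0, 110.0, 120.0]
flat = [bs_call(S, K, T, 0.20, r) for K in strikes]     # flat 20% vol (assumed)
bumped = list(flat)
bumped[2] += 3.0                                         # artificial ATM price bump

print(call_spread_ok(strikes, flat, r, T), butterfly_ok(strikes, flat))
print(butterfly_ok(strikes, bumped))
```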
Developing these arbitrage-free conditions on volatility curves is not an easy task; see the account in [13]. Delving into this topic would take us further into stochastic analysis and away from the focus of this paper.
7 Connections and Conclusion
Recently, there has been some rather interesting work at the intersection of computer science and option pricing. In [7], the authors showed how to use efficient online trading algorithms to price the current value of financial instruments, deriving both upper and lower bounds. Moreover, [2, 1] characterized the Black-Scholes price as the value of a sequential two-player zero-sum game. While these papers made an excellent start at bridging the gap between two different academic communities (namely mathematical finance and theoretical computer science), they do not address the reality of volatility smiles and trading. Our contribution can be viewed as making these connections more concrete. The smile itself is a conundrum, and there have even been articles questioning whether it can be solved [3]. The traditional bottom-up approach is to develop a stochastic process for the volatility and the asset price, possibly introducing jumps or additional diffusions to capture uncertainty [10]. Such models have been successfully developed, but the time is ripe to incorporate multi-agent models with arbitrage-free curves.
Combining learning agents with stochastic differential equation models [15], such as the Black-Scholes model, is an exciting proposition. Moreover, opinion dynamics has been studied quite extensively as a subject in its own right; recent references that present an expansive discussion are [12, 11].
In this paper, we introduce models of learning agents in the context of option trading. A key open question in this setting is how the market comes to a consensus about volatility, which is reflected in derivative pricing through the Black-Scholes formula. The framework we have established allows us to explore other areas. Thus far, we have taken the smile as an exogenous object and proved convergence to equilibrium beliefs. A natural step forward would be to look at the beliefs as probability measures, where each measure corresponds to a different option pricing model. Our learning models focus on the interaction between agents; in fact, agents can be interpreted as algorithms, each corresponding to a particular belief about the pricing model.
Acknowledgements
The authors would like to thank Elchanan Mossel, Ioannis Panageas, Ionel Popescu and JM Schumacher for fruitful discussions. Tushar Vaidya would like to acknowledge a SUTD Presidential fellowship. Carlos Murguia would like to acknowledge the National Research Foundation (NRF), Prime Minister’s Office, Singapore, under its National Cybersecurity R&D Programme (Award No. NRF2014NCRNCR00140) and administered by the National Cybersecurity R&D Directorate. Georgios Piliouras would like to acknowledge SUTD grant SRG ESD 2015 097 and MOE AcRF Tier 2 Grant 2016T21170.
References
 [1] Jacob Abernethy, Peter L. Bartlett, Rafael Frongillo, and Andre Wibisono. How to hedge an option against an adversary: Black-Scholes pricing is minimax optimal. In Advances in Neural Information Processing Systems, pages 2346–2354, 2013.
 [2] Jacob Abernethy, Rafael M. Frongillo, and Andre Wibisono. Minimax option pricing meets Black-Scholes in the limit. In Proceedings of the Forty-Fourth Annual ACM Symposium on Theory of Computing, pages 1029–1040. ACM, 2012.
 [3] Elie Ayache, Philippe Henrotte, Sonia Nassar, and Xuewen Wang. Can anyone solve the smile problem? The Best of Wilmott, page 229, 2004.
 [4] Peter Carr and Dilip B Madan. A note on sufficient conditions for no arbitrage. Finance Research Letters, 2(3):125–130, 2005.
 [5] Neil Chriss. Black-Scholes and Beyond: Option Pricing Models. McGraw-Hill, 1996.
 [6] Morris H DeGroot. Reaching a consensus. Journal of the American Statistical Association, 69(345):118–121, 1974.
 [7] Peter DeMarzo, Ilan Kremer, and Yishay Mansour. Online trading algorithms and robust option pricing. In Proceedings of the Thirty-Eighth Annual ACM Symposium on Theory of Computing, pages 477–486. ACM, 2006.
 [8] Emanuel Derman and Michael B Miller. The Volatility Smile. John Wiley & Sons, 2016.
 [9] Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, New York, NY, USA, 2nd edition, 2012.
 [10] Michael Kamal and Jim Gatheral. Implied volatility surface. Encyclopedia of Quantitative Finance, 2010.
 [11] Tung Mai, Ioannis Panageas, and Vijay V. Vazirani. Opinion dynamics in networks: Convergence, stability and lack of explosion. 44th International Colloquium on Automata, Languages, and Programming (ICALP), 2017.
 [12] Elchanan Mossel and Omer Tamuz. Opinion exchange dynamics. CoRR, abs/1401.4770, 2014.
 [13] Michael Roper. Arbitrage free implied volatility surfaces. preprint, 2010.
 [14] Ernesto Salinelli and Franco Tomarelli. Discrete dynamical systems: one-step scalar equations, pages 85–124. Springer International Publishing, Cham, 2014.
 [15] Martin Schweizer and Johannes Wissel. Arbitrage-free market models for option prices: The multi-strike case. Finance and Stochastics, 12(4):469–505, 2008.