Market Making via Reinforcement Learning


Thomas Spooner    John Fearnley    Rahul Savani    Andreas Koukorinis

Market making is a fundamental trading problem in which an agent provides liquidity by continually offering to buy and sell a security. The problem is challenging due to inventory risk, the risk of accumulating an unfavourable position and ultimately losing money. In this paper, we develop a high-fidelity simulation of limit order book markets, and use it to design a market making agent using temporal-difference reinforcement learning. We use a linear combination of tile codings as a value function approximator, and design a custom reward function that controls inventory risk. We demonstrate the effectiveness of our approach by showing that our agent outperforms both simple benchmark strategies and a recent online learning approach from the literature.

Key Words.:
Market Making; Limit Order Books; TD Learning; Tile Coding

Proc. of the 17th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2018), July 10–15, 2018, Stockholm, Sweden. M. Dastani, G. Sukthankar, E. André, S. Koenig (eds.)


Department of Computer Science, University of Liverpool; Stratagem Technologies Ltd, and University College London

1 Introduction

The role of a market maker is to provide liquidity by facilitating transactions with other market participants. Like many trading problems, it has become increasingly automated since the advent of the electronic limit order book (LOB), as the need to handle more data and act on ever shorter time scales renders the task almost impossible for humans Hasbrouck2013; Leaver2016. Upwards of 60% of trading volume on some particularly active markets has been attributed to automated trading systems Savani2012; Haynes2015. This paper uses reinforcement learning (RL) to design competitive market making agents for financial markets using high-frequency historical equities data.

1.1 Related work.

Market making has been studied across a number of disciplines, including economics, finance, artificial intelligence (AI), and machine learning. A classic approach in the finance literature is to treat market making as a problem of stochastic optimal control. Here, a model for order arrivals and executions is developed and then control algorithms for the resulting dynamics are designed Ho1981; Grossman1988; Avellaneda2008; Guilbaud2011; Chakraborty2011; CarteaBLSH. Recent results in this line of research have studied price impact, adverse selection and predictability abergel2016limit, and augmented the problem characteristics with risk measures and inventory constraints Gueant2013; Cartea2015.

Another prominent approach to studying market making and limit order book markets has been that of zero-intelligence (ZI) agents. The study of ZI agents has spanned economics, finance and AI. These agents do not “observe, remember, or learn”, but can, for example, adhere to inventory constraints Gode1993. Newer, more intelligent variants now even incorporate learning mechanisms Cliff2006; Vytelingum2008. Here, agents are typically evaluated in simulated markets without using real market data.

A significant body of literature, in particular in AI, has studied the market making problem for prediction markets Othman2012; Brahma2012; Othman2013. In this setting, the agent’s main goal is to elicit information from informed participants in the market. While later studies have addressed profitability, the problem setup remains quite distinct from the financial one considered here.

Reinforcement learning has been applied to other financial trading problems Moody1999; Sherstov2004; Schvartzman2009, including optimal execution Nevmyvaka2006 and foreign exchange trading Dempster2006. The first application of RL to market making Chan2001 focused on the impact of noise (due to uninformed traders) on the agent’s quoting behaviour and showed that RL successfully converges on the expected strategies for a number of controlled environments. They did not, however, capture the challenges associated with explicitly handling order placement and cancellation, nor the complexities of using continuous state variables. Moreover, Chan2001 found that temporal-difference RL struggled in their setting, a finding echoed in Sherstov2004. Chan2001 attributed this to partial observability and excessive noise in the problem domain, despite the relative simplicity of their market simulation. In follow-up work, Shelton2001 used importance sampling as a solution to the problems observed with off-policy learning. In contrast, we find temporal-difference RL to be effective for the market making problem, provided that we use eligibility traces and carefully design our function approximator and reward function.
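To illustrate why eligibility traces help with delayed reward attribution, the following is a minimal sketch of textbook linear TD(λ) prediction with accumulating traces. It assumes one-hot state features and a single deterministic episode in which reward arrives only on the final step. The function names and all parameter values are hypothetical; this is not the authors' implementation, and the agents in this paper learn action values rather than the state values estimated here.

```python
def dot(u, v):
    """Inner product of two equal-length weight/feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def td_lambda_episode(w, features, rewards, alpha=0.1, gamma=1.0, lam=0.9):
    """One episode of linear TD(lambda) prediction with accumulating traces.

    features[t] is the feature vector of the state at step t and rewards[t] is
    the reward received on leaving it. The episode terminates after the last
    reward, so the terminal state's value is taken to be 0.
    """
    w = list(w)
    z = [0.0] * len(w)  # eligibility trace, one entry per weight
    for t, r in enumerate(rewards):
        x = features[t]
        terminal = t + 1 == len(rewards)
        v_next = 0.0 if terminal else dot(w, features[t + 1])
        delta = r + gamma * v_next - dot(w, x)  # one-step TD error
        # Decay old credit and add the current features (accumulating trace).
        z = [gamma * lam * zi + xi for zi, xi in zip(z, x)]
        # Every recently visited feature is updated in proportion to its trace.
        w = [wi + alpha * delta * zi for wi, zi in zip(w, z)]
    return w


# Hypothetical deterministic 4-state chain, one-hot features, reward only at the end.
one_hot = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
w_td0 = td_lambda_episode([0.0] * 4, one_hot, [0, 0, 0, 1], alpha=0.5, lam=0.0)
w_td9 = td_lambda_episode([0.0] * 4, one_hot, [0, 0, 0, 1], alpha=0.5, lam=0.9)
print(w_td0)
print(w_td9)
```

With `lam=0.0` (one-step TD) only the final state's weight moves after one episode; with `lam=0.9` the earlier states also receive geometrically decaying credit for the delayed reward.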

One of the most recent related works is Abernethy2013, which uses an online learning approach to develop a market making agent. They prove theoretical guarantees for a stylized model and empirically evaluate their agents under strong assumptions on executions. For example, they assume that the market has sufficient liquidity to execute market orders entirely at the posted price with no slippage. We use this approach as one of the benchmarks for our empirical evaluation and address the impact of trading in a more realistic environment.
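The no-slippage assumption matters because a market order larger than the volume resting at the best price consumes deeper levels at progressively worse prices. The sketch below shows this for a buy order, assuming a static snapshot of the visible ask side and no other participants reacting. The function name and the example book are hypothetical.

```python
def execute_market_buy(asks, quantity):
    """Walk the ask side of a static book to fill a market buy order.

    asks: list of (price, volume) levels sorted from best (lowest) price upward.
    Returns (filled volume, total cost, unfilled volume).
    """
    filled, cost = 0, 0
    for price, volume in asks:
        if filled == quantity:
            break
        take = min(volume, quantity - filled)  # consume only what is still needed
        filled += take
        cost += take * price
    return filled, cost, quantity - filled


# Hypothetical book: prices in integer cents so the arithmetic is exact.
asks = [(10000, 50), (10050, 100), (10100, 200)]
filled, cost, unfilled = execute_market_buy(asks, 120)
avg_price = cost / filled
slippage = avg_price - asks[0][0]  # versus assuming a full fill at the best ask
print(filled, cost, unfilled, round(slippage, 2))
```

Here 50 units fill at the best ask and the remaining 70 at the next level, so the average price is worse than the posted price; an order larger than the visible depth is left partially unfilled.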

1.2 Our contributions.

The main contribution of this paper is to design and analyse temporal-difference (TD) reinforcement learning agents for market making. In contrast to past work Chan2001; Shelton2001, we develop a high-fidelity simulation using high-frequency historical data. We identify eligibility traces as a solution to the unresolved issues previously associated with reward attribution, noise and partial observability Chan2001. We then design a reward function and state representation and demonstrate that these are key factors in the success of our final agent. We outline the steps taken to develop our agent below:

  1. We build a realistic, data-driven simulation of a limit order book using a basket of 10 equities across 5 venues and a mixture of sectors. This data includes 5 levels of order book depth and transaction updates with time increments on the order of milliseconds.

  2. We address concerns raised in past work about the efficacy of one-step temporal-difference learning, corroborating their results but demonstrating that eligibility traces are a simple and effective solution.

  3. We investigate the performance of a wide range of new and old TD-based learning algorithms.

  4. We show that the “natural” choice of reward function (incremental profit and loss) does not lead to the best performance and regularly induces instability during learning. We propose a solution in the form of an asymmetrically dampened reward function, which improves learning stability and produces higher and more consistent returns.

  5. We provide an evaluation of three different state space constructions and propose a linear combination of tile codings as our final representation as it gives competitive performance with more stable learning.

  6. We present a consolidated agent, based on a combination of the best results from 1–5 above, and show that it produces the best risk-adjusted out-of-sample performance compared to a set of simple benchmarks, a basic RL agent, and a recent online learning approach Abernethy2013. Moreover, we show that the performance of our consolidated agent is competitive enough to represent a potentially viable approach for use in practice.
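The idea behind item 4 can be made concrete with a small sketch. It assumes, for illustration only, that incremental profit and loss over a step splits into a term from matched orders plus a speculative term (inventory times the change in mid-price), and that asymmetric dampening subtracts a fraction `eta` of the speculative term only when that term is positive. This decomposition, the parameter `eta` and the function names are hypothetical; the exact definitions are not given in this section.

```python
def pnl(matched_pnl, inventory, mid_change):
    """Incremental P&L: matched-order P&L plus revaluation of held inventory."""
    return matched_pnl + inventory * mid_change


def symmetric_dampened(matched_pnl, inventory, mid_change, eta):
    """Dampen the speculative (inventory) term regardless of its sign."""
    return pnl(matched_pnl, inventory, mid_change) - eta * inventory * mid_change


def asymmetric_dampened(matched_pnl, inventory, mid_change, eta):
    """Dampen only speculative gains; speculative losses pass through in full."""
    speculative = inventory * mid_change
    return pnl(matched_pnl, inventory, mid_change) - max(0.0, eta * speculative)


# Hypothetical step: long 10 units, 1.0 earned from matched orders, mid moves +/-0.5.
for move in (0.5, -0.5):
    print(move,
          pnl(1.0, 10, move),
          symmetric_dampened(1.0, 10, move, eta=0.5),
          asymmetric_dampened(1.0, 10, move, eta=0.5))
```

In this sketch the symmetric form softens an adverse inventory move exactly as much as a favourable one, whereas the asymmetric form keeps the full loss while shrinking the gain, which makes holding a position to speculate less attractive than earning the spread.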
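The representation in item 5 can be sketched with standard grid-based tile coding: each coding covers a box of state variables with several tilings, each shifted by a fraction of a tile, and a linear combination sums the weights of the active tiles across several codings, each defined over its own subset of state variables. The state variables (inventory, spread), their bounds, the tile counts and the class names below are hypothetical choices for illustration, not the configuration used in this paper.

```python
class TileCoding:
    """Uniform-grid tile coding of a bounded box with several offset tilings."""

    def __init__(self, lows, highs, n_tiles, n_tilings):
        self.lows, self.highs = list(lows), list(highs)
        self.n_tiles, self.n_tilings = n_tiles, n_tilings
        # One extra tile per dimension so shifted tilings still cover the upper edge.
        self.per_tiling = (n_tiles + 1) ** len(self.lows)
        self.size = n_tilings * self.per_tiling

    def active(self, state):
        """Return one active tile index per tiling for a point in the box."""
        indices = []
        for k in range(self.n_tilings):
            flat = 0
            for lo, hi, s in zip(self.lows, self.highs, state):
                width = (hi - lo) / self.n_tiles
                offset = k * width / self.n_tilings  # tiling k shifted by k/n of a tile
                s = min(max(s, lo), hi)  # clip out-of-range values to the box
                coord = int((s - lo + offset) // width)
                flat = flat * (self.n_tiles + 1) + coord
            indices.append(k * self.per_tiling + flat)
        return indices


class LinearTileCombination:
    """Value estimate = sum of the weights of the active tiles of every coding."""

    def __init__(self, codings):
        # codings: list of (TileCoding, tuple of state-variable indices it covers)
        self.codings = codings
        self.weights = [[0.0] * tc.size for tc, _ in codings]

    def _features(self, state):
        for (tc, dims), w in zip(self.codings, self.weights):
            yield w, tc.active([state[d] for d in dims])

    def value(self, state):
        return sum(w[i] for w, idx in self._features(state) for i in idx)

    def update(self, state, target, alpha):
        """Gradient step towards target, with alpha split across all active tiles."""
        n_active = sum(tc.n_tilings for tc, _ in self.codings)
        error = target - self.value(state)
        for w, idx in self._features(state):
            for i in idx:
                w[i] += alpha / n_active * error


# Hypothetical state (inventory, spread) bounded to [-10, 10] x [0, 5].
model = LinearTileCombination([
    (TileCoding([-10.0], [10.0], n_tiles=4, n_tilings=4), (0,)),
    (TileCoding([-10.0, 0.0], [10.0, 5.0], n_tiles=4, n_tilings=4), (0, 1)),
])
model.update((2.0, 1.0), target=2.0, alpha=1.0)
print(model.value((2.0, 1.0)), model.value((2.5, 1.0)), model.value((-9.0, 4.5)))
```

A single update at (2.0, 1.0) moves the estimate there to the target, partly generalises to the nearby state (2.5, 1.0) that shares some tiles, and leaves the distant state (-9.0, 4.5) untouched; this local generalisation is what lets a linear learner handle continuous state variables.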

2 Preliminaries

2.1 Limit order books.

A limit order (LO) is an offer to buy or sell a given amount of an asset at a fixed price (or better). Each limit order specifies a direction (buy/sell or, equivalently, bid/ask), a price and a volume (how much to be traded). A limit order book is an aggregation of LOs that have been submitted to the market. The book has a fixed number of price levels, and the gap between the price levels is called the tick size. An example limit order book is shown in Fig. LABEL:fig:lob.
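The book described above can be sketched as a small data structure. This is a minimal sketch under stated assumptions: volumes at the same price are aggregated (queue priority is ignored), prices are stored as integer multiples of the tick size to avoid floating-point drift, and an order that would cross the book is rejected rather than matched. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LimitOrderBook:
    """Resting limit orders aggregated by price level; prices stored in ticks."""

    tick_size: float
    bids: dict = field(default_factory=dict)  # price in ticks -> total resting volume
    asks: dict = field(default_factory=dict)

    def _to_ticks(self, price):
        ticks = round(price / self.tick_size)
        if abs(ticks * self.tick_size - price) > 1e-9:
            raise ValueError(f"price {price} is not on the tick grid")
        return ticks

    def add_limit_order(self, side, price, volume):
        """Add a resting order; orders that would trade immediately are rejected."""
        if side not in ("bid", "ask") or volume <= 0:
            raise ValueError("side must be 'bid' or 'ask' and volume positive")
        ticks = self._to_ticks(price)
        if side == "bid" and self.asks and ticks >= min(self.asks):
            raise ValueError("bid would cross the book; matching is not modelled")
        if side == "ask" and self.bids and ticks <= max(self.bids):
            raise ValueError("ask would cross the book; matching is not modelled")
        book = self.bids if side == "bid" else self.asks
        book[ticks] = book.get(ticks, 0) + volume

    def spread_in_ticks(self):
        return min(self.asks) - max(self.bids)

    def mid_price(self):
        return (max(self.bids) + min(self.asks)) / 2 * self.tick_size

    def top_levels(self, side, n=5):
        """Best n (price, volume) levels; n=5 mirrors the depth described for the data."""
        book = self.bids if side == "bid" else self.asks
        prices = sorted(book, reverse=(side == "bid"))[:n]
        return [(p * self.tick_size, book[p]) for p in prices]


lob = LimitOrderBook(tick_size=0.01)
lob.add_limit_order("bid", 99.99, 100)
lob.add_limit_order("bid", 99.98, 200)
lob.add_limit_order("ask", 100.01, 150)
lob.add_limit_order("ask", 100.03, 50)
print(lob.spread_in_ticks(), lob.mid_price(), lob.top_levels("ask", n=1))
```

Storing integer tick indices makes the spread an exact count of ticks, and the best bid and ask fall out as the maximum and minimum keys of each side.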
