# Modelling intensities of order flows in a limit order book

Ioane Muni Toke University of New Caledonia, Noumea, New Caledonia. Nakahiro Yoshida Graduate School of Mathematical Sciences, University of Tokyo, Tokyo, Japan.
###### Abstract

We propose a parametric model for the simulation of limit order books. We assume that limit orders, market orders and cancellations are submitted according to point processes with state-dependent intensities. We propose new functional forms for these intensities, as well as new models for the placement of limit orders and cancellations. For cancellations, we introduce the concept of a "priority index" to describe the selection of orders to be cancelled in the order book. Parameters of the model are estimated using likelihood maximization. We illustrate the performance of the model by providing extensive simulation results, with a comparison to empirical data and a standard Poisson reference.

## 1 Introduction

The limit order book is the central structure aggregating the orders of all traders to buy and sell shares of a given stock on an exchange. It is standard to simplify the complex diversity of financial messages into three types of orders: limit orders are submitted with a (limit) price into the order book, where they wait to be matched by a counterpart for a transaction; market orders are submitted without any price and are executed immediately; cancellations of pending limit orders are possible at any time. The order book can thus be viewed as a complex dual queueing system with price and time priority rules (see Abergel et al. (2016) for an introductory book treatment).

A partial theoretical treatment of this complex random system is possible under very simplistic assumptions, essentially assuming that the submissions of limit orders, market orders and cancellations are basic Poisson processes (Cont et al., 2010; Muni Toke, 2015). Exact analytical results are however limited. With appropriate scaling techniques, some limit behaviours of this complex system can be studied, see e.g. Abergel & Jedidi (2013) for a price diffusion process, or Cont & De Larrard (2012) for a diffusion approximation of the volumes at the best quotes.

Another branch of study of limit order books takes a more statistical point of view. Smith et al. (2003) investigate the order book structure with mean field techniques. Mike & Farmer (2008) propose an empirical model of the order book that aims at reproducing some of the empirical observations usually made on financial markets. Among other contributions, they propose a Student model for the placement of limit orders and a three-variable model for the cancellation of pending limit orders. The core of the submission mechanism in the order book remains however a Poisson process. Recently, Huang et al. (2015) have proposed a model in which the intensities of submission of limit orders, market orders and cancellations depend on the volume of the first limit. They show that a queueing system with these intensities is able to reproduce some empirical features of the limit order book, such as the distribution of the first level.

In this paper we propose a general model in line with previous contributions such as Mike & Farmer (2008); Huang et al. (2015). We do not extend or specify previous models but build directly from the data. Our goal is to provide state-dependent intensities of submissions of limit and market orders that can be used for the simulation of a "realistic" limit order book. We adopt the following modelling principle: limit and market order intensities should depend on both dimensions of the limit order book, namely the price dimension and the volume dimension. The spread is an obvious choice to include the price dimension in the modelling for both types of orders. The volume of the first level is another obvious choice for market orders, while the total volume available appears to be a good candidate for the limit orders. We define exponential forms of intensities that are convenient for two reasons: they keep the non-negativity of intensities of point processes, and they allow for a practical maximum-likelihood estimation. For the cancellation process, we introduce a new "priority index" as a main modelling variable, which turns out to be very efficient. All proposed models are fitted on a database of 10 consecutive trading days (January 17th-28th, 2011) for six different liquid stocks traded on the Paris stock exchange.

The rest of the paper is organized as follows. Section 2 briefly describes the data and its preparation. Section 3 provides empirical insights on the intensity of submission of market orders and builds a convenient parametric model. Section 4 introduces a similar model for the intensities of limit orders and provides a very flexible Gaussian mixture model for the placement of limit orders, which is able to reproduce the multi-modality of the empirical distribution. Section 5 shows that the "priority volume", i.e. the volume standing in front of a pending order according to time-price priority rules, is a good candidate for the modelling of the "placement" of cancellations. Finally, Section 6 provides extensive results of our model fitted on market data and simulated. The performance of the model is analysed, in particular with respect to a standard Poisson reference.

## 2 Data

We use data extracted from the Thomson-Reuters Tick History (TRTH) database. We randomly select 6 liquid stocks from the CAC 40 index (i.e. stocks among the highest capitalizations exchanged at the Paris Bourse): Air Liquide (Reuters Identification Code (RIC): AIRP.PA), Alstom (ALSO.PA), BNP Paribas (BNPP.PA), Bouygues (BOUY.PA), Carrefour (CARR.PA), Electricite de France (EDF.PA). These stocks represent a wide range of liquidity within the CAC 40 index: BNPP.PA is a heavily traded stock, one of the most traded on the Paris Stock Exchange, while EDF.PA is much less actively traded and has a much smaller capitalization (EDF.PA has even since been removed from the CAC 40 index, on December 21st, 2015).

For each stock, two files can be extracted from the TRTH database, which are usually called the trades file and the quotes file. The quotes file is a sequence of snapshots of the limit order book, listing all the modifications due to the processing of orders, each modification being timestamped with a millisecond resolution. This file can be parsed to extract a (preliminary) order flow of limit orders (increase of the available liquidity on a given side at a given price) and cancel orders (decrease of available liquidity on a given side at a given price). The trades file is then parsed and matched to the previous (preliminary) order flow to identify and convert some of the cancel orders into market orders.
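The first parsing step, turning successive snapshots into a preliminary order flow, can be sketched as follows. This is a minimal illustration and not the authors' parser: the snapshot representation (a dict mapping a price to the displayed volume on one side of the book) and the function name `diff_snapshots` are assumptions made for the example, and the subsequent matching against the trades file is not shown.

```python
def diff_snapshots(prev, curr):
    """Return the order-flow events that turn snapshot `prev` into `curr`.

    Both arguments map a price to the volume displayed at that price on one
    side of the book (hypothetical representation chosen for this sketch).
    """
    events = []
    for price in sorted(set(prev) | set(curr)):
        delta = curr.get(price, 0) - prev.get(price, 0)
        if delta > 0:
            # Liquidity increased at this price: preliminary limit order.
            events.append(("limit", price, delta))
        elif delta < 0:
            # Liquidity decreased at this price: preliminary cancel order.
            events.append(("cancel", price, -delta))
    return events


before = {100: 5, 101: 3}
after = {100: 2, 101: 3, 102: 4}
events = diff_snapshots(before, after)
```

In this sketch a decrease is always labelled as a cancel order; relabelling some of them as market orders would require the trades file, as described above.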

For each trading day, we keep the subset of limit orders, market orders and cancellations occurring between 9:05 in the morning and 17:25 in the afternoon, i.e. we keep the whole trading day except the first five minutes of the day, following the opening auction, and the last five minutes of the day, preceding the closing auction. The data in these very active periods seems indeed of a lesser quality and is not always reliable. In order to build the model in Sections 3, 4 and 5, we keep only orders occurring on the ask side of the limit order book, and we glue the ten days of order flows together into an artificial continuous sample. In Section 6 however, we will adopt a more practical point of view and use both sides of the book but only one day of trading at a time to fit and test the model, without any glueing of consecutive trading days.

As a result of this process of data preparation, we have for each stock and each period (one trading day or glued trading days) a list of orders (the order flow) and for each order a list of variables describing the limit order book at the time of submission : spread, volume at the best quotes, total liquidity available at the ten best quotes.

Let us add a few words on the units of this data. Prices in the order book must be integer multiples of a tick size which is fixed by the exchange. In our sample, AIRP.PA and BNPP.PA have a EUR ticksize, while the four other stocks have a EUR ticksize. As for the volumes, they are numbers of shares. For ease of computation and presentation, all volumes are normalized by a stock-dependent quantity equal to the median of the trade sizes (market order quantities) for this stock. In order to keep these volumes integers, we round the results up to the nearest integer (ceiling). As a result, $0$ really means no share, while $1$ is a small non-zero volume. These remarks should explain the x-axis scales of the graphs of the following sections.
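The volume normalization described above can be illustrated with a few lines of code. This is a sketch under stated assumptions: the function name `normalize_volumes` and the toy numbers are hypothetical and do not come from the dataset.

```python
import math
from statistics import median


def normalize_volumes(volumes, trade_sizes):
    """Express volumes in units of the median trade size, rounded up."""
    unit = median(trade_sizes)  # stock-dependent normalization unit
    # Ceiling keeps integers and maps any non-zero volume to at least 1.
    return [math.ceil(v / unit) for v in volumes]


# Toy example: the median trade size is 200 shares.
normalized = normalize_volumes([0, 1, 200, 201, 450], [100, 200, 300])
```

With the ceiling convention, only an empty queue is mapped to zero, which is why a normalized volume of zero can be read literally.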

## 3 Market orders

Let $N^M$ be the point process of submission of market orders in the limit order book and let $\lambda^M$ be its instantaneous intensity. Our goal is to identify a simple parametric model for $\lambda^M$, which should be based on meaningful variables and be easy to estimate. We therefore identify two covariates to model $\lambda^M$: the spread $S$ and the volume at the best quote $q_1$ (on the side of submission).

Let us first investigate the spread. Using common financial knowledge, one should expect specific variations of the intensity $\lambda^M$ as a function of $S$. Firstly, $\lambda^M$ should be decreasing with $S$. Indeed, if a trader needs to buy a share when $S$ is equal to one tick, he cannot gain priority in the limit order book, and therefore has to submit a buy market order to be the first to buy at the best quote. On the contrary, if the spread is large, it is sufficient to submit a buy limit order just above the best bid quote to be the first in line for the next sell-initiated transaction.

We compute on our samples an estimator of the spread-dependent intensity of market orders:

$$\hat{\lambda}^M(S)=\frac{N^M(S)}{T(S)}, \tag{1}$$

where $N^M(S)$ is the total number of market orders submitted when the spread is equal to $S$ and $T(S)$ is the total time during which the spread is equal to $S$ in the sample. As an illustration, $\hat{\lambda}^M(S)$ is plotted in Figure 1 for one of the stocks of the sample (full results will be given below).

Figure 1: Left panel: Empirical $\lambda^M$ as a function of the spread ($\hat{\lambda}^M(S)$). Right panel: Empirical $\lambda^M$ as a function of the volume at the best quote ($\hat{\lambda}^M(q_1)$). Since data may be noisy for very high values of the parameters, the x-axes span 99% of the empirical distribution of $S$ and 90% of the empirical distribution of $q_1$.
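The estimator of Equation (1) is a ratio of an event count to an occupation time. A minimal sketch of its computation is given below, assuming the state (here the spread) is available as a piecewise-constant path; the data layout and the name `empirical_intensity` are assumptions of this example, not the authors' code.

```python
from collections import defaultdict


def empirical_intensity(state_path, event_times, horizon):
    """Estimate N(s) / T(s) for each observed state s.

    state_path: list of (time, state), sorted by time, describing a
        piecewise-constant path starting at the first time.
    event_times: sorted times of the events to count (market orders).
    horizon: end of the observation window.
    """
    time_in_state = defaultdict(float)
    count_in_state = defaultdict(int)
    # Occupation time T(s): sum of the durations spent in each state.
    for k, (t, s) in enumerate(state_path):
        t_next = state_path[k + 1][0] if k + 1 < len(state_path) else horizon
        time_in_state[s] += t_next - t
    # Count N(s): each event is attributed to the state just before it.
    k = 0
    for t in event_times:
        while k + 1 < len(state_path) and state_path[k + 1][0] < t:
            k += 1
        count_in_state[state_path[k][1]] += 1
    return {s: count_in_state[s] / time_in_state[s]
            for s in time_in_state if time_in_state[s] > 0}


spread_path = [(0.0, 1), (2.0, 2), (5.0, 1)]  # spread in ticks
orders = [1.0, 2.0, 3.0, 4.0, 6.0]            # market order times
intensities = empirical_intensity(spread_path, orders, horizon=10.0)
```

An event occurring exactly at a state change is attributed to the state prevailing just before the change, which mirrors the left limits used in the likelihood later in this section.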

As expected, the intensity of submission of market orders is decreasing with the spread. However, the intensity does not go to zero, and even seems to increase slightly for very large values of the spread. A plausible interpretation is that when the spread increases above usual levels, this may indicate a highly volatile period with many orders submitted. Subsequent uncertainty might translate into a "rush" for liquidity, maintaining $\lambda^M$ above zero.

Following these empirical results, one might propose the following parametric model to express the functional dependence of $\lambda^M$ on $S$:

$$\lambda^M(S)=\exp\left(\beta_0+\beta_1\ln(S)+\beta_{11}[\ln(S)]^2\right) \tag{2}$$

The exponential form ensures that $\lambda^M$ remains non-negative. The quadratic argument allows for the non-monotonicity of $\lambda^M$, instead of the power-law form obtained with only one term. The preference for the logarithm of the spread instead of the spread itself is detailed in Remark 3.
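To see how the quadratic term produces a non-monotonic shape, one can locate the turning point of Equation (2). The short derivation below is a sketch added for illustration and makes no claim about the signs of the fitted coefficients. Writing $x=\ln(S)$, so that $\ln\lambda^M=\beta_0+\beta_1 x+\beta_{11}x^2$:

$$\frac{\partial \ln\lambda^M}{\partial x}=\beta_1+2\beta_{11}x=0 \iff x^*=-\frac{\beta_1}{2\beta_{11}}, \qquad S^*=\exp\left(-\frac{\beta_1}{2\beta_{11}}\right).$$

If $\beta_{11}>0$, the intensity decreases for $S<S^*$ and increases for $S>S^*$. If $\beta_{11}=0$, the model reduces to the power law $\lambda^M(S)=e^{\beta_0}S^{\beta_1}$, which is monotonic.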

Let us now turn to the second explanatory variable considered here, the volume of the best quote $q_1$ (on the side of submission, ask for a buy market order, bid for a sell market order). We compute on our samples an estimator of the $q_1$-dependent intensity of market orders:

$$\hat{\lambda}^M(q_1)=\frac{N^M(q_1)}{T(q_1)}, \tag{3}$$

where $N^M(q_1)$ is the total number of market orders submitted when the volume on the (same side) best quote is equal to $q_1$ and $T(q_1)$ is the total time during which this volume is equal to $q_1$ in the sample. Recall that the unit for $q_1$ is the median of the trade sizes. Results are plotted in Figure 1. One observes that $\lambda^M$ increases as $q_1$ decreases, as expected. Indeed, when $q_1$ is small, the probability that the first limit vanishes increases. This is an incentive for traders to grab the last shares available at the current price, leading to a "rush for liquidity". This monotonicity however is only justified for small values of $q_1$, and there is no obvious reason that the intensity should go to zero for large values of $q_1$. Figure 1 suggests that we can use a functional dependency on $q_1$ similar to the one suggested for the spread:

$$\lambda^M(q_1)=\exp\left(\beta_0+\beta_2\ln(1+q_1)+\beta_{22}[\ln(1+q_1)]^2\right) \tag{4}$$

We can finally combine the two dependencies into one single model and add a potential interaction term between the two covariates. We thus obtain the following parametric model for the intensity of submission of market orders in a limit order book:

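$$\begin{aligned}\lambda^M(t;S(t),q_1(t))=\exp\Big[&\beta_0+\beta_1\ln(S(t))+\beta_{11}[\ln(S(t))]^2+\beta_2\ln(1+q_1(t))+\beta_{22}[\ln(1+q_1(t))]^2\\&+\beta_{12}\ln(S(t))\ln(1+q_1(t))\Big].\end{aligned} \tag{5}$$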

This model can be estimated by likelihood maximization. To emphasize the dependency on the parameters to be fitted, we write $\lambda^M(t;\beta)$ when dealing with the estimation. The log-likelihood for the point process $N^M$ as a function of the parameter vector $\beta$ is defined as:

$$L^M_T(\beta)=\int_0^T\ln\left(\lambda^M(t;\beta)\right)dN^M_t-\int_0^T\lambda^M(t;\beta)\,dt. \tag{6}$$

Let $\{t^M_i\}$ be the set of arrival times of market orders in our sample, $\{t^S_i\}$ the set of times of jumps of the spread process $S$, and $\{t^{q_1}_i\}$ the set of times of jumps of the first limit process $q_1$. Then the log-likelihood on the sample is numerically computed as follows:

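$$\begin{aligned}L^M_T(\beta)={}&\beta_0N^M(T)+\beta_1\sum_{t^M_i}\ln S(t^M_i-)+\beta_{11}\sum_{t^M_i}[\ln S(t^M_i-)]^2\\&+\beta_2\sum_{t^M_i}\ln(1+q_1(t^M_i-))+\beta_{22}\sum_{t^M_i}[\ln(1+q_1(t^M_i-))]^2+\beta_{12}\sum_{t^M_i}\ln S(t^M_i-)\ln(1+q_1(t^M_i-))\\&-\sum_{t_i\in\{t^S_i\}\cup\{t^{q_1}_i\}}\exp\Big[\beta_0+\beta_1\ln S(t_i-)+\beta_{11}[\ln S(t_i-)]^2\\&\qquad+\beta_2\ln(1+q_1(t_i-))+\beta_{22}[\ln(1+q_1(t_i-))]^2+\beta_{12}\ln S(t_i-)\ln(1+q_1(t_i-))\Big](t_i-t_{i-1}).\end{aligned} \tag{7}$$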
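Since the covariates are piecewise constant, the integral in the log-likelihood reduces to a finite sum over the periods during which the state does not change. The following sketch evaluates such a log-likelihood. It is an illustration only: the data layout (`segments`, `event_states`) and the function names are assumptions of this example, the spread is assumed to be expressed in ticks (at least one) so that its logarithm is defined, and the maximization step itself is not shown.

```python
import math


def log_intensity(beta, spread, q1):
    """Exponent of the intensity model for a given state (spread, q1)."""
    b0, b1, b11, b2, b22, b12 = beta
    log_s = math.log(spread)
    log_q = math.log(1.0 + q1)
    return (b0 + b1 * log_s + b11 * log_s ** 2
            + b2 * log_q + b22 * log_q ** 2 + b12 * log_s * log_q)


def log_likelihood(beta, segments, event_states):
    """Point-process log-likelihood with piecewise-constant covariates.

    segments: list of (duration, spread, q1), one per period during which
        the state is constant.
    event_states: list of (spread, q1) observed just before each event.
    """
    # Sum of log-intensities at the event times.
    jump_term = sum(log_intensity(beta, s, q) for s, q in event_states)
    # Integrated intensity (compensator), exact for piecewise-constant states.
    compensator = sum(d * math.exp(log_intensity(beta, s, q))
                      for d, s, q in segments)
    return jump_term - compensator


# Toy data: 10 time units in total, 3 events.
segments = [(4.0, 1, 2), (6.0, 2, 1)]
event_states = [(1, 2), (1, 2), (2, 1)]
constant_model = (math.log(0.3), 0.0, 0.0, 0.0, 0.0, 0.0)
value = log_likelihood(constant_model, segments, event_states)
```

When all coefficients except the constant are zero, the model is a homogeneous Poisson process and the log-likelihood is maximal at the usual rate estimate (number of events divided by the observation time), which gives a simple sanity check of the implementation.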

It is then numerically maximized using the routine mle2 of the bbmle package in the R language. Results for all the stocks of our samples are given in Table 1. For simplicity of presentation these results are shown for the ask side only (buy market orders), but results for the bid side are similar.

Table 1 provides the numerical values of the parameters as well as the standard deviation estimated by the maximization routine. These standard deviations assess the quality of the fitting and verify that all the fitted values are significant to a high-level, except for small ’s for AIRP.PA and CARR.PA, and small (AIRP.PA and BOUY.PA). This last fact concerning is not very surprising as the joint distribution between and is quite difficult to characterize, and an independence hypothesis between these two modelling variables is not unreasonable for some stocks.

We now provide several graphs to illustrate the fitting performance of the model. We first plot for each stock the empirical intensity as a function of the spread ($\hat{\lambda}^M(S)$) and the "marginal" spread-dependent intensity computed by our model. This "marginal" represents the dependence on the spread when $q_1$ is distributed as in the sample, i.e. it is computed with obvious notations as:

$$\tilde{\lambda}^M(S)=\sum_q\lambda^M(t;S,q)\,\mathbb{P}(q_1=q) \tag{8}$$

Similarly, we then plot for each stock the empirical intensity as a function of the volume at the best quote ($\hat{\lambda}^M(q_1)$) and the "marginal" $q_1$-dependent intensity computed by our model as:

$$\tilde{\lambda}^M(q_1)=\sum_s\lambda^M(t;s,q_1)\,\mathbb{P}(S=s) \tag{9}$$
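The two "marginal" intensities above are weighted averages of the model intensity over the empirical distribution of the other covariate. A minimal sketch follows; it assumes that the probabilities are estimated by simple frequencies in a sample of observed values, which is one possible reading of "distributed as in the sample", and the names used are hypothetical.

```python
from collections import Counter


def marginal_intensity(intensity, spread, q1_sample):
    """Average intensity(spread, q) over the empirical distribution of q."""
    counts = Counter(q1_sample)
    n = len(q1_sample)
    # Each distinct value q is weighted by its empirical frequency.
    return sum(intensity(spread, q) * c / n for q, c in counts.items())


def toy_intensity(s, q):
    """Placeholder standing in for the fitted model intensity."""
    return s * q


value = marginal_intensity(toy_intensity, 2, [1, 1, 3])
```

The symmetric quantity, averaging over the spread for a fixed volume, is obtained by swapping the roles of the two arguments.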

Results are given in Figure 2 for the dependence on the spread $S$ and in Figure 3 for the dependence on the volume at the best quote $q_1$.

Figure 2: Empirical ($\hat{\lambda}^M(S)$) and model ($\tilde{\lambda}^M(S)$) intensities of market orders as functions of the spread $S$.

Figure 3: Empirical ($\hat{\lambda}^M(q_1)$) and model ($\tilde{\lambda}^M(q_1)$) intensities of market orders as functions of the volume of the first limit $q_1$.

The "marginal" intensities allow for a synthetic view of the modelling intensity. In order to provide the reader with the full view of the fitting, we finally plot for each stock the spread-dependent empirical intensities given the volume $q_1$, and symmetrically the $q_1$-dependent intensities given the spread $S$. Results are plotted in Figures 4 and 5, for each stock and each time for the first most probable occurrences of the variables.

Figure 4: $q_1$-conditional intensities as functions of the spread. Lines with large dots represent empirical intensities and solid lines the fitted model intensities. Each $q_1$ level has one color.

Figure 5: Spread-conditional intensities as functions of $q_1$. Dots represent empirical intensities and lines the fitted model intensities. Each spread level has one color.

Let us start with Figure 2. It turns out that the marginal fitting for the spread is always good, and even excellent for most stocks. It seems that it fails to catch the full extent of the increase of $\lambda^M$ observed for large values of the spread for some stocks (ALSO.PA, and to a lesser extent EDF.PA). It is however important to recall that high spread values are very rare events. For two of the stocks under scrutiny here, we plot the empirical spread distribution in Figure 6.

This shows for example that for EDF.PA (Figure 2, bottom right), the last point on the right, which is the worst fit of the model, actually represents a few thousandths of the spread distribution. It is therefore perfectly normal that the maximum-likelihood estimation favors the main part of the distribution (left part of the graphs). This good fitting with respect to the spread is confirmed in Figure 5, where each spread-conditional intensity is well modelled for each stock.

Continuing the analysis of the graphs, we observe in Figure 3 that the quality of the fitting of the dependency on the volume seems a bit poorer. The model captures very well the decrease of the intensity as the volume increases, but the challenge here is that the empirical intensities are quite different from stock to stock: some decrease regularly, some decrease faster at the beginning and then show a plateau. This is also visible in Figure 4, where larger levels of $q_1$ have less influence, leading to the collapse of the conditional intensities onto the same curve. The secondary role of larger values of $q_1$ is thus not surprising. There again, we show in Figure 7 the empirical distribution of $q_1$ for two stocks for the sake of completeness.

The body of the distribution is clearly to the left, leaving less weight for the higher values.

Therefore, the proposed model is overall a good fit, especially if we keep in mind that despite their differences we have managed to propose the same functional form for the dependence on the spread $S$ and the dependence on the volume at the best quote $q_1$. We end this section with three modelling remarks, opening potential future work, and then move on to the modelling of limit orders.

###### Remark 1.

The form $\ln(1+q_1)$ is here preferred to $\ln(q_1)$ for flexibility, as it allows for a normalized volume equal to zero. This is not the case in this paper since we have rounded up normalized volumes, so that $q_1=0$ is really $0$, not a small volume. But the difference being marginal, we keep the general (right-shifted) form.

###### Remark 2.

The likelihood analysis here is a conditional likelihood analysis given $S$ and $q_1$, or a regression analysis with these explanatory variables. We discuss the modelling of limit orders and cancellations in the following sections, where a certain parametric model is introduced for each order. Naturally, these models should be unified to describe the whole picture of all orders, though we do not pursue the integration of models in this paper.

###### Remark 3.

In the above construction of a model for the intensity, the exponential of a quadratic form of the logarithm of the variable is selected by the AIC criterion over an exponential of a quadratic form of the natural variable. Hence our choice that may not appear standard at first sight. Furthermore, significance of every parameter suggests that we could introduce more explanatory variables and select a suitable model by a certain information criterion or a sparse estimation method. This is future work.

## 4 Limit orders

We now turn to the modelling of limit orders. Defining a limit order requires one more dimension than defining a market order: its (limit) price has to be chosen upon submission. We have decided to treat the two problems separately. In a first subsection 4.1, we deal with the point process counting all limit orders (at any price), with an instantaneous intensity $\lambda^L$. The distribution of prices is assumed to be independently defined and will be discussed in the following subsection 4.2.

### 4.1 Modelling limit orders intensities

Similarly to what we did for market orders, we choose two variables for our modelling. The price dimension is represented by the spread $S$. As for the "volume" dimension, we investigate the total volume available in the limit order book on the side of submission (more precisely the sum of all the liquidity available up to the tenth limit), denoted here $Q_{10}$. Since $\lambda^L$ deals with all limit orders, $Q_{10}$ appears obviously more relevant than $q_1$ as a modelling variable.

Following our modelling principles, we propose the following model for limit orders:

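$$\begin{aligned}\lambda^L(t;S(t),Q_{10}(t))=\exp\Big[&\beta_0+\beta_1\ln(S(t))+\beta_{11}[\ln(S(t))]^2+\beta_2\ln(1+Q_{10}(t))+\beta_{22}[\ln(1+Q_{10}(t))]^2\\&+\beta_{12}\ln(S(t))\ln(1+Q_{10}(t))\Big].\end{aligned} \tag{10}$$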

Here, we expect the intensity to increase with the spread (by an argument exactly symmetric to the one we have used in Section 3, see above). We also expect it to increase when $Q_{10}$ decreases, since by an expected stability mechanism, a global drop in the available volume should be an incentive to provide more liquidity. As mentioned before, these monotonic variations guessed by "common financial sense" are only expected to be observed for frequent values of the modelling variables, since (rare) extreme values of the parameter are noisy and therefore difficult to characterize.

The model defined at Equation (10) can be fitted by maximization of the likelihood. It is straightforward to modify the formula given at Equation (7) to obtain the log-likelihood of the model, so we skip it for brevity. The numerical results of the maximum likelihood estimation are given in Table 2.

There again, standard deviations are provided to assess the quality of the fitting.

We now provide a graphical illustration of the quality of the fitting of the model. One can straightforwardly adapt Equations (8) and (9) to compute the "marginal" intensities of limit orders with respect to the spread $S$ and $Q_{10}$. These are plotted in Figures 8 and 9, where they are compared to the empirical intensities.

As for the dependence on the spread, we observe that the intensity exhibits several shapes. There is indeed an increase for large spreads, as we expected, but for small spreads we observe either a decrease or a plateau. The model proposed is flexible enough to reproduce these shapes (except the unexpected drop for large spreads for AIRP.PA). We could probably get better fits (for the eye) with some least-squares regression techniques, but the maximum-likelihood estimation chosen here emphasizes the main body of the distribution, i.e. small spreads.

As for the dependence on the total volume available in the book on the side of submission $Q_{10}$, we observe that the intensity increases when the available liquidity decreases, as we expected. For some stocks (BNPP.PA or CARR.PA), we observe an increase when $Q_{10}$ increases above average, which the model is able to grasp.

Finally, the proposed model is once again a good fit, especially if we keep in mind that we have managed to propose the same functional form for the limit and market orders intensity, including both a price and a volume variable, following our modelling principle.

### 4.2 Modelling the placement of limit orders

Modelling the placement of limit orders can be a difficult challenge. The support of any placement distribution is indeed state-dependent: in our model that distinguishes between three types of orders (limit, market, cancellation), one cannot submit a sell/buy limit order below/above the current best bid/ask. Such an order should be a market order.

From a simulation perspective, one can settle for a general distribution and then drop at the time of simulation any non-acceptable price (see Section 6). Using this technique, Mike & Farmer (2008) argued that the Student distribution centred around the current best quote is a good fit for the placement of limit orders (using data for the stock AstraZeneca on the London Stock Exchange).

In the same spirit, we will use continuous distributions on to model the placement. will be the current best quote. We consider the placement distribution as a function on the continuous variable price, and then integrate this density to obtain the discrete probability distribution of the placement of limit orders on the grid of integers numbers of ticksize. If is the continuous density of placement of limit orders and is the ticksize, then is the probability that the limit order is submitted a price .

We propose here two models. The first one is a generalized version of the Mike & Farmer (2008) proposition, in which the limit orders are placed according to a location-scale version of the Student distribution:

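$$\pi^L(p;\mu,\sigma,\nu)=\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)\sqrt{\pi\nu}\,\sigma}\left(1+\frac{1}{\nu}\left(\frac{p-\mu}{\sigma}\right)^2\right)^{-\frac{\nu+1}{2}} \tag{11}$$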

This model is interesting as it has only three parameters. However, empirical data suggests that, for some of the stocks we have studied, the placement of limit orders is often multi-modal. To our knowledge this observation has not been made before. One indeed observes a peak of submission at the best quote, and then another mode inside the book, a few ticks away from the best quote. In order to reproduce this complex distribution we use a mixture of normal distributions:

$$\pi^L(p;G,\mu,\sigma,\pi)=\sum_{i=1}^{G}\pi_i\,\phi(p;\mu_i,\sigma_i), \tag{12}$$

where $\phi(\cdot;\mu_i,\sigma_i)$ is the density of the Gaussian distribution with parameters $(\mu_i,\sigma_i)$.
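To make the mixture model concrete, the sketch below evaluates a Gaussian mixture density, integrates it over one-tick cells to obtain probabilities on the price grid, and draws a placement. Several choices are assumptions of this example and not taken from the paper: prices are measured in ticks relative to the best quote, $\sigma_i$ is treated as a standard deviation, the cell attached to tick $k$ is $[k-0.5, k+0.5)$, and the parameter values are hypothetical placeholders rather than the fitted values of Table 3.

```python
import math
import random


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian (sigma = std. dev.)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def mixture_density(p, weights, mus, sigmas):
    """Continuous mixture density at price p."""
    return sum(w * math.exp(-0.5 * ((p - m) / s) ** 2)
               / (s * math.sqrt(2.0 * math.pi))
               for w, m, s in zip(weights, mus, sigmas))


def tick_probability(k, weights, mus, sigmas):
    """Mass of the cell [k - 0.5, k + 0.5) around tick k (assumed convention)."""
    return sum(w * (normal_cdf(k + 0.5, m, s) - normal_cdf(k - 0.5, m, s))
               for w, m, s in zip(weights, mus, sigmas))


def sample_tick(weights, mus, sigmas, rng):
    """Draw a component, then a Gaussian price, then round to the grid."""
    i = rng.choices(range(len(weights)), weights=weights)[0]
    return math.floor(rng.gauss(mus[i], sigmas[i]) + 0.5)


# Hypothetical three-component mixture (illustration only).
weights, mus, sigmas = [0.25, 0.45, 0.30], [0.0, 2.5, 4.5], [0.67, 1.5, 3.0]
total = sum(tick_probability(k, weights, mus, sigmas) for k in range(-60, 70))
rng = random.Random(0)
draws = [sample_tick(weights, mus, sigmas, rng) for _ in range(1000)]
```

In a simulation, a drawn price that is not acceptable in the current state of the book would simply be dropped, in line with the rejection approach described earlier in this subsection.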

The normal mixture model is fitted with the mclust package of the R language. The fitted parameters are given in Table 3.

For all stocks, the fitted mixture model exhibits the same components. One Gaussian is centred on the best quote and very thin (standard deviation of two-thirds of a ticksize). This distribution accounts for roughly 20-25% of the submitted limit orders, and helps model the peak of limit orders submitted at the best quote. Two other Gaussian distributions, located further away in the book (roughly 2-3 and 4-5 ticks away from the best quote), help model the second mode observed and the more passive limit orders.

In order to illustrate the quality of the fitting obtained, Figure 10 plots the model distribution compared to the empirical one.

The fitted location-scale Student is given for comparison. This mono-modal distribution is, in our sample, centred on the maximum inside the book, a few ticks away from the best quote. As a result, it underestimates on the one hand the number of orders submitted at the best quote, and on the other hand it overestimates the number of aggressive orders submitted inside the spread.

###### Remark 4.

We observe that the multi-modality of the placement of limit orders strongly depends on the observed spread. It is usually stronger for small spreads, and disappears for larger spreads. This can be interpreted as follows. When the spread is smaller than usual, the market participants anticipate its widening, thus providing liquidity a few ticks inside the book besides the usual liquidity provided at the best quote. Hence the appearance of two peaks in the distribution of the placement of limit orders, and the strong multi-modality. When the spread is large, market participants anticipate its tightening, thus providing more liquidity close to the best quote, hence the disappearance of the multi-modality.

It is easy to generalize our model given at Equation (12) to a spread-dependent model, by splitting our sample according to the observed spread and then fitting spread-dependent parameters:

$$\pi^L(p;G,\mu,\sigma,\pi)=\sum_{i=1}^{G}\pi_i(S)\,\phi(p;\mu_i(S),\sigma_i(S)). \tag{13}$$

This would increase the number of parameters of the model but allow for a better flexibility in the modelling of the placement of limit orders. With the simulation of Section 6 in mind and given the good performances of the proposed fit, we stick, at least for now, to the unconditional model.

## 5 Cancellations of pending orders

Cancellations are different from the two previous types of orders studied (limit and market) because they are not a message to buy or sell some shares on the market, but a message to cancel a previous message to buy or sell some shares. For example, we cannot model the placement of cancellations as we did for the limit orders, since we can only cancel orders at prices where some orders are actually standing in the book. We thus adopt a completely different type of modelling for cancellations.

The first choice of modelling is that we do not model the intensity of submission of cancellations, but we model instead the lifetime of pending limit orders. One reason for this choice is that cancellations ensure the stability of the system. The cancellation process is intimately linked to the limit order submission process. By defining an autonomous state-dependent cancellation process, we would introduce a risk of instability in the model. The choice of the lifetime of orders as the main variable is thus a safe choice. Its drawback however is that it is a very difficult parameter to estimate. Our trades and quotes database does not provide a unique identifier for each order, thus when we observe a cancellation we do not know for sure which limit order has been cancelled. We can narrow it down by selecting only limit orders with the volume and price equal to the one cancelled, but this identification does not necessarily return a unique match. Finally, even if we perform the above algorithm with some selection rules, the obtained distribution is not necessarily easy to characterize. As an example, on the stock AIRP.PA on January 17th, 2011, the above algorithm gives an empirical distribution of lifetimes with a median of 5.2 seconds and a mean of 89.7 seconds.

We choose to compute the average lifetime of an order so that a basic order book model with Poisson intensities would have an average total liquidity in the book equal to the empirical observation. More precisely, Muni Toke (2015) shows that in an order book with Poisson arrival of market orders with intensity and average size , Poisson arrival of limit orders with intensity and average size , and a lifetime of pending limit orders exponentially distributed with parameter , the expected total liquidity available in the book is

 Q=σM⎛⎜ ⎜⎝νq−δ+δqν1−q2F1(δ,−ν1−q,1+δ,1−q)⎞⎟ ⎟⎠ (14)

where , , and is the hypergeometric function. It is easy to numerically optimize so that given in the equation above is equal to its empirical counterpart.

The second choice of modelling deals with the "placement" of the cancellations. Mike & Farmer (2008) have proposed a three-variable model to determine the placement of cancellations, based on the distance to the best quote, the total liquidity available and the imbalance. We here propose a new, efficient one-parameter model to choose which pending order is to be cancelled. We introduce as modelling variable the "priority index". We first define the "priority volume" of a limit order as the sum of the sizes of all pending limit orders standing ahead in the queue, i.e. at a better price or at the same price but with time priority. If a limit order is the oldest order standing at the best quote, then it will be executed first when a market order arrives, and its priority volume is thus zero. One may expect that the probability of being cancelled decreases with the priority volume, but that would be ignoring the fact that most of the activity occurs around the best quotes.

Let us now define the "priority index" $\xi$ of a pending limit order as the ratio of the "priority volume" defined above over the total volume available in the book (on the same side). Obviously $\xi\in[0,1]$. $\xi$ can be used as an indicator of placement of cancellations inside the book. As for the empirical estimation of $\xi$ however, our data does not allow for the unique tracking of individual orders. We know the price of an order, but not exactly where the order is inside the sub-queue of all orders at this price (at least not without further algorithmic development). We thus compute the priority volume as the total liquidity available at better prices plus half the liquidity available at the same price, i.e. we act as if the cancelled order were in the middle of the queue. This allows for an easy estimation of $\xi$ on our data. It turns out that the distribution of cancellations as a function of $\xi$ is remarkably smooth. Some empirical results are given below. We propose to model it with a scaled truncated power law distribution, i.e. we have the following model for the density of the cancellation "placement":
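The empirical approximation of the priority index described above (liquidity at better prices plus half the liquidity at the same price, divided by the total liquidity on that side) can be sketched as follows. The representation of the book as a list of volumes ordered from the best quote, and the function name `priority_index`, are assumptions made for this example.

```python
def priority_index(level_volumes, level):
    """Approximate priority index of an order cancelled at a given level.

    level_volumes: volumes on one side of the book just before the
        cancellation, ordered from the best quote (index 0) outwards.
    level: index of the price level where the cancellation occurs.
    """
    total = sum(level_volumes)
    # Everything at better prices, plus half of the queue at the same price
    # (the cancelled order is treated as if it were in the middle).
    ahead = sum(level_volumes[:level]) + 0.5 * level_volumes[level]
    return ahead / total


book_side = [10, 20, 30]
at_best = priority_index(book_side, 0)
at_second = priority_index(book_side, 1)
at_third = priority_index(book_side, 2)
```

With this convention the index always lies strictly between 0 and 1 for a non-empty level, and it grows as the cancellation occurs deeper in the book.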

$$\pi_C(\xi) = \frac{\sigma(\alpha+1)}{(1+\sigma)^{\alpha+1}-1}\,(1+\sigma\xi)^{\alpha}. \tag{15}$$

The log-likelihood of a sample $(\xi_1, \ldots, \xi_N)$ is straightforwardly computed as

$$L(\alpha,\sigma) = N \log\left(\frac{\sigma(\alpha+1)}{(1+\sigma)^{\alpha+1}-1}\right) + \alpha \sum_{i=1}^{N} \log(1+\sigma\xi_i), \tag{16}$$

which can be numerically maximized using the mle2 routine of the bbmle package. Numerical results of the maximum-likelihood estimation are given in Table 4.
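As a rough illustration of Equations (15) and (16), the following Python sketch evaluates the density and the log-likelihood and maximizes the latter over a coarse grid. The grid search is a deliberately simple stand-in for a numerical optimizer and is not the mle2 routine; the function names, the grid bounds and the sample values are assumptions made for the example only.

```python
import math

def density(xi, alpha, sigma):
    """Density (15) of the priority index on [0, 1]; needs sigma > 0, alpha != -1."""
    norm = sigma * (alpha + 1.0) / ((1.0 + sigma) ** (alpha + 1.0) - 1.0)
    return norm * (1.0 + sigma * xi) ** alpha

def log_likelihood(alpha, sigma, xis):
    """Log-likelihood (16) of a sample of priority indices."""
    n = len(xis)
    norm = sigma * (alpha + 1.0) / ((1.0 + sigma) ** (alpha + 1.0) - 1.0)
    return n * math.log(norm) + alpha * sum(math.log1p(sigma * x) for x in xis)

def fit_grid(xis, alphas, sigmas):
    """Return (log-likelihood, alpha, sigma) maximizing (16) over a finite grid."""
    best = None
    for a in alphas:
        for s in sigmas:
            ll = log_likelihood(a, s, xis)
            if best is None or ll > best[0]:
                best = (ll, a, s)
    return best

# Hypothetical priority indices, for illustration only (not data from the paper).
xis = [0.01, 0.02, 0.03, 0.05, 0.08, 0.10, 0.15, 0.30, 0.60]
alphas = [-6.0 + 0.25 * k for k in range(20)]   # -6.0, ..., -1.25 (skips alpha = -1)
sigmas = [2.0 ** k for k in range(8)]           # 1, 2, ..., 128
best_ll, best_alpha, best_sigma = fit_grid(xis, alphas, sigmas)
print(best_alpha, best_sigma, best_ll)
```

In practice a gradient-based or simplex optimizer started from the grid optimum would refine the estimate; the grid is kept here only because it needs no external library.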

Illustrations of the quality of the fit are provided in Figure 11.

Figure 11: Empirical and model distribution of the placement of cancellations as a function of the priority index.

Table 4 and Figure 11 show excellent agreement between the model and the empirical data for all the stocks studied.
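The midpoint convention used in this section to estimate the priority index, when the position of an order inside its price level is unknown, can be sketched as follows. This is a minimal Python illustration; the function name, the list-based representation of one side of the book and the numerical volumes are assumptions, and the volumes are taken to describe the book just before the cancellation.

```python
def priority_index_midpoint(level_volumes, level):
    """Approximate priority index of an order cancelled at a given price level.

    level_volumes: volumes per price level on one side of the book,
                   ordered from the best price outwards (assumed layout).
    level:         position of the cancelled order's price in that list.
    """
    total = sum(level_volumes)
    # Everything at better prices is ahead; at the same price, act as if
    # the cancelled order were in the middle of the queue.
    ahead = sum(level_volumes[:level]) + 0.5 * level_volumes[level]
    return ahead / total

book_side = [100, 300, 600]   # hypothetical volumes at the three best prices
indices = [priority_index_midpoint(book_side, k) for k in range(len(book_side))]
print(indices)
```

With these hypothetical volumes, a cancellation at the best price gets half of the best-level volume as priority volume, divided by the total volume on that side.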

## 6 A market simulator with state-dependent order flows

We show the benefits of our model by fitting it to daily empirical data and simulating it. Simulating a ”realistic” limit order book is quite a complex task given the many parameters involved and the somewhat complex time-priority execution mechanism to be implemented. Several results have previously been obtained, for example in Gatheral & Oomen (2010) and Muni Toke (2011). Some key elements for basic simulation can be found in Abergel et al. (2016).

### 6.1 Market simulator

We build a market simulator with four agents. Two ”liquidity providers” submit (and cancel) limit orders, one on the ask side and another on the bid side. Two ”liquidity takers” submit market orders, one on the ask side and another on the bid side. We choose to simulate here a symmetric limit order book, i.e. both providers share the same parameters, and both takers share the same parameters.

Liquidity providers submit limit orders with the intensity defined in Equation (10). The sizes of the limit orders are exponentially distributed, with a parameter set from the median of the empirical sizes of limit orders. The prices of the limit orders are distributed according to our Gaussian mixture model given by Equation (12).

Liquidity takers submit market orders with the intensity defined in Equation (5). The sizes of the market orders are exponentially distributed, with a parameter set from the median of the empirical sizes of market orders.

Finally, cancellations in the order book occur with an intensity proportional to the available liquidity, i.e. to the total number of orders, with a proportionality coefficient determined by the procedure detailed in Section 5 and Equation (14). When a cancellation occurs, a random priority index is drawn according to the distribution with density given in Equation (15). This distribution is easy to simulate given its inverse cumulative distribution function:

$$(\Pi_C)^{-1}(x) = \frac{1}{\sigma}\left[\Big[\big((1+\sigma)^{\alpha+1}-1\big)\,x+1\Big]^{\frac{1}{\alpha+1}}-1\right]. \tag{17}$$

The order cancelled is then the first one that has a priority index greater than or equal to the drawn value.
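The cancellation step described above can be sketched in Python as follows: a priority index is drawn by inverse transform sampling with Equation (17), and the first order whose priority index reaches the drawn value is cancelled. The function names, the parameter values and the queue are hypothetical. The fallback to the last order when no order reaches the drawn value is an assumption of this sketch, since the text does not specify that case.

```python
import random

def inverse_cdf(x, alpha, sigma):
    """Inverse cumulative distribution function (17); x in [0, 1]."""
    base = ((1.0 + sigma) ** (alpha + 1.0) - 1.0) * x + 1.0
    return (base ** (1.0 / (alpha + 1.0)) - 1.0) / sigma

def cdf(xi, alpha, sigma):
    """Cumulative distribution function obtained by integrating density (15)."""
    num = (1.0 + sigma * xi) ** (alpha + 1.0) - 1.0
    return num / ((1.0 + sigma) ** (alpha + 1.0) - 1.0)

def select_cancelled(order_sizes, xi):
    """Index of the first order whose priority index is >= xi.

    order_sizes: sizes of the pending orders on one side, sorted by
                 priority (best price first, oldest first within a price).
    """
    total = sum(order_sizes)
    ahead = 0.0                       # priority volume of the current order
    for i, size in enumerate(order_sizes):
        if ahead / total >= xi:
            return i
        ahead += size
    return len(order_sizes) - 1       # assumed fallback: cancel the last order

rng = random.Random(0)
alpha, sigma = -3.0, 20.0             # hypothetical parameter values
queue = [10, 20, 30, 40]              # hypothetical order sizes
xi = inverse_cdf(rng.random(), alpha, sigma)
cancelled = select_cancelled(queue, xi)
print(xi, cancelled)
```

The `cdf` helper is included only to check that the inverse function round-trips; the simulator itself only needs `inverse_cdf`.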

### 6.2 Poisson simulator reference

To provide a reference simulation, we simulate a standard Poisson model. This reference model has the same agents, the same distributions of sizes of limit and market orders, and the same cancellation intensity proportional to the liquidity available. However, all agents submit their orders according to a homogeneous Poisson process with a constant intensity fitted by maximum likelihood. The placement of limit orders is done according to the location-scale Student distribution given in Equation (11). Finally, the cancellation is purely zero-intelligence in the sense that, when a cancellation occurs, the order to be cancelled is selected uniformly in the book.
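The two ingredients specific to this reference model can be sketched in a few lines of Python. For a homogeneous Poisson process observed over a fixed window, the maximum-likelihood intensity is the number of events divided by the length of the window, and arrival times can be simulated with exponential inter-arrival times. The function names, the event count and the window length below are hypothetical.

```python
import random

def fit_poisson_intensity(n_events, duration):
    """Maximum-likelihood intensity of a homogeneous Poisson process."""
    return n_events / duration

def simulate_poisson_times(rate, duration, rng):
    """Arrival times on [0, duration) built from exponential inter-arrivals."""
    times = []
    t = rng.expovariate(rate)
    while t < duration:
        times.append(t)
        t += rng.expovariate(rate)
    return times

def pick_uniform_cancellation(n_orders, rng):
    """Zero-intelligence cancellation: every pending order is equally likely."""
    return rng.randrange(n_orders)

rng = random.Random(0)
duration = 3600.0                                  # hypothetical window, in seconds
rate = fit_poisson_intensity(5000, duration)       # hypothetical event count
times = simulate_poisson_times(rate, duration, rng)
cancelled = pick_uniform_cancellation(25, rng)
print(rate, len(times), cancelled)
```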

### 6.3 Simulation results

We fit our model for each stock of our sample, using one day of trading. Since we simulate a symmetric limit order book, we aggregate bid and ask order flows in one sample for the fitting. We have run the full simulation of our model for each of the first two days of the sample, but for the sake of brevity, we show in this section the results for only one day, January 18th, 2011. Results for the other day tested are very similar. The sample used for fitting is smaller than the full one (ten days) used in the previous sections to derive the functional shapes of the intensities and distributions of our model. This may lead to noisier estimates of our model, but for practical purposes one trading day is a convenient unit of time, hence this choice.

The simulator (and the reference Poisson simulator) is then run to produce exactly one day of trading data (i.e. the same length as the fitting sample). We then analyse the simulated data and compare it to the empirical observations.

One of the most important features is that our model is able to reproduce very well the empirical distribution of the spread. In Figure 12, the simulated distribution is a good fit of the empirical one, while the Poisson reference does not match it at all.

Figure 12: Distribution of the spread in the model, compared to the empirical distribution and the one produced by a Poisson model. Data: January 18th, 2011.

The spread in the Poisson model is most of the time equal to 1 tick, i.e. the book is ”stuck”. Our model of intensities is able to tackle this problem by increasing the market intensity and decreasing the limit intensity when the spread is small, as is empirically observed. It is remarkable that this close fit is obtained for all stocks and dates tested, irrespective of the liquidity and tick size of the stock studied.

We now turn to the second modelling variable of our model. In Figure 13, we plot the empirical distribution of $q_1$ and its simulated counterparts.

Figure 13: Distribution of $q_1$ in the model, compared to the empirical distribution and the one produced by a Poisson model. Data: January 18th, 2011.

There again, our model provides an excellent fit for this distribution, while the standard Poisson reference consistently underestimates the probability of observing smaller values of $q_1$, i.e. its distribution is shifted to the right. Results are similar for all stocks and dates tested.

If we finally turn to the last variable used in our model, the total volume available $Q_{10}$, then our model is able to reproduce the time average of this quantity. Figure 14 plots the empirical average shape of the order book and the ones produced by the simulators.

Figure 14: Average shape of the order book in the model, compared to the empirical shape and the one produced by the Poisson model. Data: January 18th, 2011.

Both models reproduce quite well the order of magnitude of the average shape of the limit order book. This is not surprising, since the magnitude of the average is directly linked to the way we estimate the cancellation intensity parameter in Section 5, which is identical in both models. However, only our model correctly reproduces the slope of the average order book for the best prices, as well as a sound estimation of the position of the maximum away from the best quotes. The Poisson reference exhibits a sharper slope for the best prices, reaches a maximum that is too high and too close to the best quote, and underestimates the volume available far away from the best quotes. Once again, these observations are valid for all stocks and dates tested.

Going into more detail, Figure 15 plots the empirical distribution of $Q_{10}$.

Figure 15: Distribution of $Q_{10}$ in the model, compared to the empirical shape and the one produced by the Poisson model.

It turns out that the empirical distribution exhibits quite a heavy tail for large values of $Q_{10}$. Since both models are fitted on the mean, this leads to an underestimation of the probability of lower values of $Q_{10}$ in both simulations. However, the full model outperforms the Poisson reference even in this case.

## 7 Conclusion

We have provided a fully parametric model for the limit order book. The submission of orders is modelled as point processes with state-dependent intensities. We provide detailed functional forms for these intensities, as well as the estimation procedure by likelihood maximization. By developing a market simulator, we are able to show that the model reproduces very well key features of the order book, such as the spread and the volume at the best quote.

This very empirical and numerical work will hopefully lead to further improvements. The intensities we have proposed here are chosen with respect to some model principles in the choice of variables and functional forms. One may probably go further in the statistical model by experimenting with other forms or variables.

This work could also stimulate research on the stability of such complex random systems. Although the mathematics of the ”Poisson” models for the order book are beginning to be well-understood, the introduction of state-dependent intensities could lead to several theoretical problems that have not been studied here.

## References

• Abergel et al. (2016) Abergel, F., Anane, M., Chakraborti, A., Jedidi, A. & Muni Toke, I. (2016), Limit order books, Cambridge University Press.
• Abergel & Jedidi (2013) Abergel, F. & Jedidi, A. (2013), ‘A mathematical approach to order book modeling’, International Journal of Theoretical and Applied Finance 16(05), 1350025.
• Cont & De Larrard (2012) Cont, R. & De Larrard, A. (2012), ‘Order book dynamics in liquid markets: limit theorems and diffusion approximations’, arXiv preprint arXiv:1202.6412.
• Cont et al. (2010) Cont, R., Stoikov, S. & Talreja, R. (2010), ‘A stochastic model for order book dynamics’, Operations Research 58(3), 549–563.
• Gatheral & Oomen (2010) Gatheral, J. & Oomen, R. C. (2010), ‘Zero-intelligence realized variance estimation’, Finance and Stochastics 14(2), 249–283.
• Huang et al. (2015) Huang, W., Lehalle, C.-A. & Rosenbaum, M. (2015), ‘Simulating and analyzing order book data: The queue-reactive model’, Journal of the American Statistical Association 110(509), 107–122.
• Mike & Farmer (2008) Mike, S. & Farmer, J. D. (2008), ‘An empirical behavioral model of liquidity and volatility’, Journal of Economic Dynamics and Control 32(1), 200–234.
• Muni Toke (2011) Muni Toke, I. (2011), “Market making” in an order book model and its impact on the spread, in ‘Econophysics of Order-driven Markets’, Springer, pp. 49–64.
• Muni Toke (2015) Muni Toke, I. (2015), ‘The order book as a queueing system: average depth and influence of the size of limit orders’, Quantitative Finance 15(5), 795–808.
• Smith et al. (2003) Smith, E., Farmer, J. D., Gillemot, L., Krishnamurthy, S. et al. (2003), ‘Statistical theory of the continuous double auction’, Quantitative Finance 3(6), 481–514.