The price impact of order book events: market orders, limit orders and cancellations


Zoltán Eisler Capital Fund Management, Paris, France    Jean-Philippe Bouchaud Capital Fund Management, Paris, France    Julien Kockelkoren Capital Fund Management, Paris, France
September 17, 2019
Abstract

While the long-ranged correlation of market orders and their impact on prices has been relatively well studied in the literature, the corresponding studies of limit orders and cancellations are scarce. We provide here an empirical study of the cross-correlation between all these different events, and their respective impact on future price changes. We define and extract from the data the “bare” impact these events would have, if they were to happen in isolation. For large tick stocks, we show that a model where the bare impact of all events is permanent and non-fluctuating is in good agreement with the data. For small tick stocks, however, bare impacts must contain a history dependent part, reflecting the internal fluctuations of the order book. We show that this effect can be accurately described by an autoregressive model on the past order flow. This framework allows us to decompose the impact of an event into three parts: an instantaneous jump component, the modification of the future rates of the different events, and the modification of the jump sizes of future events. We compare in detail the present formalism with the temporary impact model that was proposed earlier to describe the impact of market orders when other types of events are not observed. Finally, we extend the model to describe the dynamics of the bid-ask spread.

Keywords: price impact, market orders, limit orders, cancellations, market microstructure, order flow

I Introduction

The relation between order flow and price changes has attracted considerable attention in recent years Hasbrouck (2007); Mike and Farmer (2008); Bouchaud et al. (2004, 2006); Lyons (2006); Bouchaud et al. (2009). To the investors’ dismay, trades on average impact the price in the direction of their transactions, i.e. buys push the price up and sells drive the price down. Although this sounds very intuitive, a little reflection shows that such a statement is far from trivial, for any buy trade in fact meets a sell trade, and vice-versa! On the other hand, there must indeed be a mechanism allowing information to be included into and reflected by prices. This is well illustrated by the Kyle model Kyle (1985), where the trading of an insider progressively reveals his information by impacting the price. Traditionally, the above “one sell for one buy” paradox is resolved by arguing that there are in fact two types of traders coexisting in the ecology of financial markets: (i) “informed” traders who place market orders for immediate execution, at the cost of paying half the bid-ask spread, and (ii) uninformed (or less informed) market makers who provide liquidity by placing limit orders on both sides of the order book, hoping to earn part of the bid-ask spread. In this setting, there is indeed an asymmetry between a buyer, placing a market order at the ask, and the corresponding seller with a limit order at the ask, and one can speak about a well defined impact of buy/sell (market) orders. The impact of market orders has therefore been empirically studied in great detail since the early nineties. As reviewed below, many surprising results have been obtained, such as a very weak dependence of impact on the volume of the market order, the long-range correlation of the signs of the trades, and the resulting non-permanent, power-law decay of impact.

The conceptual problem is that the distinction between informed trader and market maker is no longer obvious in the present electronic markets, where each participant can place both limit and market orders, depending on his own strategies, the current state of the order book, etc. Although there is still an asymmetry between a buy market order and a sell limit order that enables one to define the direction of the trade, “informed” traders too may choose to place limit orders, aiming to decrease execution costs. Limit orders must therefore also have an impact: adding a buy limit order induces extra upwards pressure, and cancelling a buy limit order decreases this pressure. Surprisingly, there are very few quantitative studies of the impact of these orders – partly due to the fact that more detailed data, beyond trades and quotes, is often needed to conduct such studies. As this paper was under review, we became aware of ref. Hautsch and Huang (2009), where a similar empirical study of the impact of limit orders is undertaken.

The aim of the present paper is to provide a unified framework for the description of the impact of all order book events, at least at the best limits: market orders, limit orders and cancellations. We study the correlations between all event types and signs. Assuming an additive model of impact, we map out from empirical data (consisting purely of trades and quotes information) the average individual impact of these orders. We find that the impact of limit orders is similar to (albeit somewhat smaller than) that of market orders.

We then compare these results to a simple model which assumes that all impacts are permanent in time. This works well for large tick stocks, for which the bid-ask spread is nearly constant, with no gaps in the order book. The discrepancies between this simple model and data from small tick stocks are then scrutinized in detail and attributed to the history dependence of the impact, which we are able to model successfully using a linear regression of the gaps on the past order flow. Our final model is specified in Sec. VII, Eq. (30). This framework allows us to measure more accurately the average impact of all types of orders, and to assess precisely the importance of impact fluctuations due to changes in the gaps behind the best quotes.

We want to emphasize that our study is mostly empirical and phenomenological, in the sense that we aim at establishing some stylized facts and building a parsimonious mathematical model to describe them, without at this stage referring to any precise economic reasoning about the nature and motivations of the agents who place the orders. For recent papers along this latter direction, see e.g. Hendershott et al. (2008); Biais and Weill (2009). We doubt, however, that it is possible, for now, to come up with a model that economists would like, with agents, equilibrium, etc. It seems to us that before achieving this, a more down to earth (but comprehensive) description of the data is needed, on which intuition can be built. This is what we try to provide in this paper. We are also aware of the cultural gap between communities, and that our work will appear to some researchers as “eyeball econometrics”. We, however, are deeply convinced that a graphical representation of data is needed to foster intuition, before any rigorous calibration is attempted.

The outline of this paper is as follows. We first review (Sec. II) the relevant results on the impact of market orders and set the mathematical framework within which we will analyze our order book event data. We explain in particular why the market order impact function measured in previous studies is in fact “dressed” by the impact of other events (limit orders, cancellations), and by the history dependence of the impact. We also relate our formalism to Hasbrouck’s Vector Autoregression framework. We then turn to the presentation of the data we have analyzed (Sec. III), and of the various correlation functions that one can measure (Sec. IV). From these we determine the individual (or “bare”), lag-dependent impact functions of the different events occurring at the bid price or at the ask price (Sec. V). We introduce a simplified model where these impact functions are constant in time, and show that this gives a good approximate account of our data for large tick stocks, while significant discrepancies appear for small tick stocks (Sec. VI). The systematic differences are explained by the dynamics of order flow deeper in the book, which can be modeled as a history dependent correction to the linear impact model (Sec. VII, see Eq. (30)). Our results are summarized in the conclusion, with open issues that would deserve more detailed investigation. In the Appendices we also show how the bid-ask spread dynamics can be accounted for within the framework introduced in the main text (Appendix A) and provide some supplementary information concerning the different empirical correlations that can be measured (Appendix B).

II Impact of market orders: a short review

II.1 The transient impact model

Quantitative studies of the price impact of market orders have by now firmly established a number of stylized facts, some of which appear rather surprising at first sight. The salient points are (for a recent review and references, see Bouchaud et al. (2009)):

  • Buy (sell) trades on average impact the price up (down). In other words, there is a strong correlation between price returns over a given time interval and the market order imbalance on the same interval.

  • The impact curve as a function of the volume of the trade is strongly concave. In other words, large volumes impact the price only marginally more than small volumes.

  • The sign of market orders is strongly autocorrelated in time. Despite this, the dynamics of the midpoint is very close to being purely diffusive.

A simple model encapsulating these empirical facts assumes that the mid-point price $p_t$ can be written at (trade) time $t$ as a linear superposition of the impact of past trades Bouchaud et al. (2004, 2006). [Footnote 1: In the following, we only focus on price changes over small periods of time, so that an additive model is adequate. For longer time scales, one should worry about multiplicative effects, which in this formalism would naturally arise from the fact that the bid-ask spread, and the gaps in the order book, are a fraction of the price. Therefore, the impact itself, $G$, is expected to be proportional to a moving average of the price. See Bouchaud and Potters (2000) for a discussion of this point.] The model reads:

$$p_t = \sum_{t'<t}\left[G(t-t')\,\epsilon_{t'} v_{t'}^{\theta} + n_{t'}\right] + p_{-\infty}, \tag{1}$$

where $v_{t'}$ is the volume of the trade at time $t'$, $\epsilon_{t'}$ the sign of that trade ($+1$ for a buy, $-1$ for a sell), and $n_{t'}$ is an independent noise term that models any price change not induced by trades (e.g. jumps due to news). The exponent $\theta$ is small; the dependence on $v$ might in fact be logarithmic ($\theta \to 0$). The most important object in the above equation is the function $G(t-t')$, which describes the temporal evolution of the impact of a single trade and can be called a ‘propagator’: how does the impact of the trade at time $t'$ propagate, on average, up to time $t$? We discuss in Sec. II.4 below how Eq. (1) is related to Hasbrouck’s VAR model Hasbrouck (1991, 2007).

An important result, derived in Bouchaud et al. (2004), is that the propagator $G(\ell)$ must decay with the lag $\ell$ in a very specific way, such as to off-set the autocorrelation of the trades, and maintain the (statistical) efficiency of prices. Clearly, if $G$ did not decay at all, the returns would simply be proportional to the sign of the trades, and therefore would themselves be strongly autocorrelated in time. The resulting price dynamics would then be highly predictable, which is not the case. Conversely, if $G$ decayed to zero immediately, the price as given by Eq. (1) would oscillate within a limited range, and the long-term volatility would be zero. The result of Bouchaud et al. (2004) is that if the correlation of signs decays at large $\ell$ as $\ell^{-\gamma}$ with $\gamma < 1$ (as found empirically), then $G(\ell)$ must decay as $\ell^{-\beta}$ with $\beta = (1-\gamma)/2$ for the price to be exactly diffusive at long times. The impact of single trades is therefore predicted to decay as a power-law (at least up to a certain time scale), at variance with simple models that assume that the impact decays exponentially to a non-zero “permanent” value. More generally, one can use the empirically observable impact function $R(\ell)$, defined as:

$$R(\ell) = \left\langle (p_{t+\ell} - p_t)\cdot \xi_t \right\rangle, \qquad \xi_t \equiv \epsilon_t v_t^{\theta}, \tag{2}$$

and the time correlation function $C(\ell) = \langle \xi_t \xi_{t+\ell} \rangle$ of the variable $\xi_t$ to map out, numerically, the complete shape of $G(\ell)$. This was done in Bouchaud et al. (2006), using the exact relation:

$$R(\ell) = \sum_{0<n\le\ell} G(n)\,C(\ell-n) + \sum_{n>0}\left[G(n+\ell) - G(n)\right] C(n). \tag{3}$$
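As a rough numerical illustration of how a propagator and a correlation function combine into a measurable response, the sketch below evaluates the discrete sum relating $R(\ell)$ to $G$ and $C$, assuming it takes the form $R(\ell)=\sum_{0<n\le\ell}G(n)C(\ell-n)+\sum_{n>0}[G(n+\ell)-G(n)]C(n)$. The function name, the list-based inputs and the truncation of the infinite sum at the length of the inputs are illustrative assumptions, not part of the original analysis.

```python
def response_from_propagator(G, C, max_lag):
    """Evaluate R(l) for l = 1..max_lag from a propagator G and a correlation C.

    G[n] and C[n] are values at integer lag n (G[0] is unused).
    Assumption: the infinite sum over n is truncated where the inputs end.
    """
    n_max = min(len(G), len(C)) - 1
    if not 1 <= max_lag <= n_max:
        raise ValueError("max_lag must lie between 1 and the last tabulated lag")
    R = []
    for lag in range(1, max_lag + 1):
        # Impact of the trades made between t and t + lag, weighted by C.
        direct = sum(G[n] * C[lag - n] for n in range(1, lag + 1))
        # Change, between t and t + lag, of the impact of correlated past trades.
        past = sum((G[n + lag] - G[n]) * C[n] for n in range(1, n_max - lag + 1))
        R.append(direct + past)
    return R


# Hypothetical inputs: a permanent (constant) propagator and a decaying correlation.
G_const = [0, 2, 2, 2, 2]
C_decay = [1.0, 0.5, 0.25, 0.0, 0.0]
print(response_from_propagator(G_const, C_decay, 3))
```

With a constant propagator the second sum vanishes and the response simply accumulates the correlation function, which illustrates why a non-decaying $G$ combined with persistent trade signs would lead to predictable returns.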

This analysis is repeated in a more general setting below (see Sec. V and Eq. (16)). The above model, however, is approximate and incomplete in two, interrelated ways.

  • First, Eq. (1) neglects the fluctuations of the impact: one expects that , which is the impact of trade at some time measured until a later time , to depend both on and and not only on . Its formal definition is given by:

    (4)

    Impact can indeed be quite different depending on the state of the order book and the market conditions at . As a consequence, if one blindly uses Eq. (1) to compute the second moment of the price difference, , with a non-fluctuating calibrated to reproduce the impact function , the result clearly underestimates the empirical price variance: see Fig. 1. Adding a diffusive noise would only shift upwards, but this is insufficient to reproduce the empirical data.

  • Second, other events of the order book can also change the mid-price, such as limit orders placed inside the bid-ask spread, or cancellations of all the volume at the bid or the ask. These events do indeed contribute to the price volatility and should be explicitly included in the description. A simplified description of price changes in terms of market orders only attempts to describe other events of the order book in an effective way, through the non-trivial time dependence of .

Figure 1: and its approximation with the temporary impact model with only trades as events, with and for small tick stocks. Results are shown when assuming that all trades have the same, non fluctuating, impact , calibrated to reproduce . This simple model accounts for of the long term volatility. Other events and/or the fluctuations of impact must therefore contribute to the market volatility as well.

II.2 History dependence of the impact function

Let us make the above statements more transparent using toy models. First, the assumption of a stationary impact function is clearly an approximation. The past order flow should affect the way the trade at time $t$ impacts the price or, as argued by Lillo and Farmer, liquidity may be history dependent Farmer et al. (2006); Gerig (2008); Bouchaud et al. (2009). Suppose for simplicity that the trade variable entering Eq. (1) is Gaussian (which turns out to be a good approximation) and that its impact is permanent but history dependent. If we assume that the past order flow has a small influence on the impact, we can formally expand the impact in powers of all past trade variables to get:

(5)

If buys and sells play a symmetric role, . Using the fact the ’s are Gaussian with zero mean, one finds that the impact function within this toy-model is given by:

(6)

If one compares this expression with Eq. (3) to extract an effective propagator , it is clear that the resulting solution will have some non-trivial time dependence induced by the third term, proportional to .

II.3 The role of hidden events

Imagine now that two types of events are important for the dynamics of the price. Events of the first type are characterized by a random variable (e.g., in the above example), whereas events of the second type (say limit orders) are characterized by another random variable . The “full” dynamical equation for the price is given by:

(7)

Imagine, however, that events of the second type are not observed. If for simplicity and ’s are correlated Gaussian random variables, one can always express the ’s as linear superposition of past ’s and find a model in terms of ’s only, plus an uncorrelated ‘noise’ component coming from the unobserved events:

(8)

is the linear filter allowing to predict the ’s in terms of the past ’s. It can be expressed in a standard way in terms of the correlation function of the ’s and the cross-correlation between ’s and ’s. Notice that the previous equation can be recast in the form of Eq. (1) plus noise, with an effective propagator “dressed” by the influence of the unobserved events:

(9)

From this equation, it is clear that a non-trivial dependence of can arise even if the ‘true’ propagators and are time independent – in other words the decay of the impact of a single market order is in fact a consequence of the interplay of market and limit order flow. As a trivial example, suppose both bare propagators are equal and constant in time () and , . This means that the two types of events impact the price but exactly cancel each other. Then, and , as it should: the dressed impact of events of the first type is zero. This is an idealized version of the asymmetric liquidity model of Lillo and Farmer mentioned above Farmer et al. (2006).

The aim of this paper is to investigate a model of impact similar to Eqs. (1) and (7), but where a wider class of order book events are explicitly taken into account. This will allow us to extract the corresponding single event impact functions, and study their time evolution. As a test for the accuracy of the model, the time behavior of other observables, such as the second moment of the price difference should be correctly accounted for. We start by presenting the data and extra notations which will be useful in the sequel. We then discuss the different correlation and response functions that can be measured on the data.

II.4 Relation with Hasbrouck’s VAR model

At this stage, it is interesting to relate the above ‘propagator’ framework encoded in Eq. (1) to the econometric Vector Autoregressive (VAR) model proposed by Hasbrouck, which has become a standard in the microstructure literature. In its original formulation, the VAR model is a joint linear regression of the present price return and signed volume onto their past realisations, or more precisely:

(10)

where are i.i.d. noises and the are regression coefficients, to be determined. Eq. (1) can be seen as a special case of the VAR model, Eq. (10), provided the following identifications/modifications are made: a) ; b) the coefficients are assumed to be zero; c) since Eq. (1) models prices and not returns, one has . is called the information content of a trade in Hasbrouck’s framework; d) finally, although the autocorrelation of the is measured, the dynamical model for the is left unspecified.

Although the two models are very similar at the formal level, the major distinction lies in the interpretation, which in fact illustrates the difference between econometric models and “microscopic” models. Whereas the VAR model postulates a general, noisy linear relation between two sets (or more) of variables and determines the coefficients via least squares, we insist on a microscopic mechanism that leads to an a priori structure of the model and an interpretation of the coefficients. Eq. (1) is a causal model for impact, which postulates that the current price is a result of the impact of all past trades, plus some noise contribution that represents price moves not related to trades (for example, quote revisions after some news announcements). In this context, there is no natural interpretation for the coefficients, which must be zero: past price changes cannot by themselves influence the present price, although these may of course affect the order flow , which in turn impacts ‘physically’ the price. On the other hand, the interpretation in terms of impact allows one to anticipate the limitations of the model and to suggest possible improvements, by including more events or by allowing for some history dependence, as discussed in the above subsections.

The aim of the present paper is to justify fully this modeling strategy by accounting for all events in the order book. In this case, the variation of the price can be tautologically decomposed in terms of these events, and the corresponding regression coefficients have a transparent interpretation. Furthermore, the limitations of a purely linear model appear very clearly as the history dependence of impact may induce explicit non-linearities (see Sec. VII).

III Data and notations

In this paper we analyze data on randomly selected liquid stocks traded on NASDAQ during the period 03/03/2008 – 19/05/2008, a total of trading days (see Table 2 for details). The particular choice of market is not very important; many of our results were also verified on other markets (such as CME Futures, US Treasury Bonds and stocks traded at the London Stock Exchange), as well as on other time periods, and they appear fairly robust. [Footnote 2: The results for these markets are not reproduced here, for lack of space, but the corresponding data is available on request.]

We only consider the usual trading time between 9:30–16:00; all other periods are discarded. We will always use ticks ( US dollars) as the units of price. We will use the name “event” for any change that modifies the bid or ask price, or the volume quoted at these prices. Events deeper in the order book are unobserved and will not be described: although they do not have an immediate effect on the best quotes, our description will still be incomplete; in line with the previous section, we know that these unobserved events may “dress” the impact of the observed events. Furthermore, we note that the liquidity is fragmented and that the stocks we are dealing with are traded on multiple platforms. The activity on these other platforms will also “dress” the impact of observable events, in the sense of the previous section. This may account for some of the residual discrepancies reported below.

Events will be used as the unit of time. This “event time” is similar to, but more detailed than, the notion of transaction time used in many recent papers. Since the dependence of impact on the volume of the trades is weak Jones et al. (1994); Bouchaud et al. (2009), we have chosen to classify events not according to their volume but according to whether they change the mid-point or not. This strong dichotomy is another approximation to keep in mind. [Footnote 3: Our data also included a small number () of marketable (or crossing) limit orders. In principle these could have been treated as a market order (and a consequent limit order for the remaining volume if there was any). Due to technical limitations we decided to instead remove these events and the related price changes.] It leads to six possible types of events:

  • market orders that do not change the best price (noted $\mathrm{MO}^0$) or that do (noted $\mathrm{MO}'$), [Footnote 4: To identify multiple trades that are initiated by the same market order, we consider as one market order all the trades in a given stock that occur on the same side of the book within a millisecond. Such a time resolution is sufficient for distinguishing trades initiated by different parties even at times of very intense trading activity.]

  • limit orders at the current bid or ask ($\mathrm{LO}^0$) or inside the bid-ask spread so that they change the price ($\mathrm{LO}'$),

  • and cancellations at the bid or ask that do not remove all the volume quoted there ($\mathrm{CA}^0$) or that do ($\mathrm{CA}'$).
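The footnote on multiple trades initiated by the same market order describes a simple aggregation rule. A minimal sketch of one possible reading is given below: consecutive trades on the same side are merged when they fall within one millisecond of the first trade of the group. The tuple layout, the integer microsecond timestamps and the choice to measure the window from the first trade (rather than chaining from the previous one) are assumptions made for illustration only.

```python
def merge_trades(trades, window_us=1000):
    """Merge consecutive same-side trades into single market orders.

    trades: list of (time_us, side, volume), sorted by time, with time in
            integer microseconds and side either "bid" or "ask".
    Assumption: a trade joins the current group if it is on the same side and
    at most window_us after the first trade of that group.
    """
    orders = []
    for time_us, side, volume in trades:
        last = orders[-1] if orders else None
        if last and last["side"] == side and time_us - last["time_us"] <= window_us:
            last["volume"] += volume  # same market order, add the fill
        else:
            orders.append({"time_us": time_us, "side": side, "volume": volume})
    return orders


# Hypothetical tape: two fills of one buy order, then a sell, then a later buy.
tape = [(0, "ask", 100), (400, "ask", 50), (900, "bid", 10), (5000, "ask", 30)]
for order in merge_trades(tape):
    print(order)
```

Integer timestamps are used so that the one millisecond comparison is exact and not affected by floating point rounding.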

The upper index $'$ (“prime”) will thus denote that the event changed any of the best prices, and the upper index $0$ that it did not. Abbreviations without the upper index (MO, LO, CA) refer to both the price changing and the non-price changing event type. The type of the event occurring at time $t$ will be denoted by $\pi_t$.

Our sample of stocks can be divided into two groups: large tick and small tick stocks. Large tick stocks are such that the bid-ask spread is almost always equal to one tick, whereas small tick stocks have spreads that are typically a few ticks. The behavior of the two groups is quite different, and this will be emphasized throughout the paper. For example, the events which change the best price have a relatively low probability for large tick stocks (about altogether), but not for small tick stocks (up to ). Table 2 shows a summary of stocks, and some basic statistics. Note that there are a number of stocks with intermediate tick sizes, which to some extent possess the characteristics of both groups. Technically, they can be treated in exactly the same way as small tick stocks, and all our results remain valid. However, for the clarity of presentation, we will not consider them explicitly in this paper.

Every event is given a sign $\epsilon_t$ according to its expected long-term effect on the price. For market orders this corresponds to the usual order signs, i.e., $\epsilon_t = +1$ for buy market orders (at the ask price) and $\epsilon_t = -1$ for sell market orders (at the bid price). Cancelled sell limit orders and incoming buy limit orders both have $\epsilon_t = +1$, while others have $\epsilon_t = -1$. The above definitions are summarized in Table 1. Note that the table also defines the gaps, which will be used later.

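It will also be useful to define another sign variable corresponding to the “side” of the event at time $t$, which will be denoted by $s_t$. It indicates whether the event took place at the bid ($s_t = -1$) or the ask ($s_t = +1$), thus:

$$s_t = \begin{cases} \epsilon_t & \text{if } \pi_t = \mathrm{MO}^0, \mathrm{MO}', \mathrm{CA}^0 \text{ or } \mathrm{CA}', \\ -\epsilon_t & \text{if } \pi_t = \mathrm{LO}^0 \text{ or } \mathrm{LO}'. \end{cases} \tag{11}$$

The difference between $\epsilon_t$ and $s_t$ arises because limit orders correspond to the addition, not the removal, of volume, and thus they push prices away from the side of the book where they occur.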
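The sign and side conventions above can be sketched in a few lines. The example assumes that each event is described by its kind ("MO", "LO" or "CA") and by the direction ("buy" or "sell") of the order being submitted or cancelled; these labels and the function name are hypothetical and only serve to illustrate the rule that limit orders have a side opposite to their sign.

```python
def sign_and_side(kind, direction):
    """Return (epsilon, s) for one order book event.

    kind:      "MO" (market order), "LO" (limit order) or "CA" (cancellation).
    direction: "buy" or "sell", the direction of the order that is
               submitted (MO, LO) or cancelled (CA).
    epsilon is the expected direction of the price move; s is -1 for an
    event at the bid and +1 for an event at the ask.
    """
    if kind not in ("MO", "LO", "CA") or direction not in ("buy", "sell"):
        raise ValueError("unknown event kind or direction")
    base = 1 if direction == "buy" else -1
    # Removing a buy limit order lowers buying pressure, hence the flip for CA.
    epsilon = -base if kind == "CA" else base
    # Limit orders add volume, so they sit on the side opposite to their sign.
    s = -epsilon if kind == "LO" else epsilon
    return epsilon, s


for kind in ("MO", "LO", "CA"):
    for direction in ("buy", "sell"):
        print(kind, direction, sign_and_side(kind, direction))
```

For instance, a buy limit order pushes the price up (sign $+1$) but rests at the bid (side $-1$), whereas a buy market order has both sign and side equal to $+1$ because it consumes volume at the ask.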

In the following calculations we will sometimes rely on indicator variables denoted as $I(\pi_t = \pi)$. This expression is $1$ if the event at $t$ is of type $\pi$ and zero otherwise. In other words, $I(\pi_t = \pi) = \delta_{\pi_t,\pi}$, where $\delta$ is the Kronecker-delta. We will also use the notation $\langle \cdot \rangle$ to denote the time average of the quantity between the brackets. For example, the unconditional probability of the event type $\pi$ can be, by definition, calculated as $P(\pi) = \langle I(\pi_t = \pi) \rangle$.

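The indicator notation, although sometimes heavy, simplifies the formal calculation of some conditional expectations. For example, if a quantity $X_{\pi,t}$ depends on the event type $\pi$ and the time $t$, then its conditional expectation at times of $\pi$-type events is

$$\langle X_{\pi_t,t} \mid \pi_t = \pi \rangle = \frac{\langle X_{\pi,t}\, I(\pi_t = \pi) \rangle}{P(\pi)},$$

with $I(\pi_t = \pi)$ the indicator variable and $P(\pi)$ the unconditional probability introduced above.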

Also, by definition

(12)
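The indicator formalism translates directly into code. The sketch below, with hypothetical function names and toy data, estimates the unconditional probability of an event type as the time average of its indicator, and the conditional expectation of a quantity as the time average of the quantity times the indicator, divided by that probability.

```python
def probability(types, pi):
    """Time average of the indicator of event type pi."""
    return sum(1 for p in types if p == pi) / len(types)


def conditional_mean(values, types, pi):
    """Average of values[t] restricted to times t where types[t] == pi.

    Written as <X * I(pi_t = pi)> / P(pi) to mirror the indicator notation;
    it equals the plain mean over the selected times.
    """
    p = probability(types, pi)
    if p == 0:
        raise ValueError("event type never occurs in the sample")
    joint = sum(x for x, q in zip(values, types) if q == pi) / len(types)
    return joint / p


# Toy event sequence and an arbitrary quantity attached to each event.
types = ["MO", "LO", "LO", "CA"]
values = [1.0, 2.0, 4.0, 8.0]
print(probability(types, "LO"), conditional_mean(values, types, "LO"))
```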
| event | definition | event sign definition | gap definition |
|---|---|---|---|
| $\mathrm{MO}^0$ | market order, volume $<$ outstanding volume at the best | $\epsilon = \pm 1$ for buy/sell market orders | $0$ |
| $\mathrm{MO}'$ | market order, volume $\ge$ outstanding volume at the best | $\epsilon = \pm 1$ for buy/sell market orders | half of first gap behind the ask ($\epsilon = 1$) or bid ($\epsilon = -1$) |
| $\mathrm{CA}^0$ | partial cancellation of the bid/ask queue | $\epsilon = \mp 1$ for buy/sell side cancellation | $0$ |
| $\mathrm{LO}^0$ | limit order at the current best bid/ask | $\epsilon = \pm 1$ for buy/sell limit orders | $0$ |
| $\mathrm{CA}'$ | complete cancellation of the best bid/ask | $\epsilon = \mp 1$ for buy/sell side cancellation | half of first gap behind the ask ($\epsilon = 1$) or bid ($\epsilon = -1$) |
| $\mathrm{LO}'$ | limit order inside the spread | $\epsilon = \pm 1$ for buy/sell limit order | half distance of limit order from the earlier best quote on the same side |

Table 1: Summary of the possible event types, the corresponding definitions of the event signs and gaps.

| ticker | $P(\mathrm{MO}^0)$ | $P(\mathrm{MO}')$ | $P(\mathrm{CA}^0)$ | $P(\mathrm{LO}^0)$ | $P(\mathrm{CA}')$ | $P(\mathrm{LO}')$ | mean spread (ticks) | mean price (USD) | time/event (sec) |
|---|---|---|---|---|---|---|---|---|---|
| large tick | | | | | | | | | |
| AMAT | 0.042 | 0.011 | 0.39 | 0.54 | 0.0018 | 0.013 | 1.11 | 17.45 | 0.16 |
| CMCSA | 0.040 | 0.0065 | 0.41 | 0.53 | 0.0021 | 0.0087 | 1.12 | 20.29 | 0.15 |
| CSCO | 0.051 | 0.0085 | 0.40 | 0.53 | 0.0010 | 0.0096 | 1.08 | 67.77 | 0.10 |
| DELL | 0.042 | 0.0087 | 0.40 | 0.54 | 0.0019 | 0.011 | 1.10 | 20.22 | 0.17 |
| INTC | 0.052 | 0.0073 | 0.40 | 0.54 | 0.00080 | 0.0081 | 1.08 | 19.43 | 0.12 |
| MSFT | 0.054 | 0.0087 | 0.40 | 0.53 | 0.0012 | 0.010 | 1.09 | 27.52 | 0.098 |
| ORCL | 0.050 | 0.0090 | 0.40 | 0.54 | 0.0012 | 0.010 | 1.09 | 20.86 | 0.16 |
| small tick | | | | | | | | | |
| AAPL | 0.043 | 0.076 | 0.32 | 0.33 | 0.077 | 0.16 | 3.35 | 140.56 | 0.068 |
| AMZN | 0.038 | 0.077 | 0.26 | 0.31 | 0.12 | 0.20 | 3.70 | 70.68 | 0.21 |
| APOL | 0.042 | 0.080 | 0.24 | 0.33 | 0.11 | 0.20 | 3.78 | 55.24 | 0.40 |
| COST | 0.054 | 0.069 | 0.27 | 0.36 | 0.082 | 0.16 | 2.62 | 67.77 | 0.39 |
| ESRX | 0.042 | 0.074 | 0.24 | 0.32 | 0.12 | 0.20 | 4.12 | 60.00 | 0.63 |
| GILD | 0.052 | 0.043 | 0.34 | 0.46 | 0.032 | 0.077 | 1.64 | 48.23 | 0.23 |

Table 2: Summary statistics for all stocks, showing the probability of the different events, the mean spread in ticks, the mean price in dollars and the average time between events in seconds. The last column shows the total number of events in the sample.

IV Correlation and response functions

In this section, we study the empirical temporal correlation of the different events defined above, and the response function to these events.

IV.1 The autocorrelation of $\epsilon$ and $s$

We first investigate the autocorrelation function of the event signs, calculated as $\langle \epsilon_t \epsilon_{t+\ell} \rangle$. These are found to be short-ranged, see Fig. 2, where the correlation function dies out after 10-100 trades, corresponding to typically 10 seconds in real time. This is in contrast with several other papers Bouchaud et al. (2004); Lillo and Farmer (2004); Bouchaud et al. (2006, 2009), where the $\epsilon_t$’s are calculated for market orders only ($\pi = \mathrm{MO}^0, \mathrm{MO}'$), and those signs are known to be strongly persistent among themselves, with, as recalled in Sec. II, a correlation decaying as a slow power law. However, the direction of incoming limit orders is negatively correlated with cancellations and market orders. Because the time series contains all types of events, the mixture of long-range positive and negative correlations balances such that only short-range persistence remains. Any other result would be incompatible with little predictability in price returns. As illustrated by the toy example of Sec. II, Eq. (9), this mixing process in fact maintains statistical market efficiency, i.e. weak autocorrelation of price changes.

When limit orders and cancellations are included, one can independently analyze the persistence of the side $s_t$ of the events. According to Eq. (11) this means flipping the event signs of limit orders in the time series, while keeping the rest unchanged. This change reverses the compensation mechanism discussed above, and $s_t$ is found to have long-range correlations in time: its autocorrelation $\langle s_t s_{t+\ell} \rangle$ is shown in Fig. 2 and decays as a slow power law of the lag $\ell$. This long range decay is akin to the long range persistence of market order signs discussed throughout the literature: since market orders tend to persistently hit one side of the book, one expects more limit orders and cancellations on the same side as well. Intuitively, if a large player splits his order and buys or sells using market orders for a long period of time, this will attract compensating limit orders on the same side of the book.

Figure 2: $\langle \epsilon_t \epsilon_{t+\ell} \rangle$ and $\langle s_t s_{t+\ell} \rangle$, averaged for large and small tick stocks.

IV.2 The signed event-event correlation functions

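We will see in the following that, for describing price impact, the most important correlation functions are those defined between two (not necessarily different) signed event types. For some fixed $\pi_1$ and $\pi_2$ one can define the normalized correlation between these signed events as:

$$C_{\pi_1,\pi_2}(\ell) = \frac{\langle I(\pi_t = \pi_1)\,\epsilon_t\, I(\pi_{t+\ell} = \pi_2)\,\epsilon_{t+\ell} \rangle}{P(\pi_1)\,P(\pi_2)}. \tag{13}$$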

Our convention is that the first index corresponds to the first event in chronological order. Because we have event types, altogether there are of these event-event correlation functions. There are no clearly apparent, systematic differences between large and small tick stocks, hence we give results averaged over both groups in Fig. 3 for and . (Other correlation functions are plotted in Appendix B.) Trades among themselves and regardless of group are long range correlated as it is well known and was recalled above, and confirmed again in Fig. 3. For other cases, the sign of the correlations between event types varies and in many cases one observes a similarly slow decay that can be fitted by a power law with an exponent around . Furthermore, there are two distinctly different regimes. For events (which means up to seconds in real time) returns are still autocorrelated (cf. Fig. 2). In this regime is positive for any event type , so small trades are followed by a ballistic move in the same direction by other trades and also by limit orders, while at the same time cancellations also push the price in the same direction. is also positive except for , where it is negative except for very small lags555There is some sign of oscillations for small tick stocks.. This means that if a market order removes a level, it is followed by further trades and cancellations in the same direction, but the level is refilled very quickly by incoming limit orders inside the spread. For longer times some correlation functions change sign. For example in Fig. 3(left) one can see this reversal for limit orders. Market orders “attract” limit orders, as noted in Weber and Rosenow (2005); Bouchaud et al. (2006); Gerig (2008). This “stimulated refill” process ensures a form of dynamic equilibrium: the correlated flow of market orders is offset by an excess inflow of opposing limit orders, such as to maintain the diffusive nature of the price. 
This is the same process causing the long-range correlations of noted above.
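As a rough illustration of how such a signed correlation can be estimated from an event stream, the sketch below assumes the normalization $C_{\pi_1,\pi_2}(\ell) = \langle I(\pi_t=\pi_1)\,\epsilon_t\, I(\pi_{t+\ell}=\pi_2)\,\epsilon_{t+\ell}\rangle / [P(\pi_1)P(\pi_2)]$ and that events are stored as two parallel arrays in chronological order. The function name, the integer coding of event types and the array layout are assumptions of this sketch, not part of the original analysis.

```python
import numpy as np

def signed_event_correlation(event_type, sign, pi1, pi2, max_lag):
    """Estimate C_{pi1,pi2}(lag) for lag = 0..max_lag.

    event_type : integer label of each event, in chronological order (assumed coding)
    sign       : +1 / -1 sign of each event
    """
    event_type = np.asarray(event_type)
    sign = np.asarray(sign, dtype=float)
    # Signed indicators I(pi_t = pi) * eps_t for the two chosen types.
    a = (event_type == pi1) * sign
    b = (event_type == pi2) * sign
    # Stationary probabilities, estimated as plain event frequencies.
    p1 = np.mean(event_type == pi1)
    p2 = np.mean(event_type == pi2)
    n = len(sign)
    corr = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        # First index = earlier event (time t), second index = later event (t + lag).
        corr[lag] = np.mean(a[: n - lag] * b[lag:]) / (p1 * p2)
    return corr

# Toy stream: type 0 and type 1 alternate, each type-1 event copies the preceding sign.
types = [0, 1, 0, 1]
signs = [1, 1, -1, -1]
c01 = signed_event_correlation(types, signs, 0, 1, 1)
c00 = signed_event_correlation(types, signs, 0, 0, 0)
```

With this normalization the lag-zero autocorrelation equals $1/P(\pi)$, because the squared sign is 1 and the indicator averages to $P(\pi)$; the cross terms at lag zero vanish since an event has exactly one type.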

Figure 3: The normalized, signed event correlation functions $C_{\pi_1,\pi_2}(\ell)$, (left) $\pi_1 = \mathrm{MO}^0$, (right) $\pi_1 = \mathrm{MO}'$. The curves are labeled by their respective $\pi_2$’s in the legend. The bottom panels show the negative values.

In general, there are no reasons to expect time reversal symmetry, which would impose . However, some pairs of events appear to obey this symmetry at least approximately, for example and or and , see Fig. 4. On the other hand, for the pair , one can see that limit orders that move the price are immediately followed by opposing market orders. The dual compensation, i.e. a stimulated refill of liquidity after a price moving market order , only happens with some delay. and limit orders also lead to some asymmetry, see Fig. 5; here we see that after a transient, non-aggressive market orders induce compensating limit orders more efficiently than the reverse process.

Figure 4: Examples for time reversal symmetry for normalized, signed event correlations for small tick stocks, note that it is plotted. Lines and points of the same color correspond to the same event pairs. The curves are labeled by their respective ’s and ’s in the legend.

Figure 5: Examples for time reversal asymmetry for normalized, signed event correlations for small tick stocks. Lines and points of the same color correspond to the same event pairs. The curves are labeled by their respective ’s and ’s in the legend.

IV.3 The unsigned event-event correlation functions

A similar definition of a correlation function is possible purely between event occurrences, without the signs:

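$$\Pi_{\pi_1,\pi_2}(\ell) = \frac{\left\langle I(\pi_t=\pi_1)\, I(\pi_{t+\ell}=\pi_2)\right\rangle}{P(\pi_1)\,P(\pi_2)} - 1 \qquad (14)$$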

where we have subtracted 1 so as to make the function decay to zero at large times. This quantity expresses the excess probability of $\pi_2$-type events in comparison to their stationary probability, given that there was a $\pi_1$-type event $\ell$ lags earlier. Examples of this quantity for averages over all stocks are plotted in Fig. 6. One finds that the correlation generally decays more slowly when both $\pi_1$ and $\pi_2$ move the price. This implies that events which change the best price are clustered in time: aggressive orders induce and reinforce each other.
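The unsigned version can be estimated in the same way. The sketch below assumes the quantity is the joint frequency of the two event types at lag $\ell$, divided by the product of their stationary frequencies, minus 1; the function name and the integer coding of event types are hypothetical.

```python
import numpy as np

def unsigned_event_correlation(event_type, pi1, pi2, max_lag):
    """Excess probability of a pi2 event `lag` events after a pi1 event."""
    event_type = np.asarray(event_type)
    a = (event_type == pi1).astype(float)   # I(pi_t = pi1)
    b = (event_type == pi2).astype(float)   # I(pi_t = pi2)
    p1, p2 = a.mean(), b.mean()             # stationary frequencies
    n = len(event_type)
    out = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        joint = np.mean(a[: n - lag] * b[lag:])
        # Subtracting 1 makes the estimate vanish for independent events.
        out[lag] = joint / (p1 * p2) - 1.0
    return out

# Strictly alternating stream: a type-1 event always follows a type-0 event.
excess = unsigned_event_correlation([0, 1, 0, 1, 0, 1], 0, 1, 2)
```

In this toy stream the estimate is positive at lag 1 (type 1 always follows type 0) and equals $-1$ at lags 0 and 2, where the pair never occurs.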

Figure 6: The normalized, unsigned event correlation functions , (left) , (right) . The curves are labeled by their respective ’s in the legend. The bottom panels show the negative values.

IV.4 The response function

Let us now turn to the response of the price to different types of orders. The average behavior of price after events of a particular type defines the corresponding response function (or average impact function):

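$$R_\pi(\ell) = \frac{\left\langle (p_{t+\ell} - p_t)\, I(\pi_t=\pi)\,\epsilon_t \right\rangle}{P(\pi)} \qquad (15)$$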

This is a correlation function between “sign times indicator” at time $t$ and the price change from $t$ to $t+\ell$, normalized by the stationary probability of the event $\pi$, denoted as $P(\pi)$. This normalized response function gives the expected directional price change after an event $\pi$. Its behavior for all $\pi$’s is shown in Fig. 7. We note that all types of events lead, on average, to a price change in the expected direction. Tautologically, $R_\pi(\ell=1) > 0$ for price-changing events and $R_\pi(\ell=1) = 0$ for other events. As the time lag increases, the impact of market orders grows significantly, especially for small tick stocks, whereas it remains roughly constant for limit orders/cancellations that do change the price. However, as emphasized in Bouchaud et al. (2004), the response function is hard to interpret intuitively, and in particular it is not equal to the bare impact of an event $\pi$, since the correlations between events contribute to $R_\pi(\ell)$, see Eq. (3) above. We now attempt to deconvolute the effect of correlations and extract these bare impact functions from the data.
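A minimal sketch of how such a response function could be measured, assuming $R_\pi(\ell) = \langle (p_{t+\ell}-p_t)\, I(\pi_t=\pi)\,\epsilon_t\rangle / P(\pi)$ and assuming that `price[t]` holds the midquote just before event `t`, with one extra entry for the price after the last event. The function name and this storage convention are assumptions of the sketch.

```python
import numpy as np

def response_function(price, event_type, sign, pi, max_lag):
    """Estimate R_pi(lag) for lag = 0..max_lag (R_pi(0) = 0 by construction)."""
    price = np.asarray(price, dtype=float)
    event_type = np.asarray(event_type)
    sign = np.asarray(sign, dtype=float)
    n = len(sign)
    if len(price) != n + 1:
        raise ValueError("price must have one more entry than there are events")
    signed_ind = (event_type == pi) * sign      # I(pi_t = pi) * eps_t
    p_pi = np.mean(event_type == pi)            # stationary frequency of pi
    resp = np.zeros(max_lag + 1)
    for lag in range(1, max_lag + 1):
        m = n - lag + 1                         # number of t with t + lag <= n
        dp = price[lag : lag + m] - price[:m]   # p_{t+lag} - p_t
        resp[lag] = np.mean(dp * signed_ind[:m]) / p_pi
    return resp

# Toy example: every event moves the midquote by half a tick in its own direction,
# and signs alternate, so the move is fully reverted by the next event.
signs = [1, -1, 1, -1]
prices = [0.0, 0.5, 0.0, 0.5, 0.0]
r = response_function(prices, [0, 0, 0, 0], signs, 0, 2)
```

In this toy example the lag-1 response is the immediate half-tick impact, and the lag-2 response is zero because each move is cancelled by the next, oppositely signed event.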

Figure 7: The normalized response function $R_\pi(\ell)$ for (left) large tick stocks and (right) small tick stocks. The curves are labeled according to $\pi$ in the legend.

V The temporary impact model

Market orders move prices, but so do cancellations and limit orders. As reviewed in Sec. II above, one can try to describe the impact of all these events in an effective way in terms of a “dressed” propagator of market orders only, $G(\ell)$, as defined by Eq. (1). Let us extend this formalism to include any number of events in the following way. We assume that, after a lag of $\ell$ events, an event of type $\pi$ has a remaining impact $G_\pi(\ell)$. The price is then expressed as the sum of the impacts of all past events, plus some initial reference price:

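$$p_t = \sum_{t'<t}\left[\sum_{\pi} G_\pi(t-t')\, I(\pi_{t'}=\pi)\right]\epsilon_{t'} + p_{-\infty} \qquad (16)$$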

where the term with the indicators selects exactly one propagator for each $t'$, the one corresponding to the particular event type at that time. After straightforward calculations, the response function (15) can be expressed through Eqs. (16) and (12) as

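$$R_{\pi_1}(\ell) = \sum_{\pi_2} P(\pi_2)\left[\sum_{0<n\leq \ell} G_{\pi_2}(n)\, C_{\pi_1,\pi_2}(\ell-n) + \sum_{n>0}\left[G_{\pi_2}(n+\ell) - G_{\pi_2}(n)\right] C_{\pi_2,\pi_1}(n)\right] \qquad (17)$$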

This is a direct extension of Eq. (3), which was obtained in Ref. Bouchaud et al. (2006). One can invert the system of equations in (17) to evaluate the unobservable $G_\pi$’s in terms of the observable $R_\pi$’s and $C_{\pi_1,\pi_2}$’s. In order to do this, one rewrites the above in a matrix form, as

(18)

where

(19)

and was replaced by a large enough cutoff , convenient for numerical purposes. In the following, we use , which allows to determine the functions with a good precision up to , see Fig. 8.
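The inversion can be sketched numerically as follows. The sketch assumes that the relation between response functions, propagators and correlations has the form $R_{\pi_1}(\ell) = \sum_{\pi_2} P(\pi_2)\big[\sum_{0<n\le\ell} G_{\pi_2}(n)\,C_{\pi_1,\pi_2}(\ell-n) + \sum_{n>0}[G_{\pi_2}(n+\ell)-G_{\pi_2}(n)]\,C_{\pi_2,\pi_1}(n)\big]$, which is what one obtains by inserting the propagator model into the definition of the response function. The treatment of the cutoff (terms with $n+\ell$ beyond the cutoff are simply dropped), the array layout and the function names are assumptions of this sketch and need not match the matrix actually used in Eqs. (18) and (19).

```python
import numpy as np

def build_system(C, P, L):
    """Matrix A with R = A @ G, both flattened as index pi * L + (lag - 1).

    C[a, b, n] : signed correlation C_{a,b}(n) for n = 0..L-1 (a = earlier event)
    P[a]       : stationary probability of event type a
    """
    K = len(P)
    A = np.zeros((K * L, K * L))
    for a in range(K):                      # pi_1: the conditioning event
        for lag in range(1, L + 1):
            row = a * L + (lag - 1)
            for b in range(K):              # pi_2: type whose propagator enters
                # Events at or after time t: G_b(n) * C_{a,b}(lag - n), n = 1..lag
                for n in range(1, lag + 1):
                    A[row, b * L + (n - 1)] += P[b] * C[a, b, lag - n]
                # Events before time t: [G_b(n + lag) - G_b(n)] * C_{b,a}(n),
                # truncated so that n + lag does not exceed the cutoff L.
                for n in range(1, L - lag + 1):
                    A[row, b * L + (n + lag - 1)] += P[b] * C[b, a, n]
                    A[row, b * L + (n - 1)] -= P[b] * C[b, a, n]
    return A

def solve_propagators(R, C, P):
    """Solve for G[pi, lag-1] given R[pi, lag-1], lags 1..L."""
    K, L = R.shape
    A = build_system(C, P, L)
    return np.linalg.solve(A, R.reshape(-1)).reshape(K, L)

# Consistency check on synthetic inputs (not market data): weak correlations,
# C_{a,a}(0) = 1 / P(a) and C_{a,b}(0) = 0 for a != b.
rng = np.random.default_rng(0)
K, L = 2, 5
P = np.array([0.6, 0.4])
C = rng.uniform(-0.05, 0.05, size=(K, K, L))
C[:, :, 0] = np.diag(1.0 / P)
G_true = rng.normal(size=(K, L))
A = build_system(C, P, L)
R = (A @ G_true.reshape(-1)).reshape(K, L)
G_est = solve_propagators(R, C, P)
```

The check only confirms that the linear system is assembled and inverted consistently. On real data the correlation and response estimates are noisy, so the recovered propagators inherit that noise, especially close to the cutoff.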

As discussed in Sec. II, the origin of the decay of market order price impact is that incoming limit orders maintain an equilibrium with market order flow. In order to keep prices diffusive, limit orders introduce a reverting force into prices, and this precisely offsets the persistence in market order flow. However, our present extended formalism explicitly includes these limit orders (and also cancellations) as events. If all order book events were described, one would naively expect the $G_\pi$’s to be lag-independent constants for events that change the price, and zero otherwise. Solving the above equation for the $G_\pi$’s, however, leads to functions that still depend on the lag $\ell$, particularly for small tick stocks: see Fig. 8. We see in particular that market orders that do not change the price immediately do impact the price on longer time scales. We also notice that the impact of single events first grows with lag and then decays slowly. The impact of limit orders, although clearly measurable, seems to be significantly smaller than that of market orders, in particular for small tick stocks (see Hautsch and Huang (2009) for a related discussion).

Figure 8: The bare propagators in the temporary impact model for (left) large tick stocks and (right) small tick stocks.

In the rest of the paper, we will try to understand in more detail where the lag dependence of the $G_\pi$’s comes from. The discussion of Sec. II already suggested that some history dependence of impact is responsible for this effect. Before delving into this, it is interesting to see how well the above augmented model predicts the volatility of the stocks once all the $G_\pi$’s have been calibrated on the empirical $R_\pi$’s. As just mentioned, Eq. (16) neglects the fluctuations of the impact, and we therefore expect some discrepancies. In order to make such a comparison, we first express exactly the variance of the price at lag $\ell$, $D(\ell) = \langle (p_{t+\ell}-p_t)^2 \rangle$, in terms of the $G$’s and the $C$’s, generalizing the corresponding result obtained in Bouchaud et al. (2004):

(20)

The function $D(\ell)/\ell$, which should be constant for a strictly diffusive process, is plotted in Fig. 9; the symbols indicate the empirical data, and the dashed lines correspond to Eq. (20). Note that we fit both models to each stock separately, compute $D(\ell)/\ell$ in each case, and then average the results. We see that the overall agreement is fair for small tick stocks, but very bad for large tick stocks. The reason will turn out to be that for large ticks, a permanent, non-fluctuating impact model accounts very well for the dynamics. This reflects that the spread and the gaps behind the best quotes are nearly constant in that case. But any small variation of $G_\pi(\ell)$ is amplified through the second term of Eq. (20), which is an infinite sum of positive terms. Hence it is much better to work backwards and test a model where the single event propagator is assumed to be strictly constant over time, as we will explain in the next section.
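For reference, the empirical side of this comparison can be sketched in a few lines, writing $D(\ell)$ for the mean squared price change over $\ell$ events (the symbol and the function name are labels chosen for this sketch). A flat $D(\ell)/\ell$ signals diffusion, a decreasing curve signals mean reversion, and an increasing one signals trending.

```python
import numpy as np

def diffusion_curve(price, max_lag):
    """Return D(lag) / lag for lag = 1..max_lag, with D(lag) = <(p_{t+lag} - p_t)^2>."""
    price = np.asarray(price, dtype=float)
    out = np.empty(max_lag)
    for lag in range(1, max_lag + 1):
        dp = price[lag:] - price[:-lag]       # price change over `lag` events
        out[lag - 1] = np.mean(dp ** 2) / lag
    return out

# Two deterministic extremes: a pure trend and a pure bid-ask-bounce-like zigzag.
trend = diffusion_curve(np.arange(10), 3)
zigzag = diffusion_curve([0, 1, 0, 1, 0, 1], 2)
```

For the trending series $D(\ell)/\ell$ grows linearly with $\ell$, while for the zigzag it drops to zero at even lags; a diffusive series sits between these two extremes with a constant value.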

Figure 9: $D(\ell)/\ell$ and its approximations for the two groups of stocks. For small tick stocks the values were divided by for clarity. Symbols correspond to the empirical result. Dashed lines correspond to the temporary impact model with all events and they are calculated from Eq. (20). The agreement is acceptable for small tick stocks, but very poor for large tick ones. Solid lines correspond to the constant impact model, see Eq. (23) below; in this case the agreement with large tick stocks is nearly perfect, at least visually.

VI A constant impact model

In the above section we found that the single event propagators $G_\pi(\ell)$ appear to have a non-trivial time dependence. Another way to test this result is to invert the logic: assume first that the $G_\pi$ are time independent, and see how well, or how badly, this assumption fares at accounting for the shape of the response functions $R_\pi(\ell)$ and of the price diffusion $D(\ell)$.

Let us start from the following exact formula for the midpoint price:

(21)

Here denotes the price change at time if an event of type happens. This can also depend on the sign . For example, if and this means that at a sell market order executed the total volume at the bid. The midquote price change is , which usually means that the second best level was , where is the bid price before the event. The factor is necessary, because the ask did not change, and the impact is defined by the change of the midquote. Hence ’s (and similarly ’s) correspond to half of the gap between the first and the second best quote just before the level was removed (see also Ref. Farmer et al. (2004)). Another example when and . This means that at a sell limit order was placed inside the spread. The midquote price change is , which means that the limit order was placed at , where is the ask price. Thus ’s correspond to half of the gap between the first and the second best quote right after the limit order was placed. In the following we will call the ’s gaps. Note that the events , and do not change the price, so their respective gaps are always zero: there are only three types of ’s that are non-zero.

The permanent impact model is defined by replacing the time dependent ’s by their average values. More precisely, let us introduce the average realized gap:

(22)

The conditional expectation means that the gaps are sampled only when the price change corresponding to that particular kind of gap is truly realized. Therefore, in general , see Table 3 where one sees that the realized gap when a market order moves the price is in fact larger than the unconditional average. The logic is that the opening of a large gap behind the ask is a motivation for buying rapidly (or cancelling rapidly for sellers) before the price moves up.

Our approximate constant impact model then reads:

(23)

The response functions are then, by using Eq. (12), easily given by:

(24)

The formula (24) is quite simple to interpret. We fixed that the event that happened at was of type . Let us now express as:

(25)

This represents the following: given that the event at $t$ was of type $\pi_1$ and the event at $t+\ell$ is of type $\pi_2$, how much more probable is it that the direction of the second event is the same as that of the first event? The total price response to some event can be understood as its own impact (lag zero), plus the sum of the biases in the course of future events, conditional on this initial event. These biases are multiplied by the average price change that these induced future events cause. Of course, correlation does not mean causality, and we cannot a priori distinguish between events that are induced by the initial event, and those that merely follow the initial event (see Farmer and Zamani (2007) for a related discussion). However, it seems reasonable to assume that there is a true causality chain between different types of events occurring on the same side of the book (e.g. a limit order refilling the best quote after a market order).
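The interpretation above (own impact at lag zero plus accumulated biases of later events, each weighted by the average price change it causes) can be sketched as a cumulative sum. The sketch assumes that, with constant gaps, the response takes the form $R_{\pi_1}(\ell) = \sum_{0\le n<\ell}\sum_{\pi_2} \bar\Delta_{\pi_2}\,P(\pi_2)\,C_{\pi_1,\pi_2}(n)$, where $\bar\Delta_{\pi_2}$ stands for the average realized gap of type $\pi_2$ and $C$ is the normalized signed correlation with $C_{\pi,\pi}(0)=1/P(\pi)$. The symbol $\bar\Delta$, the function name and the array layout are notation of this sketch only.

```python
import numpy as np

def constant_gap_response(C, P, gap, max_lag):
    """Response functions under constant gaps, for lag = 0..max_lag.

    C[a, b, n] : signed correlation C_{a,b}(n), a = earlier event type
    P[b]       : stationary probability of type b
    gap[b]     : average realized gap of type b (zero if b never moves the price)
    """
    C = np.asarray(C, dtype=float)
    if C.shape[2] < max_lag:
        raise ValueError("C must contain lags 0..max_lag-1")
    weight = np.asarray(gap, dtype=float) * np.asarray(P, dtype=float)
    # Contribution of events at distance n, summed over the later event's type.
    per_lag = np.einsum("abn,b->an", C[:, :, :max_lag], weight)
    resp = np.zeros((C.shape[0], max_lag + 1))
    resp[:, 1:] = np.cumsum(per_lag, axis=1)   # accumulate n = 0..lag-1
    return resp

# Two event types: type 0 never moves the price, type 1 moves it by one gap unit.
P = np.array([0.5, 0.5])
gap = np.array([0.0, 1.0])
C = np.empty((2, 2, 2))
C[:, :, 0] = np.diag(1.0 / P)    # lag zero: an event is perfectly correlated with itself
C[:, :, 1] = 0.2                 # illustrative positive correlation at lag 1
resp = constant_gap_response(C, P, gap, 2)
```

In this toy setting the lag-1 response of the price-changing type equals its own gap and that of the other type is zero, while at lag 2 both pick up the same extra contribution from the induced price-changing events.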

ticker

large tick

AMAT 1.02 1.04 1.02 1.00
CMCSA 1.03 1.14 1.06 1.00
CSCO 1.01 1.02 1.01 1.00
DELL 1.01 1.05 1.02 1.00
INTC 1.00 1.01 1.01 1.00
MSFT 1.01 1.02 1.01 1.00
ORCL 1.01 1.02 1.02 1.00

small tick

AAPL 1.31 1.27 1.27 1.14
AMZN 1.51 1.22 1.30 1.17
APOL 1.76 1.50 1.52 1.42
COST 1.35 1.23 1.24 1.15
ESRX 1.85 1.54 1.60 1.45
GILD 1.11 1.13 1.11 1.03
Table 3: Mean realized gaps and unconditional gaps in ticks for all stocks. All values were multiplied by , so that they correspond to the instantaneous change of the bid/ask and not of the midquote. Note that , while is not observable.

Let us now take Eq. (24), and check how well the true response functions are described by the above constant impact model. Figs. 10 and 11 show that the agreement is very satisfactory for large tick stocks, except when , but these events are very rare (less than ). This agreement is expected because the order book is usually so dense that gaps hardly fluctuate at all; the small remaining discrepancies will in fact be cured below. The quality of the agreement suggests that the time dependence of the bare impact function obtained in Sec. V above is partly a numerical artefact coming from the “brute force” inversion of Eq. (18).

For small ticks on the other hand, noticeable deviations are observed as expected, and call for an extension of the model. This will be the focus of the next sections. One can extend the above model in yet another direction, by studying the dynamics of the spread rather than the dynamics of the mid-point, see Appendix A.

Figure 10: Comparison of true and approximated normalized response functions $R_\pi(\ell)$, using the constant gap model, for (left) large tick stocks and (right) small tick stocks, for events that do not change the price. Symbols correspond to the true value, and lines to the approximation. The data are labeled according to $\pi$ in the legend.

Figure 11: Comparison of true and approximated normalized response functions $R_\pi(\ell)$, using the constant gap model, for (left) large tick stocks and (right) small tick stocks, for events that change the price. Symbols correspond to the true value, and lines to the approximation. The data are labeled according to $\pi$ in the legend.

One can approximate the volatility within the same model as

(26)

As shown in Fig. 9, the constant gap model is very precise for large tick stocks (as again expected), but clear discrepancies are visible for small tick ones.

VII The gap dynamics of small tick stocks

VII.1 A linear model for gap fluctuations

Let us now try to better understand how gap fluctuations contribute to the response function, and why replacing the gap by its average realized value is not a good approximation for small tick stocks. By definition, without the constant gap approximation, the response function contains contributions which have the form

After using some basic properties of the event signs this quantity can be written as a sum over three contributions:

  1. Firstly, there is the term from the constant gap approximation:

    This contains the highest order of the effect of event-event correlations.

  2. There is a second term that we write as:

    which is the conditional expectation value of the quantity . If is positive, then after an upward price move consecutive upward moves are larger than downward ones, while if is negative then they are smaller. This process can thus either accelerate or dampen the growth of the response function.

  3. The third contribution is of the form

    Here is positive, when the average of the two gaps (up and down) is greater than the time averaged realized value. is positive, when the two events move the price in the same direction. Thus the full term gives a positive contribution to the response function, if two “parallel" events are correlated with larger gaps and hence decreased liquidity at the time of the second event, while opposing events correspond to increased liquidity at the time of the second event. The final effect of this term agrees with the previous one: If is positive, then after an upward price move the consecutive upward moves become larger than downward ones and vice versa.

At this point we need a dynamical model for the ’s, to quantify the above correlations, but we are faced with the difficulty that is only observed for and . What we will do instead is to write a simple regression model directly for the observable quantity , that can be evaluated from data. Then based on this knowledge we will revisit the influence of gap fluctuations on the price dynamics in Sec. VII.3.

VII.2 A linear model for gap fluctuations

The correlation between events has a dynamical origin: market orders and cancellations attract replacement limit orders and vice versa. Eq. (21) is the exact time evolution of price written as a sum of the random variables . We will postulate that both the realized gap and the order flow are influenced by the past order flow , in a linear fashion, i.e.:

(27)

where all ’s are independent noise variables. Similarly, we write for the three price changing events and :

(28)

with other noise variables , and we introduced for later convenience. Note the above equations are again of the vector autoregression type, where the kernel and have a matrix structure.

Both models (27) and (28) can be calibrated to the data by using the same trick as in Sec. V, forming expectation values on both sides and solving a set of linear equations between correlation functions, for example for :

(29)

except this time we have three separate solutions for , and . An example of the solution kernels is given in Fig. 12; the sign of these kernels is expected from what we learnt in Sec. IV. We see for example that a event tends to make a future more probable, and with an increased gap, which makes sense. The same can be repeated with respect to Eq. (28) to calculate the ’s.

Figure 12: Estimates of for small ticks.

An important aspect of these VAR models is that once we have an estimate for their kernels, they can be used for forecasting the future price changes caused by each component of the event flow based on the events that occurred in the recent past Hasbrouck (2007). Eq. (27) prescribes for us an estimate for conditional expectation values such as , which is the expected price change due to a market order in the next event (times the probability of such an outcome), and the conditioning is on past signs and indicators. We can proceed similarly for and , and finally the sum of the three components gives the expected price change in the next event.
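A hedged sketch of how a linear regression on past order flow can be calibrated and used for one-step forecasts. It assumes that the target at time $t$ (for instance a signed indicator, or a signed realized price change) is modeled as a linear combination of the past signed indicators $I(\pi_{t-n}=\pi)\,\epsilon_{t-n}$ over a finite number of lags, and it fits the kernel by ordinary least squares, which solves the same normal equations between correlation functions but is a substitute for the exact procedure of the text. All names, the integer coding of event types and the finite lag window are assumptions of this sketch.

```python
import numpy as np

def lagged_design(event_type, sign, n_types, n_lags):
    """Row for time t holds I(pi_{t-n} = pi) * eps_{t-n}, n = 1..n_lags, all pi."""
    event_type = np.asarray(event_type)
    sign = np.asarray(sign, dtype=float)
    T = len(sign)
    signed = np.zeros((T, n_types))
    signed[np.arange(T), event_type] = sign          # signed[t, pi] = I * eps
    X = np.empty((T - n_lags, n_lags * n_types))
    for n in range(1, n_lags + 1):
        # Rows correspond to t = n_lags..T-1, columns of this block to lag n.
        X[:, (n - 1) * n_types : n * n_types] = signed[n_lags - n : T - n]
    return X

def fit_kernel(event_type, sign, target, n_types, n_lags):
    """Least-squares kernel k[n-1, pi] such that target_t ~ sum_n sum_pi k * signed."""
    X = lagged_design(event_type, sign, n_types, n_lags)
    y = np.asarray(target, dtype=float)[n_lags:]
    kernel, *_ = np.linalg.lstsq(X, y, rcond=None)
    return kernel.reshape(n_lags, n_types)

# Noise-free synthetic check: the target is built from a known kernel.
rng = np.random.default_rng(1)
T, n_types, n_lags = 500, 3, 2
types = rng.integers(0, n_types, size=T)
signs = rng.choice([-1.0, 1.0], size=T)
k_true = rng.normal(size=(n_lags, n_types))
X = lagged_design(types, signs, n_types, n_lags)
target = np.zeros(T)
target[n_lags:] = X @ k_true.reshape(-1)
k_est = fit_kernel(types, signs, target, n_types, n_lags)
forecast = X @ k_est.reshape(-1)     # one-step-ahead predictions for t >= n_lags
```

With real order flow the target contains noise, so the forecast explains only part of the variance; comparing binned forecasts with realized values is one way to check whether the relationship is close to linear.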

Such forecasts based on Eqs. (27), (28) perform surprisingly well in practice, although liquidity is fragmented and some events are unobserved. Fig. 13 shows that the expectation value of the left hand side of Eq. (27) is a monotonic function of our prediction, and the relationship on average can be fitted with a straight line with slope , although small higher order (cubic) corrections seem to be present as well. Similar results can be found for Eq. (28) and ’s.

As discussed in Sec. II, one should interpret the kernels as “dressed” objects that include the contribution of events that occur on unobserved platforms. This is justified as long as one is concerned with linear observables, such as average response functions. For non-linear quantities, such as diffusion, some discrepancies are expected.

Figure 13: Performance of Eq. (27) for small ticks. Both axes normalized by standard deviation of predictor.

VII.3 The final model for small ticks

The above analysis suggests a way to build and calibrate an impact model that describes in a consistent way (a) all types of events and (b) the history dependence of the gaps, as we argued to be necessary in Sec. II. The discussion of the previous section motivates the following model:

(30)

where is a kernel that models the fluctuations of the gaps and their history dependence, which will be chosen such that the bare propagator of the model is given by Eq. (43) above.

The model specification, Eq. (30), is the central result of this paper. It can be seen as a permanent impact model, but with some history dependence, modeled as a linear regression on past events. By symmetry, this dependence should only include terms containing since the influence of any past string of events on the ask must be the same as that of the mirror image of the string on the bid. More generally, one may expect higher order, non-linear correction terms of the form

(31)

or with a larger (even) number of ’s. We will not explore such corrections further here, although Fig. 13 suggests these terms are present.

Upon direct identification of Eq. (30) with Eq. (21), and using (27) and (28) one finds that can be expressed in terms of and as:

(32)

We can now compute the average response functions and the diffusion curve within this model, and compare the results with empirical data.

Figure 14: Comparison of true and approximated normalized response functions $R_\pi(\ell)$ of the final model for (left) large tick stocks and (right) small tick stocks, for events that do not change the price. Symbols correspond to the true value, and lines to the approximation. To illustrate the goodness of fit on a stock by stock basis, we calculated the absolute difference between the true and the approximated value; the average of this quantity across stocks is indicated by the error bars. The inaccuracy for large $\ell$ is due to a finite size effect in matrix inversion. The data are labeled according to $\pi$ in the legend.

Figure 15: Comparison of true and approximated normalized response functions $R_\pi(\ell)$ of the final model for (left) large tick stocks and (right) small tick stocks, for events that change the price. The inaccuracy for large $\ell$ is due to a finite size effect in matrix inversion. Symbols correspond to the true value, and lines to the approximation. To illustrate the goodness of fit on a stock by stock basis, we calculated the absolute difference between the true and the approximated value; the average of this quantity across stocks is indicated by the error bars. The data are labeled according to $\pi$ in the legend.

For the response functions, the addition of the fluctuating gap term in Eq. (30) corrects the small discrepancies found within the constant impact model for large tick stocks. It also allows one to capture very satisfactorily the response function for small tick stocks, see Figs. 14 and 15. [Footnote 6: Note that in making these plots we neglected the first and last minutes of trading days, so they slightly differ from those in Sec. VI. The results of the constant gap model are essentially unchanged regardless of such an exclusion.]

A much more stringent test of the model is to check the behaviour of the diffusion curve $D(\ell)$. The exact calculation in fact involves three- and four-point correlation functions, for which we have no model. A closure scheme where these higher correlation functions are assumed to factorize yields the following approximation:

(33)

where