Regularities and Irregularities in Order Flow Data



We identify and analyze statistical regularities and irregularities in the recent order flow of different NASDAQ stocks, focusing on the positions where orders are placed in the orderbook. This includes limit orders being placed outside of the spread, inside the spread and (effective) market orders. We find that limit order placement inside the spread is strongly determined by the dynamics of the spread size. Most orders, however, arrive outside of the spread. While for some stocks order placement on or next to the quotes is dominating, deeper price levels are more important for other stocks. As market orders are usually adjusted to the quote volume, the impact of market orders depends on the orderbook structure, which we find to be quite diverse among the analyzed stocks as a result of the way limit order placement takes place.

I Introduction

The progress in information technology always has a huge impact on stock markets and related systems. It affects not only the trading process itself, but also the availability of data, the tools used for their analysis and the model building that follows. The accessibility of detailed order flow data capturing order book dynamics is useful for researchers and practitioners. Apart from an understanding of financial markets, long term research questions concern systemic issues, such as the usefulness of (de)regulations of financial markets Farmer and Foley (2009), and the consequences of high frequency trading for market stability Lee et al. (2013); Kirilenko et al. (2015); Brogaard (2010). Practitioners investigate the profitability of trading strategies Brogaard (2010). From both viewpoints, systemic and practical, an improvement of agent based models Patzelt and Pawelzik (2013); Meudt et al. (2016); Krause and Bornholdt (2013) exploiting order flow analysis is desirable.

While stylized facts of price time series have been well documented for decades Cont (2001), comparable knowledge on the level of order book dynamics has not yet been established; for a comprehensive review see Gould et al. (2013). As the variability of results is at least partly due to the non-stationarity of markets, empirical findings have to be steadily reviewed with respect to their validity for contemporary data Chakrabarti and Lahkar (2016). For example, in NASDAQ stocks in 2002, large portions of all limit orders were placed far from the quotes Potters and Bouchaud (2003). In contrast, later studies find a dominating role of the quotes and suggest queue models for characterizing the execution sequence of limit orders Gareche et al. (2013). If there are substantial differences in the order flow of different stocks at the same time in the same market, this has implications for the description of the market on the systemic level. When analyzing stock interactions Wang et al. (2016); Hasbrouck and Seppi (2001); Boulatov et al. (2013); Pasquariello and Vega (2013); Chordia et al. (2000), one has to be aware of heterogeneous order flow mechanisms for different stocks. This aspect has not been sufficiently investigated so far, perhaps partly because the movement of the market has been found to be dominated by collective effects Stepanov et al. (2015).

Here we analyze order flow data of NASDAQ stocks in early 2016. We focus on where orders are placed in the order book, and compare statistical regularities across stocks. We find a rich variety of spread widths, order placement measures and price returns caused by single market orders. Further we search for a clustering of stocks, grouping together stocks with similar behavior.

This study is organized as follows: In Sec. II we describe the data. In Sec. III we reconstruct market orders from the data and investigate the prices at which limit orders enter the order book. These prices help us to cluster the stocks into groups with similar behavior. We investigate the significance of this grouping with respect to liquidity measures such as the spread size, the number of orders on the quotes, and the share of market orders among all orders. We find striking differences between groups of stocks in the returns caused by single market orders. Finally, in Sec. IV we summarize our results and give an outlook.

II Data

We analyze the Historical TotalView-ITCH data from NASDAQ US. We group the data into the order flows of each stock on one specific day. For example, 20160307_AAPL comprises the orders for Apple shares (the ticker is AAPL) on NASDAQ US on March 7, 2016 (a Monday). Our data comprise five days from March 7, 2016 to March 11, 2016. Of the 100 stocks listed in the NASDAQ 100 in this period, four are not available in our data and therefore cannot be included in our analysis. The data contain information about limit orders being placed, deleted, partially canceled, partially traded and fully traded. Moreover, they contain information about trades against hidden orders. A detailed description of the data can be found in Huang and Polak (2011).

We analyze data from times between 10:00 am and 3:30 pm (New York time). These are the regular trading times of the NASDAQ excluding the first and last 30 minutes. We neglect these, because order flow dynamics at the opening and closing of the market have different statistical properties. All events have a time stamp in milliseconds. Events happening in the same millisecond have the same time stamp, although they may not have happened simultaneously. Chronological order is maintained, as incoming orders are processed by the market in the order in which they arrive.

Our data reveal several characteristics of limit orders, order book dynamics and trades. However, the data only show the net effect of a trader’s action in terms of simple limit orders and trades. Exotic orders appear in a rich variety, making it impossible to fully reconstruct them from the order flow (Services/Trading/OrderTypesG.pdf). The only exception is market orders. These will be reconstructed from successive trade events due to their high relevance for our research goal. This can of course only be done approximately. In our reconstruction, we cannot distinguish real market orders from effective market orders, i.e. limit orders crossing the spread. The data were tested to be free of internal contradictions, as also reported in other works.

III Results

In Sec. III.1 we perform the reconstruction of market orders, as one market order can trigger many consecutive trades, and this information is not provided in the data. Further, we neglect market orders which are solely traded against hidden limit orders. The share of market orders among all orders is calculated separately for different stocks, as are similar measures characterizing limit orders. In Sec. III.2 we discuss statistical regularities of in-spread limit orders and use them to cluster the stocks into groups with similar behavior. We test to what extent the stocks within the same group also show similar properties according to order counts per day. In Sec. III.3 we see that the grouping of stocks according to in-spread limit orders is also relevant for off-spread limit orders. Finally, in Sec. III.4, we analyze the impact of single market orders, again considering the grouping found for in-spread limit order placements.

III.1 Market Order Reconstruction and General Statistics

Our data do not explicitly include market orders, thus we have to reconstruct them from the trade events. Our reconstruction modifies the approach of Hautsch and Huang (2011). We attribute subsequent trades to a single market order if all limit orders possibly traded against this market order are of the same type (for example sell). For trades against hidden limit orders, the type is unknown, so we include such trades in the market order reconstruction without discriminating between sell and buy type. In addition, we require that no events of a different type, for example a limit order arrival or cancellation, come in between the trades, and that a partially traded limit order can only be the last event. Finally, we require that all trade events take place within a time window of fixed length. According to the market specification, the order “execution time is less than one millisecond.” This statement suggests that trade events should only be consolidated as long as the time stamp difference between the first and the last trade of a market order is at most one millisecond. We call the number of trades triggered by one market order “cluster size”. This terminology (“cluster” denoting a group of trades) has nothing to do with the clustering of stocks discussed in the following sections.
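The consolidation rules above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `Event` record, its field names and the event kinds (`trade_full`, `trade_partial`, `trade_hidden`) are hypothetical and do not correspond to the actual ITCH message format, and the function is not the implementation used for the results reported here.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical event kinds; anything else (e.g. "add", "delete") is a non-trade event.
TRADE_KINDS = {"trade_full", "trade_partial", "trade_hidden"}


@dataclass
class Event:
    time_ms: int                 # time stamp in milliseconds
    kind: str                    # one of TRADE_KINDS or any other event type
    side: Optional[str] = None   # side of the resting limit order; None if hidden


def _fits(cluster: List[Event], ev: Event, max_window_ms: int) -> bool:
    """Check whether trade `ev` may be appended to the current cluster."""
    if cluster[-1].kind == "trade_partial":
        return False  # a partially traded limit order must be the last event
    if ev.time_ms - cluster[0].time_ms > max_window_ms:
        return False  # first and last trade too far apart in time
    sides = {e.side for e in cluster if e.side is not None}
    if ev.side is not None and sides and ev.side not in sides:
        return False  # visible limit orders must all be of the same type
    return True


def reconstruct_market_orders(events: List[Event], max_window_ms: int = 1):
    """Group consecutive trade events into (effective) market orders."""
    orders, cluster = [], []
    for ev in events:
        if ev.kind not in TRADE_KINDS:
            # Any non-trade event closes the current cluster.
            if cluster:
                orders.append(cluster)
                cluster = []
            continue
        if cluster and not _fits(cluster, ev, max_window_ms):
            orders.append(cluster)
            cluster = []
        cluster.append(ev)
    if cluster:
        orders.append(cluster)
    return orders
```

The cluster size of each reconstructed market order is then simply `len(order)`. Raising `max_window_ms` reproduces the kind of robustness check with respect to the time window that is described below.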

Figure 1: Distribution of the cluster size (number of trades consolidated to one market order) for 20160307_AAPL

Figure 1 shows the distribution of these cluster sizes for 20160307_AAPL. We find that of all market orders result in only one single trade. However, there are also groups of up to more than 80 trades against one market order. We checked the robustness of our reconstruction with respect to the maximum time difference allowed between the first and the last trade of a market order. There are only minor changes in the number of market orders, even if we increase the maximum time window from one millisecond to half a second. The respective changes in the distribution of cluster sizes are barely visible compared to the distribution in Fig. 1. For that reason the distributions for other maximum time windows are not shown here.

Moreover, for every group of trade events possibly considered due to one market order, between one and market orders might have triggered this series of events. This problem can not be unambiguously solved with help of our data. However, as we will see in the sequel, given that the number of potential market orders is much smaller than the number of limit order insertions and deletions, we may assume that two market orders arriving subsequently without being interrupted by a limit order placement or deletion is a very rare incident. Thus, if allowed by the other constraints, we cluster all multiple trade events to one market order.

There are market orders that can not be assigned to either buy or sell type. This happens for trades executed exclusively against hidden orders. In the upcoming study, we have to dismiss these market orders. We calculate the frequency of undirected market orders among all market orders for the order flow of one stock on one day, and repeat this for all 96 stocks on all five days, resulting in a number of 480 relative frequencies. We find that the proportion of undirected market orders among all market orders is on average around .

Figure 2: Box-Whisker-Plots for the relative frequencies of undirected market orders among all market orders (MOU), market orders among all orders (MO), active limit orders among all limit orders (ACT) and in-spread limit orders among all limit orders (SPRD).

The scattering of this quantity over the different data sets is presented with a Box-Whisker plot in Fig. 2 on the left (MOU). The white horizontal line in the gray box indicates the above mentioned average of . The gray box indicates both centered quartiles, so we see that half of the data is concentrated within the range to . The antennas reach up to the largest (smallest) data point within 1.5 times the interquartile range added (subtracted) to the upper (lower) quartile. The additional dots outside of the antennas correspond to the outliers: There are a few data sets in which a certain stock had a particularly high frequency of undirected market orders on a certain day. Altogether we find that for most of the data sets, meaningful information can be extracted from the market orders. The rejected undirected market orders only constitute a small fraction of all market orders.
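For readers who want to reproduce the box and whisker construction described above, a minimal sketch follows. The quartile convention (`method="inclusive"` of Python's `statistics.quantiles`) is an assumption; the text does not state which convention was used for Fig. 2, and other conventions give slightly different quartiles.

```python
from statistics import mean, quantiles


def box_whisker_stats(values, whisker_factor=1.5):
    """Summary statistics of a box-whisker plot as described in the text."""
    data = sorted(values)
    # Assumed quartile convention: linear interpolation including both end points.
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - whisker_factor * iqr
    upper_fence = q3 + whisker_factor * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "mean": mean(data),            # white line in the box
        "q1": q1,                      # lower edge of the box
        "q3": q3,                      # upper edge of the box
        "whisker_low": inside[0],      # smallest point within the lower fence
        "whisker_high": inside[-1],    # largest point within the upper fence
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }
```

Applied to the 480 relative frequencies of one observable, the returned dictionary contains everything that is drawn for that observable in a plot of this type.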

In the second Box-Whisker plot labeled with MO, we see that the relative frequency of market orders among all orders is rather small with an average of about . Here, the regularity along our data sets is much stronger than it was for the relative amount of undirected market orders (MOU). We checked that the volume stored in the order book does neither systematically increase nor decrease throughout the day. Thus, either order volumes or limit order deletions have to compensate that the number of orders providing liquidity, i.e. limit orders, is much higher than the number of orders taking liquidity, i.e. market orders. We find that most limit orders are deleted. On average, only of all limit orders per data set are (at least partially) traded, cf. the Box-Whisker plot labeled as ACT. This means that the majority of limit orders is deleted without participating in the trading process. Trading is obviously not the predominant factor that clears orders from the order book. This is also known from earlier studies Gould et al. (2013). The relative frequency of orders that end up being deleted among all incoming orders is one of the quantities that clearly reflect non-stationarity of markets, see for example Bouchaud et al. (2002), where only of all limit orders are reported to be deleted (without at least partial execution) on the Paris stock exchange in 2002.

Finally, we are interested in the relative frequency of in-spread limit orders among all limit orders. The mean relative frequency of in-spread limit orders is . The Box-Whiskers plot labeled SPRD shows that this relative frequency varies stronger within the data sets as compared to those frequencies discussed before. The most obvious reason to expect differences for this quantity among different data sets is the fact that the spread does not always allow in-spread limit orders. We will return to that point in Sec. III.2.

III.2 In-spread Limit Orders

Figure 3: Distributions of relative prices of in-spread limit orders (a to c) and of spread sizes prior to in-spread limit orders (d to f) for three data sets in different clusters, all on the day 20160307: AAPL (panels a and d, cluster C1), CERN (panels b and e, cluster C2) and GOOG (panels c and f, cluster C4).

We analyze the aggressiveness of in-spread limit orders, i.e. how close they are placed to the opposing quote. Hence, we evaluate the distribution of the relative price

$$p_{\mathrm{rel}} = \frac{|p - p_{q}|}{s},$$

where $p$ denotes the limit price of the order, $p_{q}$ denotes the price of the quote on the same side and $s$ denotes the size of the spread, both at the time where the limit order is placed. For sell and buy limit orders, small $p_{\mathrm{rel}}$ implies a small aggressiveness of the order, moving the quote only slightly. We calculate for each of the 96 stocks the in-spread relative price distributions over all five days. Examples of such distributions are depicted in Fig. 3(abc) for three stocks. The distribution for Apple (AAPL) has a dominant sharp peak at $p_{\mathrm{rel}} = 1/2$. As we will see below, this is because the spread mostly only opens for one tick, and this one tick is then the only possible choice for placing an in-spread limit order. The relative price distributions are very diverse for different stocks.
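As a small worked example, the following sketch computes relative prices, assuming that the relative price is the distance of the limit price from the same-side quote divided by the spread. The function names and the convention of measuring prices in ticks are illustrative choices, not taken from the data format.

```python
def relative_price(order_price, same_side_quote, spread):
    """Distance from the same-side quote in units of the spread (assumed definition)."""
    if spread <= 0:
        raise ValueError("spread must be positive")
    return abs(order_price - same_side_quote) / spread


def possible_relative_prices(spread_in_ticks):
    """All relative prices an in-spread limit order can take for a given spread."""
    return [k / spread_in_ticks for k in range(1, spread_in_ticks)]


# Prices in ticks (cents): bid 10000, ask 10002, buy order placed inside the spread.
print(relative_price(10001, 10000, 2))   # 0.5
print(possible_relative_prices(2))       # [0.5]
print(possible_relative_prices(4))       # [0.25, 0.5, 0.75]
```

The second function makes the discretization explicit: for a spread of two ticks there is exactly one admissible in-spread price, so the relative price distribution collapses to a single peak.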

To find groups of stocks with similar behavior, we compare relative price distributions for pairs of stocks. A robust way to deal with mixed distributions containing sharp peaks is to use the cumulative distribution. Calculating the maximum over all possible values of the difference between two cumulative distributions, we obtain the Kolmogorov-Smirnov (KS) distance measure. As a result of the stationarity of these distributions over the five days of our observations, we can use the average distribution for each stock without loss of information.
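The distance measure can be illustrated with a short, dependency-free sketch. It computes the two-sample Kolmogorov-Smirnov distance from raw samples of relative prices; the function name and the toy samples are illustrative only.

```python
from bisect import bisect_right


def ks_distance(sample_a, sample_b):
    """Maximum absolute difference between two empirical cumulative distributions."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    # The empirical CDFs only change at observed values, so it suffices
    # to compare them at every value occurring in either sample.
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


# A sharply peaked sample versus a sample spread evenly over (0, 1].
peaked = [0.25] + [0.5] * 9
spread_out = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
print(ks_distance(peaked, spread_out))  # 0.5
```

Because the comparison is made on cumulative distributions, a sharp peak such as the one observed for AAPL poses no difficulty, whereas a bin-by-bin comparison of histograms would depend on the binning.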

Figure 4: Kolmogorov-Smirnov distance matrix of in-spread order relative price distributions of our 96 stocks. The stocks were rearranged so that the clustering structure is visible, with clusters denoted C1, C2, C3 and C4.

The KS distance is shown in Fig. 4 for all pairs of stocks. The ordering of stocks is obtained from a clustering of stocks using the relational k-means algorithm Szalkai (2013), with the KS distances as input. The relational k-means algorithm partitions the stocks into clusters such that distances between stocks in the same cluster are small Szalkai (2013), and therefore stocks in the same cluster are similar. As a consequence, a pair of stocks with a large KS distance is most likely separated into different clusters. With the help of the mean silhouette information criterion Rousseeuw (1987), we find that four clusters are most appropriate to work with. The sizes of the clusters are as follows: of the stocks belong to cluster C1, , and of the data sets belong to the clusters C2, C3 and C4, respectively.
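The silhouette criterion can be evaluated directly from a distance matrix and a candidate partition. The sketch below shows only this evaluation step in plain Python; it is not the relational k-means algorithm itself, and the toy distance matrix is invented for illustration.

```python
def mean_silhouette(dist, labels):
    """Mean silhouette value of a partition, given a symmetric distance matrix."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    total = 0.0
    for i, lab_i in enumerate(labels):
        own = [j for j in clusters[lab_i] if j != i]
        if not own:
            continue  # convention: a singleton contributes a silhouette of 0
        # a: mean distance to the other members of the own cluster
        a = sum(dist[i][j] for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != lab_i
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(labels)


# Toy example: items 0 and 1 are close, items 2 and 3 are close.
toy = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.1],
    [0.9, 0.9, 0.1, 0.0],
]
good = mean_silhouette(toy, [0, 0, 1, 1])   # close to 1: well separated
bad = mean_silhouette(toy, [0, 1, 0, 1])    # negative: poor partition
```

To select the number of clusters, one would run the clustering for several candidate numbers and keep the one with the largest mean silhouette.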

Stocks in the same cluster have similar distributions of relative prices. In Fig. 3 we show distributions for representative stocks out of three of the clusters. Because we condition on the times immediately before a limit order is placed in-spread, the minimum spread size observed is two ticks, although the spread size may equal only one tick for most of the day. As discussed above, the distribution shown in Fig. 3(a) is essentially a sharp peak at a relative price of 1/2. In Fig. 3(d) we see that the spread is mostly only two ticks wide before a new limit order arrives inside the spread. We adjusted the bin size to the tick size of one cent. By multiplying the density with the bin width we find that the probability for having a spread of two ticks is 97%. In this case, the relative price can only be 1/2. Stocks in our first cluster typically have a small spread. Such stocks have also been described as large tick stocks Eisler et al. (2012); Gareche et al. (2013); Dayri and Rosenbaum (2015), because the ticks are large compared to the spread size which liquidity providers are willing to offer. Instead, there is a large queue of limit orders waiting on the quotes to be executed, and the order of execution (according to arrival times) is important for investment strategies Gareche et al. (2013). To what extent this is reflected in the order arrival outside of the spread in the different clusters found here will be analyzed in Sec. III.3.

Cluster C4 also has a clear interpretation, see the distribution of relative prices in Fig. 3(c). It has a maximum at the relative price which corresponds to the second least aggressive price. The density is decreasing towards higher relative prices. The bin size is chosen bigger than the discretization. The corresponding distribution of spread sizes Fig. 3(f) shows that the spread is usually so large that relative prices are almost not constrained by the spread size. Order placements with small aggressiveness are favored. A reason could be that traders want to have execution priority for their limit orders, but they want to stay close to the current quotes, because undercutting favors the liquidity taker. The behavior of stocks in the fourth cluster is closer to the market situation of older market studies Doyne Farmer et al. (2004); Mike and Farmer (2008). Interestingly, both types of market dynamics exist at the same time in the same market in 2016. The small tick stocks only comprise about of all stocks for the data we analyze.

Clusters C2 and C3 can be understood as originating from intermediate and varying spread sizes. The spreads of stocks in the intermediate clusters are located between the small spreads of stocks in cluster C1 with strong discretization effects, and the quasi continuous spreads of stocks in cluster C4. We consider cluster C2 as an example. The data set for CERN considered here is part of cluster C2, see Fig. 3(b). If the spread is only two ticks, the only possible relative price is 1/2. However, sometimes the spread is larger, allowing for other relative prices. The fact that we mostly observe relative prices smaller or equal to shows that generally less aggressive limit order placement is favored over a more aggressive one.

In the literature, the distribution of relative prices in the spread is fitted by some analytical probability density Mike and Farmer (2008). Although distribution fitting might work well for the cluster C4, the distributions for the other three clusters might not be well described that way. Eventually, not only trader behavior but also conditions set by the spread size determine the distributions of relative prices, which makes a unifying characterization difficult.

Figure 5: Scatter plot of daily order counts for each data set vs. the relative frequency of market order among all orders. The colors and symbols indicate the affiliation of a stock to one of the four clusters C1 (blue circles), C2 (yellow squares), C3 (green diamonds) or C4 (red triangles).

Before we turn to the off-spread limit orders, we investigate to which extent the different characteristics relate to the overall market activity for a stock. Figure 5 shows the relative frequency of market orders among all orders plotted against the daily order counts for each data set in a scatter plot. The colors and symbols correspond to the different clusters. The following regularity emerges: In cluster C1 (blue circles), where the typical spread is small, total order counts tend to be high, while the relative frequency of market orders among all orders is rather small. This hints at a strong presence of algorithmic trading in these data sets, since the algorithms typically imply a high throughput of limit orders Harris (2003); Hendershott et al. (2011). In contrast, stocks in cluster C4 (red triangles) with large spreads come hand in hand with small order counts and comparably high relative frequencies of market orders. This might be due to a weaker presence of algorithmic traders.

III.3 Off-spread Limit Orders

Figure 6: Scatter plot of the relative frequencies of buy and sell limit orders placed on the quotes. The colors represent the cluster to which the relative price distribution of the respective data set was assigned. Symbols and color coding as in Fig. 5.
Figure 7: Distributions of absolute deviations of off-spread limit orders for (a) AAPL, (b) CERN, (c) AMGN and (d) GOOG. The data sets shown in (a) to (d) belong to the clusters C1 to C4.

We give an overview of the different characteristics of placements of off-spread limit orders by analyzing the relative frequency of limit orders being placed on the quotes among all off-spread limit orders for each stock. It has been consistently reported for at least 20 years that this is where most of the incoming limit orders arrive Biais et al. (1995); Gould et al. (2013). We compute the relative frequency for buy and sell orders on each of the 480 data sets separately, see Fig. 6. All data points scatter along the line of equality, indicating a buy-sell symmetry for this quantity. Although the clustering is done with respect to the in-spread limit orders, it is reflected in these relative frequencies for off-spread limit orders as well. The highest relative frequencies are seen for cluster C1 (blue circles). Thus, especially when the spread is narrow, there is a high probability for limit orders to be placed on the quotes. For these stocks, the order flow dynamics is essentially confined to the price levels around the quotes. The stocks of cluster C4 (red triangles) are found at small relative frequencies. Apparently, when the spread is broad, limit orders are not so predominantly placed on the quotes. In Sec. III.4, we show how this relates to the market order impacts. Once more, the two intermediate clusters C2 (yellow squares) and C3 (green diamonds) contain those stocks for which the relative frequencies are on an intermediate level.

Since we are also interested in the distributions of where limit orders are placed with respect to the quotes, we have to choose the right observable. We employ absolute deviations from the quote price, but relative deviations, obtained by additionally dividing by the quote price, may be used as well. In the literature, sometimes absolute deviations Bouchaud et al. (2002) and sometimes relative deviations Mike and Farmer (2008) are considered. Both observables have their advantages and disadvantages. Especially close to the quotes, a comparison between different data sets is best made in terms of absolute deviations. This is because the tick size is constant for all data sets under investigation. On the other hand, the (pseudo)discretization for relative deviations is different for each stock as a result of division by the quote prices, which vary by two orders of magnitude across our data sets and are not constant within one data set, either. As our previous results suggest that limit orders are placed dominantly on and presumably close to the quotes, we prefer to use absolute deviations.
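The difference between the two observables can be made concrete with a short sketch. Prices are expressed in ticks of one cent; the function names and the two example quote prices are illustrative assumptions chosen to span two orders of magnitude.

```python
def absolute_deviation(order_price_ticks, quote_price_ticks):
    """Distance of a limit order from the same-side quote, in ticks."""
    return abs(order_price_ticks - quote_price_ticks)


def relative_deviation(order_price_ticks, quote_price_ticks):
    """Absolute deviation divided by the quote price."""
    return absolute_deviation(order_price_ticks, quote_price_ticks) / quote_price_ticks


# An order placed one tick behind the quote, for a $10 stock and a $1000 stock.
cheap_quote, expensive_quote = 1_000, 100_000   # quote prices in cents
print(absolute_deviation(cheap_quote - 1, cheap_quote))          # 1
print(absolute_deviation(expensive_quote - 1, expensive_quote))  # 1
print(relative_deviation(cheap_quote - 1, cheap_quote))          # 0.001
print(relative_deviation(expensive_quote - 1, expensive_quote))  # 1e-05
```

In absolute terms both orders sit on the same grid point, so histograms for different stocks share their bins. In relative terms the grid spacing differs by the ratio of the quote prices and additionally drifts whenever the quote moves.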

Figure 7 displays the distributions of absolute price deviations for four data sets. Three of those are the same ones for which we showed the relative prices of in-spread limit orders. We primarily discuss differences between AAPL and GOOG. The role of the first bin was already discussed. It is equal to (up to a normalization factor of 100) the relative frequency for limit orders being placed on the quotes, roughly for AAPL and decreases for higher cluster numbers. For GOOG of cluster C4 we have a relative frequency of around . The level right next to the quotes contains roughly of all limit orders for AAPL and around of all limit orders for GOOG. For AAPL, of all orders are found within the first three levels, while less than of all orders are found on these three levels for GOOG. For the latter, the distribution of absolute price deviations decreases very slowly. Hence, for the stocks of cluster C1, limit order arrival close to the quotes is dominant, while the depth of the order book appears to be more important for the stocks of cluster C4. Cluster C2 and C3 again feature a transition between the two extreme scenarios.

In previous studies, power-law behavior was reported quite consistently across different markets and times studied Potters and Bouchaud (2003); Gould et al. (2013). This could still be the case for stocks with large spread, especially in cluster C4. Fitting a power law to our data does not provide convincing results, maybe due to the short time window of five days. For stocks with small spreads, especially in cluster C1, the quotes and their first few neighboring levels are so dominant, that discretization effects might be too dominant for a universal power law behavior. Including data of a longer time window is beyond the scope of this study. Therefore, the characterization of off-spread limit order prices with appropriate fitting functions is left for future work.

III.4 Market Orders

Figure 8: Distributions of relative market order volumes for 20160307_AAPL

We proceed as in Doyne Farmer et al. (2004), where gaps in the order book were found to generate high midpoint returns. The mechanism is as follows: The shift of the midpoint after the arrival of a market order depends on the orders removed from the order book by that market order. A price shift occurs if the market order volume is at least as big as the volume on the corresponding quote. As market order volumes are usually chosen in a way such that only the volume on the quotes is executed, the price shift, and therefore the return, depends on the distance between the quote and the next occupied level behind the quote. We call the distance between occupied price levels in the order book gaps. The larger the gap behind the quotes, the larger the impact of a market order. The correspondence between the gap structure and the returns was not only found empirically, but was also reproduced by an agent based model Schmitt et al. (2012). We will test how these mechanisms apply to our data.
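The gap mechanism described above can be illustrated with a toy order book. The sketch assumes a hypothetical representation of one side of the book as a list of `(price, volume)` pairs, best quote first, with prices in ticks; it ignores hidden orders and any order arriving during execution.

```python
def quote_shift(levels, order_volume):
    """Shift of the best quote (in ticks) caused by a market order.

    `levels` lists (price_in_ticks, volume) for the side of the book that the
    market order trades against, starting with the best quote.
    Returns None if the order consumes the whole listed side.
    """
    if not levels:
        raise ValueError("order book side is empty")
    best_price = levels[0][0]
    remaining = order_volume
    for price, volume in levels:
        if remaining < volume:
            # This level survives (at least partly) and becomes the new quote.
            return abs(price - best_price)
        remaining -= volume
    return None


# Ask side with 300 shares on the quote and a gap of three ticks behind it.
asks = [(10001, 300), (10004, 200)]
print(quote_shift(asks, 100))   # 0: volume below the quote volume, no shift
print(quote_shift(asks, 300))   # 3: quote removed, shift equals the gap
print(quote_shift(asks, 400))   # 3: next level only partly consumed
```

If the opposite quote stays in place, the midpoint moves by half of the returned quote shift.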

We define the impact of market orders as the midpoint return

$$r = \frac{m_{\mathrm{after}} - m_{\mathrm{before}}}{m_{\mathrm{before}}}, \tag{1}$$

where $m_{\mathrm{before}}$ is the midpoint price immediately prior to the market order, and $m_{\mathrm{after}}$ is the midpoint price immediately after the market order. The authors of Doyne Farmer et al. (2004) studied data from the London Stock Exchange between 1999 and 2002 and found the volumes of market orders and volumes on the corresponding quotes to be correlated with a correlation coefficient of . However, this is only true if only those market orders are taken into account that change the midpoint. Otherwise, the correlation coefficient drops below . This is due to a majority of market orders having a volume below the quote volume.
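A minimal sketch of the impact computation follows, assuming that the impact is the change of the midpoint divided by the midpoint before the market order. Quotes are given in ticks of one cent; the function names are illustrative.

```python
def midpoint(bid, ask):
    """Midpoint between the best bid and the best ask."""
    return (bid + ask) / 2


def market_order_impact(bid_before, ask_before, bid_after, ask_after):
    """Relative midpoint change caused by one market order (assumed definition)."""
    m_before = midpoint(bid_before, ask_before)
    m_after = midpoint(bid_after, ask_after)
    return (m_after - m_before) / m_before


# A buy market order removes the ask at 10001; the next occupied level is 10003.
impact = market_order_impact(10000, 10001, 10000, 10003)

# The same two-tick quote shift at a ten times higher price level.
impact_high_price = market_order_impact(100000, 100001, 100000, 100003)
```

The two calls describe the same shift in ticks but yield returns that differ by roughly a factor of ten, which is why returns computed this way are only pseudo-discrete.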

Figure 9: Distributions of returns generated by individual market orders for four stocks: (a) AAPL, (b) CERN, (c) AMGN and (d) GOOG. The data sets shown in (a) to (d) belong to the clusters C1 to C4.

We find an average correlation per data set of when taking into account all market orders, regardless of having a non-zero impact or not. We consider that a consequence resulting from a higher amount of market orders being adjusted to have a volume equal to the quote volume, in contrast to Doyne Farmer et al. (2004). To understand that correlation in detail, we analyze the distribution of the relative volumes of market orders. These are defined as the market orders volumes divided by the quote volume from which liquidity is removed as the market order is executed. Thus, a relative volume smaller than one corresponds to a market order that does not change the midpoint, a relative volume equal to one corresponds to a market order that causes a return determined by the gap behind the quotes and a relative volume greater than one corresponds to an impact greater or equal to the gap behind the quote.
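The three cases distinguished above can be tabulated with a few lines of Python. The category names and the input format, a list of `(market_order_volume, quote_volume)` pairs, are assumptions made for this illustration; the toy numbers are invented.

```python
from collections import Counter

CATEGORIES = ("below_quote", "equals_quote", "above_quote")


def classify_market_order(order_volume, quote_volume):
    """Classify a market order by its volume relative to the quote volume."""
    if quote_volume <= 0:
        raise ValueError("quote volume must be positive")
    if order_volume < quote_volume:
        return "below_quote"    # relative volume < 1: midpoint unchanged
    if order_volume == quote_volume:
        return "equals_quote"   # relative volume = 1: return set by the gap
    return "above_quote"        # relative volume > 1: impact at least the gap


def category_shares(volume_pairs):
    """Share of market orders in each of the three categories."""
    counts = Counter(classify_market_order(v, q) for v, q in volume_pairs)
    total = sum(counts.values())
    return {name: counts[name] / total for name in CATEGORIES}


shares = category_shares([(100, 300), (300, 300), (500, 300), (200, 200)])
# shares == {"below_quote": 0.25, "equals_quote": 0.5, "above_quote": 0.25}
```

Comparing integer volumes directly, rather than testing a floating point ratio against one, avoids rounding issues in the `equals_quote` case.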

Figure 8 shows such a distribution for the data set 20160307_AAPL. The market order volumes match the quote volume in roughly of all cases. Relative volumes smaller than one have a probability of . Hence, relative volumes larger than one have a probability of . These results do not change considerably for other stocks in our data. This has two important implications for returns created by single market orders: First, we expect a high probability of around of market orders to result in zero return. Second, we note that most of the non-zero returns indeed follow from the gap behind the quotes. The strong correspondence between quote volumes and market order volumes indicates that our market order reconstruction works quite accurately.

Figure 9 shows the distributions of empirical returns according to Eq. (1) for our data sets. First, we observe a pseudo-discretization for all four data sets. The mere discretization is an effect of the tick size of one cent. The pseudo-discretization is due to the changing midpoint, by which we divide when computing the returns. The authors of Münnix et al. (2010) deal with this phenomenon in great detail. We choose our bin size such that the discretization is not blurred. The number of return levels that show up reflects the sparseness of the orderbook. We again find qualitative differences with respect to the clusters we identified in Sec. III.2. For AAPL, the order book is densely filled around the quotes, thus preventing large price jumps of more than a few ticks. Apparently, the sparser price level occupation that we hypothesized for stocks in cluster C4, such as GOOG, results in a broader return distribution. This distribution is so broad that tick size effects play a smaller role. Qualitatively, this agrees with the results in Doyne Farmer et al. (2004). Due to the dense level occupation, the individual impacts tend to shrink as we move to smaller cluster numbers.

IV Summary and Outlook

We analyzed order flow data of NASDAQ stocks in early 2016 and found remarkable qualitative differences in the order flow between different stocks. We focused on where orders are placed in the order book, with three qualitatively different placement positions for orders: in-spread, off-spread or crossing the spread. The latter we referred to as effective market orders. We found that limit orders were submitted much more frequently than market orders, and off-spread limit orders were much more frequent than in-spread limit orders. However, most limit orders were deleted rather than traded. For some stocks, the small relative frequencies of in-spread limit orders resulted from the spread usually being equal to the tick size, which rules out in-spread limit orders. Further, we found that when the spread tended to be narrow, off-spread limit orders tended to be placed close to or on the quotes. When spreads tended to be wider, fewer off-spread limit orders arrived near the quotes. Market order volumes were typically adjusted to the quote volume. Thus, the impact of individual market orders depended on the orderbook structure, i.e. the impact of a single market order resulted from the gap behind the quotes. When there was a high occupation density, individual market order impacts were typically reduced to the tick size (if there was any impact at all), while when the occupation density was small, impacts tended to be higher.

With the help of the relational k-means clustering algorithm, and based on the distributions of relative prices of in-spread limit orders, we were able to grasp the diversity of order flow for different stocks. The diversity of the relative price distributions of in-spread limit orders could be traced back to the conditions set up by the spread. The different observables we took into account portrayed a consistent picture of the diversity: There were stocks with a high activity in terms of order arrival, where the competition of strategies led to a narrow spread and densely occupied adjacent levels in the orderbook. Therefore, the impact of individual orders was often constrained to the tick size. At the other end of the spectrum, spreads were typically large, order placement was less strongly limited to the levels on and right next to the quotes, and individual impacts of market orders varied more strongly, as a result of a sparse occupation of price levels next to the quotes.
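The link between occupation density and individual impact can be sketched as follows. This is an illustration under simplifying assumptions, not the procedure used in the analysis: the ask side of the book is a hypothetical mapping from price (in ticks) to volume, and a buy market order simply consumes volume from the best ask upward.

```python
# Minimal sketch with hypothetical names and toy books; prices in ticks.
from typing import Dict, Optional


def impact_in_ticks(ask_levels: Dict[int, int], order_volume: int) -> Optional[int]:
    """Shift of the best ask caused by a buy market order.

    Returns None if the order consumes the entire ask side.
    """
    prices = sorted(p for p, v in ask_levels.items() if v > 0)
    if not prices:
        raise ValueError("ask side is empty")
    best = prices[0]
    remaining = order_volume
    for p in prices:
        if remaining < ask_levels[p]:
            return p - best  # first level that still holds volume
        remaining -= ask_levels[p]
    return None


dense_book = {100: 50, 101: 30, 102: 40}   # adjacent levels occupied
sparse_book = {100: 50, 107: 30}           # gap behind the quote

print(impact_in_ticks(dense_book, 50))   # order matched to the quote volume
print(impact_in_ticks(sparse_book, 50))
```

An order that exactly removes the quote volume moves the best ask by one tick in the densely occupied book, but by the full gap of seven ticks in the sparse one. An order smaller than the quote volume has no impact in either case.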

Future models, such as agent based models, have to account for that diversity. It is doubtful whether a single model with a fixed set of mechanisms can reproduce the stylized facts for the different kinds of dynamics we observed. Finally, the diversity within a fixed time window and within one market suggests that averages over a whole market should be considered carefully.


  1. J Doyne Farmer and Duncan Foley, “The economy needs agent-based modelling,” Nature 460, 685–686 (2009).
  2. Eun Jung Lee, Kyong Shik Eom,  and Kyung Suh Park, “Microstructure-based manipulation: Strategic behavior and performance of spoofing traders,” Journal of Financial Markets 16, 227–252 (2013).
  3. Andrei A Kirilenko, Albert S Kyle, Mehrdad Samadi,  and Tugkan Tuzun, “The flash crash: The impact of high frequency trading on an electronic market,” Available at SSRN 1686004  (2015).
  4. Jonathan Brogaard, “High frequency trading and its impact on market quality,” Northwestern University Kellogg School of Management Working Paper 66 (2010).
  5. Felix Patzelt and Klaus Pawelzik, “An inherent instability of efficient markets,” Scientific reports 3 (2013).
  6. Frederik Meudt, Thilo A Schmitt, Rudi Schäfer,  and Thomas Guhr, “Equilibrium pricing in an order book environment: Case study for a spin model,” Physica A: Statistical Mechanics and its Applications 453, 228–235 (2016).
  7. Sebastian M Krause and Stefan Bornholdt, “Spin models as microfoundation of macroscopic market models,” Physica A: Statistical Mechanics and its Applications 392, 4048–4054 (2013).
  8. Rama Cont, “Empirical properties of asset returns: stylized facts and statistical issues,”  (2001).
  9. Martin D Gould, Mason A Porter, Stacy Williams, Mark McDonald, Daniel J Fenn,  and Sam D Howison, “Limit order books,” Quantitative Finance 13, 1709–1742 (2013).
  10. Anindya S Chakrabarti and Ratul Lahkar, “Absence of economic and social constants,” Eur. Phys. J. Special Topics 225 (2016), 10.1140/epjst/e2016-60176-3.
  11. Marc Potters and Jean-Philippe Bouchaud, “More statistical properties of order books and price impact,” Physica A: Statistical Mechanics and its Applications 324, 133–140 (2003).
  12. A Gareche, G Disdier, J Kockelkoren,  and J-P Bouchaud, “Fokker-planck description for the queue dynamics of large tick stocks,” Physical Review E 88, 032809 (2013).
  13. Shanshan Wang, Rudi Schäfer,  and Thomas Guhr, “Average cross-responses in correlated financial market,” arXiv preprint arXiv:1603.01586  (2016).
  14. Joel Hasbrouck and Duane J Seppi, “Common factors in prices, order flows, and liquidity,” Journal of financial Economics 59, 383–411 (2001).
  15. Alex Boulatov, Terrence Hendershott,  and Dmitry Livdan, “Informed trading and portfolio returns,” The Review of Economic Studies 80, 35–72 (2013).
  16. Paolo Pasquariello and Clara Vega, “Strategic cross-trading in the us stock market,” Review of Finance , rft055 (2013).
  17. Tarun Chordia, Richard Roll,  and Avanidhar Subrahmanyam, “Commonality in liquidity,” Journal of financial economics 56, 3–28 (2000).
  18. Yuriy Stepanov, Philip Rinn, Thomas Guhr, Joachim Peinke,  and Rudi Schäfer, “Stability and hierarchy of quasi-stationary states: financial markets as an example,” Journal of Statistical Mechanics: Theory and Experiment 2015, P08011 (2015).
  19. 100, Accessed: September 28, 2016.
  20., Accessed: September 14, 2016.
  21. Ruihong Huang and Tomas Polak, “Lobster: Limit order book reconstruction system,” Available at SSRN 1977207  (2011).
  22. Services/Trading/OrderTypesG.pdf, Accessed: November 21, 2016.
  23., Accessed: September 13, 2016.
  24., Accessed: September 13, 2016.
  25. Nikolaus Hautsch and Ruihong Huang, “Limit order flow, market impact and optimal order sizes: evidence from NASDAQ TotalView-ITCH data,” (August 22, 2011).
  26., Accessed: August 11, 2016.
  27. Jean-Philippe Bouchaud, Marc Mézard, Marc Potters, et al., “Statistical properties of stock order books: empirical results and models,” Quantitative finance 2, 251–256 (2002).
  28. Balázs Szalkai, “An implementation of the relational k-means algorithm,” arXiv preprint arXiv:1304.6899  (2013).
  29. Peter J Rousseeuw, “Silhouettes: a graphical aid to the interpretation and validation of cluster analysis,” Journal of computational and applied mathematics 20, 53–65 (1987).
  30. Zoltan Eisler, Jean-Philippe Bouchaud,  and Julien Kockelkoren, “The price impact of order book events: market orders, limit orders and cancellations,” Quantitative Finance 12, 1395–1419 (2012).
  31. Khalil Dayri and Mathieu Rosenbaum, “Large tick assets: implicit spread and optimal tick size,” Market Microstructure and Liquidity 1, 1550003 (2015).
  32. J Doyne Farmer, Laszlo Gillemot, Fabrizio Lillo, Szabolcs Mike,  and Anindya Sen, “What really causes large price changes?” Quantitative finance 4, 383–397 (2004).
  33. Szabolcs Mike and J Doyne Farmer, “An empirical behavioral model of liquidity and volatility,” Journal of Economic Dynamics and Control 32, 200–234 (2008).
  34. Larry Harris, Trading and exchanges: Market microstructure for practitioners (Oxford University Press, USA, 2003).
  35. Terrence Hendershott, Charles M Jones,  and Albert J Menkveld, “Does algorithmic trading improve liquidity?” The Journal of Finance 66, 1–33 (2011).
  36. Bruno Biais, Pierre Hillion,  and Chester Spatt, “An empirical analysis of the limit order book and the order flow in the paris bourse,” the Journal of Finance 50, 1655–1689 (1995).
  37. Thilo A Schmitt, Rudi Schäfer, Michael C Münnix,  and Thomas Guhr, “Microscopic understanding of heavy-tailed return distributions in an agent-based model,” EPL (Europhysics Letters) 100, 38005 (2012).
  38. Michael C Münnix, Rudi Schäfer,  and Thomas Guhr, “Impact of the tick-size on financial returns and correlations,” Physica A 389, 4828–4843 (2010).