Modeling the coupled return-spread high frequency dynamics of large tick assets
Large tick assets, i.e. assets where one tick movement is a significant fraction of the price and bid-ask spread is almost always equal to one tick, display a dynamics in which price changes and spread are strongly coupled. We introduce a Markov-switching modeling approach for price change, where the latent Markov process is the transition between spreads. We then use a finite Markov mixture of logit regressions on past squared returns to describe the dependence of the probability of price changes. The model can thus be seen as a Double Chain Markov Model. We show that the model describes the shape of return distribution at different time aggregations, volatility clustering, and the anomalous decrease of kurtosis of returns. We calibrate our models on Nasdaq stocks and we show that this model reproduces remarkably well the statistical properties of real data.
Keywords: Large tick assets, bid-ask spread dynamics, returns-spread coupling, Double chain Markov model, Markov chain Montecarlo.
In financial markets, the price of an order cannot assume arbitrary values but it can be placed on a grid of values fixed by the exchange. The tick size is the smallest interval between two prices, i.e. the grid step, and it is measured in the currency of the asset. It is institutionally mandated and sets a limit on how finely prices may be specified. The grid is evenly spaced for a given asset, and the tick size depends on the price.
In the recent years there has been a growing interest toward the role of tick size in determining the statistical properties of returns, spread, limit order book, etc. [14, 13, 8, 23, 15, 20, 11, 22, 43]. The absolute tick size is not the best indicator for understanding and describing the high frequency dynamics of prices. Consider, for example, two highly liquid NASDAQ stocks, namely Apple (AAPL) and Microsoft (MSFT). For both stocks the tick size is one cent. However, in the period we investigated in this paper (July and August 2009), the average price of AAPL was 157$ while the average price of MSFT was 24$. Thus a one cent price movement for AAPL corresponds to 0.6 bp, while for MSFT it is 4.2 bp. Therefore we can expect that the high frequency dynamics of AAPL will be significantly different from the one of MSFT. Recent literature has introduced the notion of an effective tick size to account and quantify the different behavior of returns and spread processes of assets for a given value of tick size. Qualitatively we say that an asset has a large tick size when the price is averse to variations of the order of a single tick and when the bid-ask spread is almost always equal to one tick. Conversely an asset is small tick size when the price is only weakly averse to variations of the order of a single tick and the bid-ask spread can assume a wide range of values, e.g. from one to ten or more ticks [20, 22]. Several papers in empirical and theoretical market microstructure have emphasized that large and small tick size assets belong to different “classes” [19, 23, 24]. Order book models designed for small tick assets do not describe correctly the dynamics of large tick assets . Moreover the ultra high frequency statistical regularities of prices and of the order book are quite different in the two classes.
In this paper we are interested in modeling the dynamics of large tick assets at ultra high frequency and taking expliciteply into account the discreteness of prices. More specifically, we introduce a class of models describing the coupled dynamics of returns and spread for large tick assets in transaction time111Hereafter we define the transaction time as an integer counter of events defined by the execution of a market order. Note that if a market order is executed against several limit orders, our clock advances only by one unit.. In our models, returns are defined as mid-price changes222With a little abuse of language we use returns and mid-price changes interchangeably . and are measured in units of half tick, which is the minimum amount the mid-price can change. Therefore, these models are defined in a discrete state space [5, 10] and the time evolution is described in discrete time. Our purpose is to model price dynamics in order to reproduce statistical properties of mid-price dynamics at different time scales and stylized facts like volatility clustering. Notice that, rather than considering a non observable efficient price and describing the data as the effect of the round off error due to tick size, we directly model the observable quantities, such as spread and mid-price, by using a time series approach.
The motivation of our work comes from two interesting empirical observations. Let us consider first the unconditional distribution of mid price change at different time scales. In the left panel of Fig. 1 we show the histogram of mid-price change of MSFT at the finest time scale, i.e. between two transactions. It is clear that most of the times the price does not change, while sometimes it changes by one or two half ticks. When we aggregate the returns on a longer time scale, for example transactions (see right panel of Fig. 1), a non trivial distribution emerges, namely a distribution where odd values of returns are systematically less populated than even values. It is important to notice that if we assume that returns of individual trades are independent and identically distributed333For example, if we randomize our sample of tick by tick mid-price changes, we would never be able to reproduce an histogram like the one shown in the right panel of Fig. 1. In fact in this case the histogram would be, as expected, bell shaped.
The second observation concerns the properties of volatility of tick by tick returns. Figure 2 shows the autocorrelation function of squared returns of MSFT in transaction time. Square returns can be seen here as a simple proxy of volatility. First of all notice that the autocorrelation is negative for small lags. It then reaches a maximum around trades and then it decays very slowly to zero. We observe that between and more than trades, the decay of the autocorrelation function is well described by a power law function, , and the estimated exponent is similar to the one observed at lower frequency and by sampling returns in real time rather than transaction time444It is worth noticing that in general the round-off error severely reduces the correlation properties of a stochastic process, even if the Hurst exponent of a long memory process is preserved . Therefore the autocorrelation function shown in Fig. 2 is a strong underestimation of the tick by tick volatility clustering of the unobservable efficient price.. We conclude therefore that very persisitent volatility clustering and possibly long range volatility is observed also at tick by tick level.
The purpose of this paper is to develop a discrete time series model that is able to explain and reproduce simultaneously these two empirical observations, namely the change of the distribution of price changes at different time scales and the shape of the volatility autocorrelation.
As a modeling approach, we note that the observation of Fig. 1 suggests that the return process can be characterized by different regimes which are defined by some variable, observable or not, in the order-book dynamics. The key intuition behind our modeling approach is that for large tick assets the dynamics of mid-price and of spread are intimately related and that the process of returns is conditioned to the spread process. The conditioning rule describes the connection between the stochastic motion of mid-price and spread on the grid.
For large tick assets the spread typically assumes only few values. For example, for MSFT spread size is observed to be or ticks almost always. The discreteness of mid-price dynamics can be connected to the spread dynamics if we observe that, when the spread is constant in time, returns can assume only even values. Instead when the spread changes, returns can display only odd values. Figure 3 shows the mechanical relation between the two processes. The dynamics of returns is thus linked to dynamics of spread transitions. This relation leads us to design models in which the return process depends on the transition between two subsequent spread states, distinguishing the case in which the spread remains constant and the case when it changes. From a methodological point of view we obtain this by defining a variable of state that describes the spread transition. We use a Hidden Markov, or Markov Switching, Model [2, 9] for returns, in which the spread transition is described by a Markov chain that defines different regimes for the return process.
The Markov Switching approach is able to describe the change in shape of the distribution of price change (Fig. 1), but not the persistence of volatility. To this end, we propose a more sophisticated model by allowing the returns process to be an regressive process in which regressors are the past value of squared returns [1, 25, 26, 27]. We show how to calibrate the models on real data and we tested them on the large tick assets MSFT and CSCO, traded at NASDAQ market in the period July-August 2009. We show that the full model reproduces very well the empirical data.
The paper is organized as follows. In Section 2 we review the main applications of Markov-switching modeling in the econometrics field. In Section 3 we present our modeling approach. In Section 4 we present our data for the MSFT stock and we describe the observed stylized facts of price dynamics. In Section 5 we describe the calibration of the models on real data and we discuss how well the different models reproduce the stylized facts. Finally, in Section 6 we draw some conclusions and we discuss future works.
2 Review of Markov switching models in econometrics
Markov switching models (MS models) have become increasingly popular in econometric studies of industrial production, interest rates, stock prices and unemployment rates [9, 30]. They are also known as hidden Markov models (HMM) [2, 33, 34], used for example in speech recognition and DNA analysis. In these models the distribution that generates an observation depends on the states of an underlying and unobserved Markov process. They are flexible general purpose models for univariate and multivariate time series, especially for discrete-valued series, including categorical variables and series of counts . Markov switching models belong to a general class of mixture distributions . Econometricians’ initial interest in this class of distributions was based on their ability to flexibly approximate general classes of density functions and generate a wider range of values for the skewness and kurtosis than is obtainable by using a single distribution. Along these lines Granger and Orr  and Clark  considered time-independent mixtures of normal distributions as a means of modeling non-normally distributed data. These models, however, did not capture the time dependence in the conditional variance found in many economic time series, as evidenced by the vast literature on ARCH models that started with Engle . By allowing the mixing probabilities to display time dependence, Markov switching models can be seen as a natural generalization of the original time-independent mixture of normals model. Timmermann  has shown that the mixing property enables them to generate a wide range of coefficients of skewness, kurtosis and serial correlation even when based on a very small number of underlying states. Regime switches in economic time series can be parsimoniously represented by Markov switching models by letting the mean, variance, and possibly the dynamics of the series depend on the realization of a finite number of discrete states.
The basic MS model is:
where denotes the unobserved state indicator which follows an ergodic -state Markov process and is a zero-mean random variable which is i.i.d. over time . Another relevant model is the Markov switching autoregressive model (MSAR()) of order that allows for state-independent autoregressive dynamics:
It became popular in econometrics for analyzing economic time series such as the GDP data through the work of Hamilton . In its most general form the MSAR model allows that the autoregressive coefficients are also affected by :
There is a key difference with respect to ARCH models, which is another type of time-dependent mixture processes. While Markov switching models mix a finite number of states with different mean and volatility parameters based on an exogenous state process, ARCH models mix distributions with volatility parameters drawn from an infinite set of states driven by lagged innovations to the series.
We can make use of the above models when we want to model a continuous state random variable . In our case we want a model for a discrete variable, i.e. the observed integer price differences, in a microstructure market environment. Therefore the models for continuous variables presented above cannot be used in our problem. We propose to model the coupled dynamics of spreads and price differences in the setting defined by the Double Chain Markov Models (DCMM) [25, 26]. This is the natural extension of HMM models in order to allow the hidden Markov process to select one of a finite number of Markov chains to drive the observed process at each time point. If a time series can be decomposed into a finite mixture of Markov chains, then the DCMM can be applied to describe the switching process between these chains. In turn DCMM belongs to the family of Markov chains in random environments [28, 29].
In discrete time, DCCM describes the joint dynamics of two random variables: , whose state at time is unknown for an observer external to the process, and , which is observable. The model is described by the following elements:
A set of hidden states, .
A set of possible outputs, .
The probability distribution of the first hidden state, .
A transition matrix between hidden states, .
A set of transition matrices between successive outputs of given a particular state of , , .
There are three different estimation problems: the estimation of the probability of a sequence of observations given a model; the estimation of parameters given a sequence of observations; the estimation of the optimal sequence of hidden states given a model and a sequence of outputs.
Our data, i.e. limit order book data, instead allow us to see directly the process that defines the hidden Markov process, i.e. the spread process. In this way we can estimate directly the matrices and by a simple maximum likelihood approach without using the Expectation Maximization (EM) algorithm and the Viterbi algorithm, that are usually used when the hidden process is not observable [25, 26]. We use the stationary probability distribution for the process as initial probability distribution in order to perform our calculations and simulations. We use the DCMM model as a mathematical framework for spread and price differences processes without treating spread process as an hidden process.
Among the few financial applications of the DCMM model we mention Ref.s [35, 36]. In the former paper, authors studied the credit rating dynamics of a portfolio of financial companies, where the unobserved hidden process is the state of the broader economy. In Eisenkopf  instead the author considered a problem in which a credit rating process is influenced by unobserved hidden risk situations. To the best of our knowledge our paper is the first application of DCMM to the field of market microstructure and high frequency financial data.
3 Models for the coupled dynamics of spread and returns
In this section we present the models describing the process of returns at time scale , where we define the mid-price as and we choose to measure in units of half tick size. In our models, return process follows different time series processes conditioned on the dynamics of transitions of the spread . Hereafter we will use the notation . The spread variable is measured in units of tick size, so we have and . The time variable is the transaction time.
3.1 Markov-Switching models
Spread process. It is well known that spread process is autocorrelated in time [42, 4, 7, 18]. We model the spread as a stationary Markov(1)  process555We have tried other specifications of the spread process, such as for example a long memory process, but this does not change significantly our results.:
where are spread values. As mentioned, we limit the set of spread values to , because we want to describe the case of large tick assets. We also assume that the process is not affected by the return process . The spread process is described by the transition matrix:
where the normalization is given by . The vector of stationary probabilities is the eigenvector of relative to eigenvalue , which is
where denotes the transpose of the matrix . This vector represents the unconditional probabilities of , so with .
Starting from the process, it is useful to define a new stationary Markov(1) process that describes the stochastic dynamics of transitions between states and as
This process is characterized by a new transition matrix
in which the stationary vector is given by
A limiting case is when the spread process is described by a Bernoulli process. In this case we set . Although is an i.i.d. process, the spread transition process is a Markov process defined by:
In the general case, the process is defined by two parameters (which are reduced to in Bernoulli case) that we can estimate from spread data.
Mid-price process. We can now define a Markov-switching process for returns which is conditioned to the process , i.e. to the spread transitions. Returns are measured in half ticks and we limit the set of possible values to , as observed in our sample. The discreteness of the price grid imposes the mechanical constraints
The mapping between transitions and allowed values of the mid-price changes has been done by using the cases shown in Fig. 3. This assumption is grounded on the empirical observation that mid-price changes are extremely rare for large tick assets (see Section 4).
In the simplest model, we assume that the probability distribution of returns between two transactions depends only on the spread transition between them. We can therefore define the following conditional probabilities defining the process of returns:
Notice that we have assumed symmetric distributions for returns between positive and negative values and is the parameter vector that we can estimate from data. The parameter () describes the probability that mid-price changes when the spread remains constant at one (two) ticks.
The coupled model of spread and return described here will be termed the MS model. When we consider the special case of spread described by a Bernoulli process we will refer to it as the MS model.
Properties of price returns. Here we derive the moments and the autocorrelation functions and under the MS model. The quantity is useful to study the statistical efficency of price, while describes volatility clustering in transaction time.
We compute first the vectors of conditional first, second and fourth moments
where indicates the th component of the vector . We have , and . Then we compute unconditional moments by using the stationary vector as
In order to compute the linear autocorrelation function we need to compute , by using conditional independence of with respect to . We obtain:
where we define the matrix . The autocorrelation function of returns is given by:
in our specific case because symmetry leads to .
We also compute the autocorrelation function of squared returns which is equal to
where we define the matrix .
As expected, both correlation functions depends on powers of the transition probability matrix . For a Markov process, is diagonalizable and we can write , where:
In the limit case in which the spread is described by a Bernoulli process, the matrix is not diagonalizable but has all eigenvalues in , i.e. , and we can compute its Jordan canonical form . Thus we can rewrite the lag dependence as , where:
The structure of block diagonal matrix implies that and that is a constant function for .
Discussion. The qualitative comparison of real data and model shows that the MS model is able to reproduce the distribution of returns quite well. This can be seen by comparing Fig.1 with Fig.4. It is worth noting that, at least qualitatively, also the Bernoulli model MS is able to reproduce the underestimation of odd values of returns with respect to the even values, as observed in real data. Therefore it is the coupling of spread and return, rather than the memory properties of spread, which is responsible of the behavior of the aggregated return distribution of Fig. 1. It is also possible to show that the model has linearly uncorrelated returns, as observed in real data, at least for lags larger than few transactions.
However the model fails to describe the volatility clustering. In fact, we can prove that is an exponential function, , with , i.e. the model describes an exponentially decaying volatility clustering. As the data calibration shows (see Section 5 and Figure 5), the predicted behavior of under the MS model is much smaller than the one observed in real data. Therefore this model is unable to reproduce the volatility clustering as well as any long memory property. This observation motivates us to develop a model that, preserving the structure of the coupling between spread and returns discussed so far, is able to describe non exponential volatility clustering. This model is developed in the next section.
3.2 A double chain Markov model with logit regression
The Markov switching model is not able to explain the empirically observed correlation of squared returns shown in Fig. 2. Therefore in the second class of models we consider an autoregressive switching model for returns [17, 30] in order to study correlation of squared returns. The idea is to use logit regressions on past values of variables, i.e. returns and squared returns in order to reduce the number of parameters that one would have with an higher order Markov process. The model is thus defined by the following conditional probabilities :
where we define an informative -dimensional vector of regressors , made of the past returns and squared returns. Each parameter vector is composed by the scalar , the -dimensional vector which describes the regression on past values of squared returns, and the -dimensional vector which describes the regression on past returns.
In order to handle the discreteness of returns we make use of a logit regression. To this end we first convert the returns series in a binary series . When the spread remains constant between and (i.e. or ), we set
while when the spread changes, (i.e. or ) we set
Then by denoting by the conditional probability of having , the logit regression is
and we finally obtain the process for by:
These equations define the general DCMM() model. In the rest of the paper we will consider the case and for the sake of simplicity we will denote DCMM()=DCMM(). In our case the independent latent Markov process is represented by the transition process and the dependent Markov process is represented by the processes. The form of stochastic dependence is defined by the logit rules in Eq. (3.2).
For the sake of clarity, here we consider the case , while its extension to a general value for is considered in Appendix A. The definition of the process for , and , in the case of (DCMM(1)) is the following:
We have four possible transition matrices for , determined by the latent process :
where we have specified the temporal dependence in regressors only in the first column. The others two matrices have same definitions: and . In this way, assuming that the latent process has reached the stationary distribution defined by Eq. 7, we can define an overall Markov chain by the transition matrix that describes the process:
The matrix is defined by parameters: .
The probabilities of the process for , and , in the case of (DCMM(1)) is
which can be calculated from the knowledge of the matrix .
In particular, we have four possible transition matrices for , determined by the latent process :
We can define an overall Markov process for described by a transition matrix , assuming that the transition process has reached the stationary distribution:
The matrix is defined by parameters: , where . The function for the DCMM(1) process is the correlation of the Markov(1) process defined by . We solve the eigenvalue equation for relative to the eigenvalue in order to determine the stationary probability vector :
the entire spectrum is given by , where the last eigenvalue is:
If we define the vectors , and , where , and , the moments are given by:
Finally, we have the expression for in the case :
The generalization of the calculation of to any value of the order is reported in the appendix A.
In order to estimate the parameter vector we maximize the partial-loglikelihood,
where is the length of sample, and we assume that parameters and are known. Since the dynamics of spread transitions is independent from the past informative set, i.e. , we have
In the case of large tick assets, it is and we can use the approximation
For example for MSFT we have . With this approximation we estimate only the vector and the parameter of Eq.3.1, that are enough in order to define matrices . Moreover we can approximate
In this way we neglect the contribution of regressors (weighted by ) and make use of the simpler expression in Eq. 3.1 when . As before, this approximation holds if the weight of is negligible, i.e. , i.e. when there is a small number of spread transitions . This is the case when we have large tick assets, where we have almost always . In the case of MSFT asset for example we have .
We have performed the calculation of the autocorrelation of the squared returns for and the result is reported in Fig. 5. We have calibrated the parameters on the MSFT asset (see next Sections for details). We note that the MS and MS models underestimate very strongly . Note that for the MS model, calibrated on real data is very small but not zero as predicted by the theory. The DCMM(p) model, on the other hand, is able to fit very well up to lag . Remarkably the model captures very well also the negative correlation for very short lags. However this observation indicates that an higher order DCMM(p) model might be able to fit better the real data. In the next Sections we will show that this is indeed the case.
We have investigated two stocks, namely Microsoft (MSFT) and Cisco (CSCO), both traded at NASDAQ market in the period July-August 2009, corresponding to 42 trading days. Data contains time stamps corresponding to order executions, prices, size of trading volume and direction of trading. The time resolution is one millisecond. In this article we report mostly the results for MSFT asset, which are very similar to those for CSCO.
Non stationarities can be very important when investigating intraday financial data. For this reason and in order to restrict our empirical analysis to roughly stationary time intervals, we first compute the intensity of trading activity at time conditional to a specific value of mid-price change, i.e. . As we can see from Figure 6, the unconditional trading intensity is not stationary during the day . As usual, trading activity is very high at the beginning and at the end of the day. For this reason, we discard transaction data in the first and last six minutes of trading day. Moreover figure shows that the relative frequencies of the three values of returns change during the day, except for returns larger than two ticks that are very rare throughout the day. Most important, in the first part of the day, one tick or two tick returns are more frequent than zero returns, while after approximately the opposite is true. For this reason we split our times series in two subsamples. The first sample, corresponding to a period of high trading intensity, covers the time sets , where time is measured in hours. The second sample, corresponding to low trading intensity, covers the time set . Table 1 reports a summary statistics of the two subsamples.
|asset||activity||trades||mean (ticks/2)||(ticks/2)||ex. kurt|
We then analyze the empirical autocorrelation function of squared returns for these two series. As we can see from Fig. 7, for both time series display a significant positive and slowly decaying autocorrelation, which is a quantitative manifestation of volatility clustering. The series corresponding to low trading activity displays smaller, yet very persistent, volatility clustering.
5 Estimation of the models and comparison with real data
We have estimated the models described in Secs.3.1 and 3.2 and we have used Monte Carlo simulations to generate artificial time series calibrated on real data. The properties of these time series have been compared with those from real data.
More specifically we have considered three models: (i) the MS model, where spread is described by a Bernoulli process and there are no logit regressors; (ii) the MS model, where spread is a Markov(1) process and there are no logit regressors; (iii) the DCMM() model, where spread is a Markov(1) process and the set of logit regressors includes only the past values of squared returns. Notice therefore that in this last model we set . Finally, we have estimated the model separately for high and low activity regime.
5.1 Estimation of the models
From spread and returns data we computed the estimators of the parameters defined in Sec.3.1. They are given by
where is the number of times , is the length of the spread time series, is the number of times the value of spread is followed by the value , is the number of times returns are zero in the regime , and is the length of the subseries of returns in the same regime. For the last estimator we count only zero returns because we assumed that the returns are distributed symmetrically in the set . We have checked that this assumption represents a good approximation for our data sets. The estimated parameters for MSFT asset are shown in Table 2.
In order to estimate the DCMM() model we need to estimate the vector . For both regimes we use the approximated log-likelihood of Eq. 30 because we have for low volatility series and for high volatility . Thus we need to estimate only the vector by a standard generalized linear regression and we use an iterative reweighted least squares technique . In this way we generate the returns series in regime , instead for the other regimes the generator follows the rules in Eq. 3.1, i.e. we use the estimator . The order of model is fixed to in order to investigate the impact of past squared returns on the returns process. For simplicity we report here only the results from high activity time series.
We find and we report the first values of in Table 3. The estimates of are significantly positive for up to , with the exception of . Moreover they display a maximum for . We perform a power law fit on these parameters, , and we find a significant exponent . We hypothesize that this functional dependence of from could be connected to the slow decay of the autocorrelation function of squared returns, but we have not investigated further this aspect.