# Order statistics of horse racing and the randomly broken stick

## Abstract

We find a remarkable agreement between the statistics of a randomly divided interval and the observed statistical patterns and distributions found in horse racing betting markets. We compare the distribution of implied winning odds, the average true winning probabilities, the implied odds conditional on a win, and the average implied odds of the winning horse with the corresponding quantities from the “randomly broken stick problem”. We observe that the market is at least to some degree informationally efficient. From the mapping between exponential random variables and the statistics of the random division we conclude that horses’ true winning abilities are exponentially distributed.

Department of Physics and Astronomy, University College London, London WC1E 6BT; Trium Capital LLP, 60 Gresham St, London EC2V 7BB; Financial Computing & Analytics, Department of Computer Science, University College London, London WC1E 6BT; CFM–Imperial Institute of Quantitative Finance, Department of Mathematics, Imperial College, London SW7 2AZ

## 1 Introduction

From time to time nature has a taste for simplicity. It can then be promising to treat unknown variables as purely random, using statistics that are compatible with the constraints, symmetries, or boundary conditions of the given problem, but otherwise as simple as possible. Heavy nuclei are an example of such a system: they are seemingly hopelessly complex, yet the spacings between their energy levels follow well-known statistics of random matrix eigenvalues [1, 2]. More recently, one such statistic, the Marchenko–Pastur distribution, has been found in fluctuations of financial covariance matrices [3], despite the strong non-Gaussian dependencies observed in real financial time series [4]. The latter example underscores the success of econophysics: socio-economic human systems are highly non-linear [5, 6, 7, 8] and chaotic [9], but methods borrowed from statistical physics can still describe the bulk statistics of these systems.

Traditionally, econophysics has somewhat neglected one type of financial market: betting markets. This is perhaps surprising because economists, by contrast, have studied betting markets extensively, treating them as a controlled experiment for market efficiency [10, 11, 12], a key concept in financial economics. Because the outcome of the bet, win or lose, is known with certainty after a fixed time, it is straightforward to draw conclusions from the discrepancy between the implied market odds^{1}

The sum of all the implied odds must be roughly one. This is not always exactly true, for example because bookmakers charge gamblers a small fee, but for the purpose of this study these tiny deviations are not important. Each horse’s implied odds thus represent a segment of the unit interval. We do not know much about horses, but we can guess the simplest statistics for these segments: draw $N-1$ numbers from the uniform distribution on the unit interval and cut the interval at each of these numbers. We thus break a stick of unit length randomly into $N$ pieces, each of which represents the winning odds of an individual horse participating in a race with $N$ horses. The favourite’s odds correspond to the largest segment, the second favourite’s odds to the second largest segment, and so on.
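The construction described above can be sketched in a few lines. This is a minimal illustration, not the authors' code; the function name `break_stick` and the field size of nine horses are assumptions made for the example.

```python
import numpy as np

def break_stick(n_pieces, rng):
    """Break the unit interval into n_pieces segments, sorted from largest to smallest."""
    # n_pieces - 1 uniform cut points divide [0, 1] into n_pieces segments.
    cuts = np.sort(rng.uniform(0.0, 1.0, size=n_pieces - 1))
    edges = np.concatenate(([0.0], cuts, [1.0]))
    segments = np.diff(edges)
    # Rank 1 (the "favourite") is the largest segment.
    return np.sort(segments)[::-1]

rng = np.random.default_rng(0)
odds = break_stick(9, rng)  # hypothetical race with nine horses
print(odds, odds.sum())
```

The first entry plays the role of the favourite's implied odds and the last entry that of the longshot; by construction the entries sum to one.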

This letter reports striking similarities between the empirical distribution of implied odds observed in horse racing betting markets and the order statistics of the random division of the unit interval. Moreover, we find that conditional expectations of the true winning probabilities^{2} follow the same order statistics. This suggests assigning each horse $i$ an “ability” $a_i$ such that its winning probability is

$$p_i = \frac{a_i}{\sum_{j=1}^{N} a_j}, \tag{1}$$

in that $p_i$ follows the statistics of the broken stick problem if and only if the $a_i$ are exponentially distributed.
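One direction of this statement is easy to check numerically: if abilities are independent exponential variables and each winning probability is an ability divided by the sum of all abilities, the ranked probabilities should have the same averages as the ranked pieces of a randomly broken stick. The sketch below assumes unit-rate exponentials and a field of eight horses, both chosen only for illustration; it does not test the converse ("only if") direction.

```python
import numpy as np

def ranked_shares_from_exponentials(n, n_samples, rng):
    """Normalise n i.i.d. exponential 'abilities' and sort each row in descending order."""
    abilities = rng.exponential(1.0, size=(n_samples, n))
    shares = abilities / abilities.sum(axis=1, keepdims=True)
    return -np.sort(-shares, axis=1)

def ranked_segments_from_uniform_cuts(n, n_samples, rng):
    """Cut [0, 1] at n - 1 uniform points and sort the segments in descending order."""
    cuts = np.sort(rng.uniform(size=(n_samples, n - 1)), axis=1)
    edges = np.hstack([np.zeros((n_samples, 1)), cuts, np.ones((n_samples, 1))])
    return -np.sort(-np.diff(edges, axis=1), axis=1)

rng = np.random.default_rng(1)
mean_exp = ranked_shares_from_exponentials(8, 100_000, rng).mean(axis=0)
mean_uni = ranked_segments_from_uniform_cuts(8, 100_000, rng).mean(axis=0)
print(np.round(mean_exp, 4))
print(np.round(mean_uni, 4))
```

Up to Monte Carlo noise, the two printed rows should agree rank by rank.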

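Consider the interval $[0, 1]$ and divide it randomly into $N$ sub-intervals. The length of the $k$-th largest sub-interval, which we denote here by $L_{(k)}$, has the distribution [13]

$$P\left(L_{(k)} > x\right) = \sum_{j=k}^{N} (-1)^{j-k} \binom{j-1}{k-1} \binom{N}{j} \left(1 - jx\right)_+^{N-1}, \tag{2}$$

with $(y)_+ = \max(y, 0)$. We want to compare Eq. 2 to the empirical complementary cumulative distribution function (ECCDF) of implied winning odds of horse racing betting markets.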
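As a sanity check, the distribution of the ranked segment lengths can be evaluated and compared against a direct simulation. The sketch below assumes the standard inclusion-exclusion form of the complementary distribution, $P(L_{(k)} > x) = \sum_{j=k}^{N} (-1)^{j-k} \binom{j-1}{k-1} \binom{N}{j} (1 - jx)_+^{N-1}$; the function names and the test point ($N = 6$, $k = 3$, $x = 0.15$) are illustrative assumptions.

```python
from math import comb
import numpy as np

def ccdf_kth_largest(x, k, n):
    """P(k-th largest of n segments > x) for a stick broken at n - 1 uniform points (n >= 2)."""
    total = 0.0
    for j in range(k, n + 1):
        base = max(1.0 - j * x, 0.0)  # positive part (1 - j x)_+
        total += (-1) ** (j - k) * comb(j - 1, k - 1) * comb(n, j) * base ** (n - 1)
    return total

def simulated_ccdf(x, k, n, n_samples, rng):
    """Monte Carlo estimate of the same probability."""
    cuts = np.sort(rng.uniform(size=(n_samples, n - 1)), axis=1)
    edges = np.hstack([np.zeros((n_samples, 1)), cuts, np.ones((n_samples, 1))])
    ranked = -np.sort(-np.diff(edges, axis=1), axis=1)
    return float(np.mean(ranked[:, k - 1] > x))

rng = np.random.default_rng(2)
exact = ccdf_kth_largest(0.15, 3, 6)
approx = simulated_ccdf(0.15, 3, 6, 200_000, rng)
print(exact, approx)
```

For two pieces the formula reduces to familiar cases: the larger piece exceeds 0.75 with probability 0.5, and the smaller piece exceeds 0.25 with probability 0.5.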

We use data collected through Betfair on 12736 races occurring across the British Isles in the period from 31/12/2011 to 15/12/2012. The average number of horses per race is 8.95. We consider only races with at least horses which reduces the total number of races in our dataset to 11925. Gamblers exchange bets on horses in a limit order book. Sell orders match buy limit orders specified by volumes and lay decimal odds. Buy orders match sell limit orders specified by volumes and back decimal odds. Decimal odds quote the ratio of the payout amount, including the original stake, to the stake itself. The highest back quote is larger than the lowest lay quote^{3}

Consider now the implied odds of the $k$-th favourite horse. Fig. 1 compares their ECCDF with the theoretical prediction of Eq. 2 for the favourite, 2nd favourite, 3rd favourite, 4th favourite, and the horse with the least implied winning odds (the “longshot”), averaged over the number of horses in each race. The agreement is striking and calls for further investigation.
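The empirical side of this comparison can be sketched as follows. The example assumes that implied odds are obtained as the reciprocal of decimal odds and then rescaled to sum to one within each race; the rescaling step, the function names, and the three small races are assumptions for illustration and are not taken from the dataset.

```python
import numpy as np

def implied_probabilities(decimal_odds):
    """Reciprocal of decimal odds, rescaled so that each race sums to one (an assumption)."""
    raw = 1.0 / np.asarray(decimal_odds, dtype=float)
    return raw / raw.sum()

def ranked_implied_probabilities(decimal_odds):
    """Implied probabilities sorted from favourite to longshot."""
    return np.sort(implied_probabilities(decimal_odds))[::-1]

def eccdf(samples, x):
    """Empirical complementary CDF: fraction of samples strictly greater than x."""
    return float(np.mean(np.asarray(samples) > x))

# Hypothetical decimal odds for three races (made-up numbers).
races = [
    [2.5, 4.0, 6.0, 10.0, 15.0],
    [1.8, 5.0, 7.0, 12.0],
    [3.0, 3.5, 6.0, 9.0, 21.0, 34.0],
]
favourites = [ranked_implied_probabilities(r)[0] for r in races]
longshots = [ranked_implied_probabilities(r)[-1] for r in races]
print(eccdf(favourites, 0.4), eccdf(longshots, 0.05))
```

Evaluating `eccdf` on a grid of thresholds, separately for each rank, gives the empirical curves that are set against the theoretical distribution.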

To compare the true winning probabilities to the order statistics of the broken stick problem we need to calculate average quantities. Table 1 shows the expected length of the $k$-th largest segment, the average empirical implied odds of the $k$-th favourite, and the average observed true winning probabilities of the $k$-th favourite, for all races in our dataset and for three subgroups, defined by the number of horses per race and containing roughly an equal number of horses. The theoretical expectations of the segment lengths are calculated by taking the first moment of Eq. 2 (analytically given in Eq. 4 below) and averaging over the empirical distribution of $N$.

| favourite | 2nd favourite | 3rd favourite | 4th favourite | longshot |
|---|---|---|---|---|
| 0.3208 | 0.2001 | 0.1420 | 0.1037 | 0.0210 |
| 0.3358 | 0.1976 | 0.1345 | 0.0998 | 0.0253 |
| 0.3237 | 0.2046 | 0.1451 | 0.1054 | 0.0157 |

| favourite | 2nd favourite | 3rd favourite | 4th favourite | longshot |
|---|---|---|---|---|
| 0.3996 | 0.2399 | 0.1578 | 0.1024 | 0.0336 |
| 0.4165 | 0.2276 | 0.1503 | 0.0981 | 0.0339 |
| 0.4081 | 0.2407 | 0.1570 | 0.1012 | 0.0285 |

| favourite | 2nd favourite | 3rd favourite | 4th favourite | longshot |
|---|---|---|---|---|
| 0.3184 | 0.1985 | 0.1438 | 0.1078 | 0.0182 |
| 0.3327 | 0.2081 | 0.1362 | 0.1031 | 0.0233 |
| 0.3166 | 0.2041 | 0.1478 | 0.1103 | 0.0128 |

| favourite | 2nd favourite | 3rd favourite | 4th favourite | longshot |
|---|---|---|---|---|
| 0.2470 | 0.1631 | 0.1247 | 0.1004 | 0.0119 |
| 0.2614 | 0.1564 | 0.1172 | 0.0977 | 0.0193 |
| 0.2500 | 0.1703 | 0.1305 | 0.1039 | 0.0065 |

Not only do the empirical implied odds correspond to the expected segment lengths but the average observed winning probabilities also follow the order statistics of the random division accurately for all horses, with the exception of the longshot. Note that our theoretical estimations of the winning odds based on the expected segment lengths are parameter free.
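The parameter-free nature of the prediction can be made concrete. The sketch below assumes the known closed form for the mean of the $k$-th largest of $N$ pieces, $\frac{1}{N}\sum_{j=k}^{N} \frac{1}{j}$, and averages it over a list of field sizes. The field sizes are hypothetical placeholders; the actual calculation would use the empirical distribution of the number of horses per race.

```python
from fractions import Fraction

def expected_kth_largest(k, n):
    """Exact mean length of the k-th largest of n pieces: (1/n) * sum_{j=k}^{n} 1/j."""
    return sum(Fraction(1, j) for j in range(k, n + 1)) / n

def mean_expected_length(field_sizes, k=None):
    """Average the expectation over races; k=None selects the longshot (rank n in each race).

    Every field size must be at least k.
    """
    values = [expected_kth_largest(n if k is None else k, n) for n in field_sizes]
    return float(sum(values) / len(values))

# Hypothetical field sizes, one entry per race (made-up numbers).
field_sizes = [6, 8, 9, 9, 12, 14]
print(mean_expected_length(field_sizes, k=1))  # favourite
print(mean_expected_length(field_sizes, k=2))  # 2nd favourite
print(mean_expected_length(field_sizes))       # longshot
```

Nothing is fitted here: the only input is the number of horses in each race.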

We observe significant discrepancies for the longshot, but the differences between its implied odds, winning probability and segment length are small for races with few horses and larger for races with more horses. This suggests that gamblers are not able to rank the horses precisely enough when the number of horses is large. Remember that we define the rank of the horse according to the observed implied odds. Therefore, the smallest segment may describe a horse that the market has not recognised as the weakest one. In this case the market’s longshot is in reality a slightly stronger horse. This is consistent with the fact that both implied odds and winning probabilities of the longshot are larger than suggested by Eq. 2.

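We also calculate the implied odds of the $k$-th favourite given that this horse wins. This quantity is naturally larger than the unconditional implied odds of the $k$-th favourite. To find the corresponding theoretical prediction, let $L_{(k)}$ denote the length of the $k$-th largest segment and consider the indicator function $I_k$, which is one if a random point in the interval $[0, 1]$ lies in the $k$-th largest segment and zero otherwise. Then

$$P\left(I_k = 1 \mid L_{(k)}\right) = L_{(k)}$$

and

$$\left\langle L_{(k)} \mid I_k = 1 \right\rangle = \frac{\left\langle L_{(k)} I_k \right\rangle}{\left\langle I_k \right\rangle} = \frac{\left\langle L_{(k)}^2 \right\rangle}{\left\langle L_{(k)} \right\rangle}. \tag{3}$$

Eq. 3 is the theoretical prediction of the $k$-th favourite’s odds given that this horse wins.
By using well-known binomial identities we find from Eq. 2 after a somewhat lengthy calculation^{4}

$$\left\langle L_{(k)} \right\rangle = \frac{H_N - H_{k-1}}{N}, \tag{4}$$

with the partial harmonic number $H_n = \sum_{j=1}^{n} 1/j$ (and $H_0 = 0$), and

$$\left\langle L_{(k)}^2 \right\rangle = \frac{\left(H_N - H_{k-1}\right)^2 + \sum_{j=k}^{N} j^{-2}}{N(N+1)}. \tag{5}$$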

Table 2 compares the implied odds of the $k$-th favourite given that it wins with the average length of the $k$-th largest segment given that it contains a random point, see Eq. 3.

| favourite | 2nd favourite | 3rd favourite | 4th favourite | longshot |
|---|---|---|---|---|
| 0.3735 | 0.2148 | 0.1542 | 0.1139 | 0.0886 |
| 0.3622 | 0.2196 | 0.1549 | 0.1145 | 0.0383 |

We observe again a good agreement between the empirical odds and the theoretical prediction (except for the longshot, see discussion above).
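The conditional prediction can be checked by simulation: drop a uniformly random point on each broken stick, record which ranked segment it lands in, and average the length of that segment over the hits. Under the argument leading to Eq. 3 this conditional mean should equal the ratio of the second to the first moment of the ranked segment length. The sketch below is a minimal illustration with an assumed field of six horses; the function name is hypothetical.

```python
import numpy as np

def simulate_conditional_length(n, n_samples, rng):
    """Return (conditional mean given a hit, second moment / first moment) for each rank."""
    cuts = np.sort(rng.uniform(size=(n_samples, n - 1)), axis=1)
    edges = np.hstack([np.zeros((n_samples, 1)), cuts, np.ones((n_samples, 1))])
    lengths = np.diff(edges, axis=1)
    # One uniform point per stick; exactly one segment contains it.
    point = rng.uniform(size=n_samples)[:, None]
    hit = (point >= edges[:, :-1]) & (point < edges[:, 1:])
    # Reorder both arrays so that column 0 is the largest segment of each stick.
    order = np.argsort(-lengths, axis=1)
    ranked_lengths = np.take_along_axis(lengths, order, axis=1)
    ranked_hit = np.take_along_axis(hit, order, axis=1)
    conditional = (ranked_lengths * ranked_hit).sum(axis=0) / ranked_hit.sum(axis=0)
    moment_ratio = (ranked_lengths ** 2).mean(axis=0) / ranked_lengths.mean(axis=0)
    return conditional, moment_ratio

rng = np.random.default_rng(3)
cond, ratio = simulate_conditional_length(6, 200_000, rng)
print(np.round(cond, 4))
print(np.round(ratio, 4))
```

The two rows should agree within sampling error, and each conditional mean exceeds the corresponding unconditional mean because longer segments are more likely to contain the random point.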

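Finally, we calculate the average initial odds of the winning horse. Its theoretical prediction follows from Eq. 3 by weighting each rank with the probability that it contains the random point,

$$\sum_{k=1}^{N} \left\langle L_{(k)} \mid I_k = 1 \right\rangle \left\langle I_k \right\rangle = \sum_{k=1}^{N} \left\langle L_{(k)}^2 \right\rangle = \frac{2}{N+1}, \tag{6}$$

which, after averaging over $N$, yields a value again very close to the empirical one.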
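The expected odds of the winner can be verified exactly with rational arithmetic. The sketch assumes the closed form $\left[\left(\sum_{j=k}^{N} 1/j\right)^2 + \sum_{j=k}^{N} 1/j^2\right] / \left(N(N+1)\right)$ for the second moment of the $k$-th largest piece, sums it over all ranks, and compares the result with $2/(N+1)$. The list of field sizes used for the final average is a hypothetical placeholder for the empirical distribution.

```python
from fractions import Fraction

def second_moment_kth_largest(k, n):
    """Exact second moment of the k-th largest of n pieces (assumed closed form)."""
    tail = [Fraction(1, j) for j in range(k, n + 1)]
    return (sum(tail) ** 2 + sum(t * t for t in tail)) / (n * (n + 1))

def expected_winner_odds(n):
    """Mean length of the piece containing a random point: sum of second moments over ranks."""
    return sum(second_moment_kth_largest(k, n) for k in range(1, n + 1))

# The sum over ranks collapses to 2 / (n + 1) for every field size checked.
for n in range(2, 15):
    assert expected_winner_odds(n) == Fraction(2, n + 1)

# Hypothetical field sizes, one entry per race (made-up numbers).
field_sizes = [6, 8, 9, 9, 12, 14]
average = sum(expected_winner_odds(n) for n in field_sizes) / len(field_sizes)
print(float(average))
```

Because the winner's expected odds depend on the field size only through $2/(N+1)$, the average over races needs nothing beyond the number of horses in each race.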

To summarise, we have found a remarkable agreement between the order statistics of the randomly broken stick and the statistical properties of horse racing betting markets. We also observe that the empirical values of the implied odds and true winning probabilities are close and therefore conclude that this betting market is informationally efficient at least to some degree. Some discrepancies are found for the longshot, probably because gamblers fail to rank the horses accurately when the number of horses is large. Assuming that the implied odds reflect to a large extent the true winning probabilities, we conclude that the “ability” of a horse can be defined in such a way that its winning probability is the ratio of its “ability” to the sum of the abilities of all horses in the race, provided “ability” is exponentially distributed.

Acknowledgements: Julius Bonart thanks Jean-Philippe Bouchaud, Jonathan Donier and Tomaso Aste for interesting discussions. We would also like to give warm thanks to Peter A. Bebbington’s PhD supervisors I. J. Ford and F. M. C. Witte, the funding body EPSRC and the Centre for Doctoral Training in Financial Computing & Analytics.

### Footnotes

1. We use here “implied odds” in the sense of “implied probability”.
2. Of course we cannot observe an empirical distribution of true winning probabilities but only aggregate statistics, such as, for example, the average winning probability of the favourite horse.
3. Note that here buy orders match at lower quotes than sell orders.
4. The identity in Eq. 4 is reported in [14], p. 153, but the authors have no knowledge of a previous appearance of Eq. 5.

### References

1. E. P. Wigner. Characteristic vectors of bordered matrices with infinite dimensions. Annals of Mathematics, 62:548–564, 1955.
2. T. A. Brody, J. Flores, J. B. French, P. A. Mello, A. Pandey, and S. S. M. Wong. Random-matrix physics: spectrum and strength fluctuations. Reviews of Modern Physics, 53:385–480, 1981.
3. L. Laloux, P. Cizeau, J.-P. Bouchaud, and M. Potters. Noise dressing of financial correlation matrices. Physical Review Letters, 83:1467–1470, 1999.
4. J.-P. Bouchaud and M. Potters. Theory of financial risk and derivative pricing. Cambridge, 2009.
5. B. Tóth, Y. Lempérière, C. Deremble, J. De Lataillade, J. Kockelkoren, and J.-P. Bouchaud. Anomalous price impact and the critical nature of liquidity in financial markets. Physical Review X, 1(2):021006, 2011.
6. J. Donier, J. Bonart, I. Mastromatteo, and J.-P. Bouchaud. A fully consistent, minimal model for non-linear market impact. Quantitative Finance, 15:1109–1121, 2015.
7. J. Donier and J. Bonart. A million metaorder analysis of market impact on the bitcoin. http://papers.ssrn.com/sol3/Papers.cfm?abstract_id=2536001, 2014.
8. T. Di Matteo. Multi-scaling in finance. Quantitative Finance, 7:21–36, 2005.
9. F. Patzelt and K. Pawelzik. An inherent instability of efficient markets. Scientific Reports, 3:2784, 2013.
10. L. V. Williams. Information efficiency in betting markets: A survey. Bulletin of Economic Research, 51:1–39, 1999.
11. S. Figlewski. Subjective information and market efficiency in a betting market. The Journal of Political Economy, 87:75–88, 1979.
12. P. Divos, S. del Baño Rollin, Z. Bihary, and T. Aste. Risk-neutral pricing and hedging of in-play football bets. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2598767, 2014.
13. L. Holst. On the length of the pieces of a stick broken at random. Journal of Applied Probability, 17:623–634, 1980.
14. H. A. David and H. N. Nagaraja. Order Statistics. John Wiley & Sons, Inc., third edition, 2003.