Emergence of scale invariance and efficiency in a racetrack betting market
We study the time change of the relation between the rank of a racehorse in the Japan Racing Association and the result of victory or defeat. Horses are ranked according to their win bet fractions. As the vote progresses, the racehorses are mixed on the win bet fraction axis. We see the emergence of a scale invariant relation between the cumulative distribution function of the winning horses and that of the losing horses; the relation holds in the small win bet fraction region. We also see the market become efficient as the vote proceeds. However, the convergence to the efficient state is not monotonic. The time change of the distribution of votes is complicated: after the distribution spreads to a certain extent, votes concentrate on popular horses again. In order to explain the scale invariance, we introduce a simple voting model. In a ‘double’ scaling limit, we show that the exact scale invariance relation holds over the entire range.
Racetrack betting is a simple exercise of gaining a profit or losing one’s wager. However, one needs to make a decision in the face of uncertainty, and a closer inspection reveals great complexity and scope. The field has attracted many academics from a wide variety of disciplines and has become a subject of wider importance. Compared to the stock or currency exchange markets, racetrack betting is a short-lived and repeated market. It is therefore possible to obtain a starker view of aggregated better behaviour and to study the market efficiency. One of the main findings of previous studies is the ‘favorite-longshot bias’ in the racetrack betting market [2, 3]. Final odds are, on average, accurate measures of the winning probability, but short-odds horses are systematically undervalued and long-odds horses are systematically overvalued.
From an econophysical viewpoint, racetrack betting is an interesting subject. Park and Domany analysed the distribution of final odds (dividends) of the races organized by the Korean Racing Association. They found power law behaviour in the distribution and proposed a simple betting model. Ichinomiya also found a power law in the races of the Japan Racing Association (JRA) and proposed another betting model. We have studied the relation between the rank of a racehorse in JRA and the result of victory or defeat. Horses are ranked according to their win bet fractions. We studied the distribution of the winning horses in the long-odds region. Between the cumulative distribution function of the winning horses and that of the losing horses, we found a scale invariant power-law relation. We showed that in a ‘Pólya’-like betting model, where betters vote on the horses with probabilities proportional to the votes already cast, such scale invariance emerges in a self-organized fashion. Furthermore, the scale invariant relation holds exactly over the entire range in a certain limit. We also studied a voting model with two kinds of voters, independent and copycat. We found that a phase transition occurs in the process of information cascade and that it causes a critical slowing down in the convergence of the decision making of crowds.
In this paper, we study the time series data of the vote in JRA races in detail. We show that as the vote progresses, scale invariance and market efficiency emerge in the rank of the racehorses. The decision making process in betting is not simple and some subtle mechanism is at work. We explain the scale invariance based on a simple betting model.
2 Racetrack betting process
We study the win bet data of JRA races in 2008. A win bet requires one to name the winner of the race. There are 3542 races and we choose 3250 races whose final public win pool (total number of votes) is in the range . horses run in race and . We remove 133 cancelled horses and the total number of horses is 47273. There are 3251 winning horses (one tie occurs), which are denoted as . Of course, the remaining 44022 horses are losers and are denoted as . The number of times race is announced is denoted as and is in the range . The total number of announcements is . The time to the race entry time (start of the race) in minutes is denoted as . We denote the odds of the th horse in race at the th announcement as and the public win pool as . denotes the results of the races. implies that the horse wins (loses). A typical sample from the data is shown in Table 1.
From , we estimate the win bet fraction by the following relation.
If the above does not sum up to one, we renormalize it as . Hereafter, we use in place of .
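The estimation and renormalization steps can be sketched as follows. The exact relation between odds and win bet fraction is not reproduced in the text above, so this sketch assumes the generic pari-mutuel form in which the fraction is proportional to the reciprocal of the odds; the function name `win_bet_fractions` and the sample odds are hypothetical.

```python
def win_bet_fractions(odds):
    """Estimate win bet fractions from the odds of one race.

    Assumption (not taken from the paper): the fraction is proportional
    to 1/odds. Any constant factor such as the track take cancels out
    in the renormalization. A cancelled horse is passed with odds 0 and
    receives fraction 0.
    """
    raw = [1.0 / o if o > 0 else 0.0 for o in odds]
    total = sum(raw)
    if total == 0:
        raise ValueError("no valid odds in this race")
    # Renormalize so that the fractions of one race sum to one.
    return [r / total for r in raw]


# Hypothetical odds for a four-horse race.
fractions = win_bet_fractions([2.0, 4.0, 8.0, 8.0])
print(fractions)
```

Because the fractions are renormalized within each race, only the ratios of the reciprocal odds matter in this sketch.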
We use the average public win pool as a time variable of the whole betting process. For each , we choose the nearest and use the average value of as the time variable . More explicitly, we define as
We define and as
We denote the average value of as and it is defined as
Figure 1 shows the relation between and . We see a rapid growth of the average number of votes as we approach the start of the race (). Almost half of the votes are cast in the last ten minutes. We choose five timings from the voting process and denote them as . corresponds to the first voting data in each race. and [min]. At , [min]. At , [min] and at , [min]. corresponds to the last voting data in each race. and [min]. All data are summarized after the start of the race. The reason why we choose will become clear in section 4.
In order to see the betting process pictorially, we arrange the horses in the order of the size of . We denote the arranged win bet fraction as .
tells us whether horse wins () or loses (). In general, a horse with a large is more likely to win, and vice versa. We arrange the horses in increasing order of from left to right. On the left-hand side of the sequence there are more strong horses; on the right-hand side there are more weak horses. If the win bet fraction does not contain any information about the strength of the horses, is one or zero at random. Conversely, if the information is perfectly correct, the first horses’ are 1 and the remaining horses’ are 0.
Figure 2 shows the time change of the ranking process of the horses. We choose 100 horses for each category and follow their ranking as the betting proceeds. We use as the time step. During one hundred steps, the average number of votes changes from for (bottom) to about 178116 for (top). corresponds to the first announcement and . At , the horses are arranged in the sequence at random. The win bet fractions do not contain much information about the strength of the horses. As the betting progresses, phase separation between the two categories of horses occurs. Winning (losing) horses move to the left (right) in general. In the end, there are more winning (losing) horses on the left (right). In the betting process, betters have succeeded in choosing the winning horses to some extent. We also note that there remain winning horses on the right. This means that we can find winning horses with very small win bet fractions.
3 Emergence of scale invariance
In order to see the existence of winning horses with a very small win bet fraction or very low rank, we calculate the cumulative function of the winning horses and that of the losing horses, counted from the lowest rank (right). More precisely, we study the following quantities.
The above are normalized so that is zero at and 1 at . The curve is known as a receiver operating characteristic (ROC) curve. We are interested in the limit . In particular, we study the scale invariant relation . If such a power law relation holds, we can find winning horses with any small win bet fraction.
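As an illustration of how such a curve can be built from data, the following minimal sketch counts winning and losing horses from the smallest win bet fraction upwards and normalizes both counts to end at 1. The function name `roc_from_lowest_rank` and the toy data are hypothetical, and ties in the win bet fraction are broken by input order, which is an assumption of this sketch.

```python
def roc_from_lowest_rank(fractions, results):
    """Return two lists (x0, x1) of normalized cumulative counts.

    x0[k] is the share of losing horses and x1[k] the share of winning
    horses among the k horses with the smallest win bet fractions.
    results[i] is 1 if horse i won and 0 if it lost.
    """
    n_win = sum(results)
    n_lose = len(results) - n_win
    if n_win == 0 or n_lose == 0:
        raise ValueError("need at least one winner and one loser")
    # Lowest rank first: sort by increasing win bet fraction.
    order = sorted(range(len(fractions)), key=lambda i: fractions[i])
    x0, x1 = [0.0], [0.0]
    wins = losses = 0
    for i in order:
        if results[i] == 1:
            wins += 1
        else:
            losses += 1
        x0.append(losses / n_lose)
        x1.append(wins / n_win)
    return x0, x1


# Toy data: five horses, two of which won.
x0, x1 = roc_from_lowest_rank([0.40, 0.05, 0.25, 0.10, 0.20], [1, 0, 0, 1, 0])
print(list(zip(x0, x1)))
```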
Figure 3a shows the double logarithmic plot of the ROC curves . We show five curves for . At , there are 70 votes in the pool on average. The plot is almost a diagonal line from to , which means that the winning and losing horses are mixed well in the sequence. The curve is fitted with the power relation with . As the betting progresses, the curve becomes more convex downward and the slope increases. At , the degree of the phase separation between the two types of horses reaches its maximum. On the other hand, we also see that there exist winning horses in the very low rank region. Each upward step of the curve implies the existence of a winning horse, and the upward movement starts at a very small . In addition, we also see a straight line region in the curve in . We have fitted the curve as in the previous case and we get . This means that the scale invariant relation between and also holds after many rounds of betting.
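The exponent of such a power relation can be estimated, for instance, by fitting a straight line in log-log coordinates. The fitting procedure behind Figure 3a is not stated in the text, so ordinary least squares is an assumption here, as are the function name `fit_power_exponent` and the cut-off parameter `x0_max` that restricts the fit to the small region.

```python
import math


def fit_power_exponent(x0, x1, x0_max=1.0):
    """Least-squares slope of log(x1) against log(x0).

    Only points with 0 < x0 <= x0_max and x1 > 0 are used, since the
    logarithm is undefined at zero.
    """
    pts = [(math.log(a), math.log(b))
           for a, b in zip(x0, x1) if 0 < a <= x0_max and b > 0]
    if len(pts) < 2:
        raise ValueError("need at least two usable points")
    n = len(pts)
    mean_x = sum(p[0] for p in pts) / n
    mean_y = sum(p[1] for p in pts) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in pts)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in pts)
    if sxx == 0:
        raise ValueError("all x0 values are identical")
    return sxy / sxx


# Synthetic check: points lying exactly on x1 = x0 ** 1.5.
xs = [0.01 * k for k in range(1, 51)]
ys = [a ** 1.5 for a in xs]
print(fit_power_exponent(xs, ys))
```

On real ROC data the estimate depends on the chosen fitting window, so `x0_max` should be reported together with the exponent.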
4 Emergence of efficiency
The win bet fraction aggregates the wisdom of the betters in the racetrack betting market. In order to quantify its accuracy, we use two measures. The first one is the accuracy ratio (AR). AR measures how well the order of a ranking matches the occurrence of an event. Here, we consider the event that a horse wins in the race. Horses are arranged in the increasing order of the size of the win bet fraction . There are winning horses and losing horses. As we have explained before, if the prediction of the betters is good, the winning horses are concentrated in the higher ranks. If the prediction is perfect, the first horses are the winning ones and is one for them. In order to define the accuracy of the prediction or to measure the completeness of the rank, we introduce a Lorenz curve for as
Figure 4a depicts the Lorenz curves for . At , after about seventy rounds, the horses are arranged at random on the sequence . The Lorenz curve runs almost along a diagonal line. As the betting proceeds, the winning horses () move to the higher ranks and the curve becomes more convex upward. The precision of the prediction increases monotonically. At , the increase almost stops and the accuracy of the predictions reaches a maximum.
In order to quantify the accuracy of the predictions of the racetrack betters, we use the accuracy ratio, AR. AR is defined as
AR measures how far the ranking is from the perfectly ordered case. The denominator in the definition is the normalization factor that ensures that AR is one for the completely ordered case. If the ranking is perfect, and AR is 1.
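A minimal numerical sketch of these two quantities follows. Since the defining equations are not reproduced above, it assumes the common construction in which the Lorenz curve is the share of winners among the top-ranked horses and AR is the area between that curve and the diagonal, divided by the same area for a perfect ranking. The function names and toy data are hypothetical.

```python
def lorenz_curve(fractions, results):
    """Share of all winners found among the k most popular horses, k = 0..N."""
    n_win = sum(results)
    # Highest rank first: sort by decreasing win bet fraction.
    order = sorted(range(len(fractions)), key=lambda i: -fractions[i])
    curve, wins = [0.0], 0
    for i in order:
        wins += results[i]
        curve.append(wins / n_win)
    return curve


def accuracy_ratio(fractions, results):
    """Assumed form: (area under L - 1/2) / (area under perfect L - 1/2)."""
    n = len(fractions)
    n_win = sum(results)
    if n_win == 0 or n_win == n:
        raise ValueError("need at least one winner and one loser")
    curve = lorenz_curve(fractions, results)
    # Trapezoidal area under the curve on the unit interval.
    area = sum((curve[k] + curve[k + 1]) / 2 for k in range(n)) / n
    # A perfect ranking puts all n_win winners first.
    perfect_area = 1.0 - n_win / (2.0 * n)
    return (area - 0.5) / (perfect_area - 0.5)


# Toy race: the favourite wins, so the ranking is perfect.
print(accuracy_ratio([0.5, 0.3, 0.1, 0.1], [1, 0, 0, 0]))
```

With this normalization a random ranking gives a value near zero and a completely reversed ranking gives a negative value.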
We also introduce an expected Lorenz curve EL, which is defined as
In order to quantify how the bets are concentrated or scattered among horses, we introduce the expected AR and call it EAR. EAR is defined as
Suppose that the horses are divided into two groups, A and B, where the win bet fractions of the horses in group A () are large and those of the horses in group B () are small. The votes are then concentrated on the horses in group A, and EAR is large. If the votes are scattered among all horses, EAR is small. In particular, if for all , EAR is zero. If and , EAR is one.
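The two limiting cases just described can be checked with a small sketch. The defining equation of EAR is not reproduced above, so the form used here is an assumption: the expected Lorenz curve accumulates win bet fractions instead of race results, and the normalization mirrors that of AR with the sum of the fractions playing the role of the number of winners. The function name `expected_accuracy_ratio` is hypothetical.

```python
def expected_accuracy_ratio(fractions):
    """Assumed EAR: AR-style area ratio with win bet fractions as weights."""
    n = len(fractions)
    total = sum(fractions)  # one per race if each race is normalized
    if total <= 0 or total >= n:
        raise ValueError("fractions must sum to a value between 0 and N")
    ordered = sorted(fractions, reverse=True)
    curve, acc = [0.0], 0.0
    for x in ordered:
        acc += x
        curve.append(acc / total)
    # Trapezoidal area under the expected Lorenz curve.
    area = sum((curve[k] + curve[k + 1]) / 2 for k in range(n)) / n
    perfect_area = 1.0 - total / (2.0 * n)
    return (area - 0.5) / (perfect_area - 0.5)


# Votes spread evenly over four horses, then all votes on one horse.
print(expected_accuracy_ratio([0.25, 0.25, 0.25, 0.25]))
print(expected_accuracy_ratio([1.0, 0.0, 0.0, 0.0]))
```

Under these assumptions the evenly spread case gives zero and the fully concentrated case gives one, in line with the limits stated in the text.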
Figure 4b depicts EL for . At , EL rapidly increases to one, which implies that votes are concentrated on a small number of horses. However, AR is small at and the horses are not the winning horses. After , the votes are scattered among many horses and the degree of the upward convex nature of the curve decreases up to . After the decline, it begins to increase. At , the degree of the concentration of votes reaches a maximum. is the boundary line. Before , votes are more and more distributed among many horses. After , the votes tend to be concentrated on popular horses.
By comparing the behaviour of L and EL, we are able to study the efficiency of the market. If the probability that the horse wins is , the two Lorenz curves L and EL do coincide with each other in the limit . Hence the equality is a necessary condition of the efficiency of the market. However, it is not a sufficient condition. Even if the equality holds, there is a possibility that the two curves depart from each other. We also note that if the strength of the horse at rank is overvalued, the inequality holds. On the contrary if the strength is undervalued, the inequality holds. Here and .
Figure 5 shows AR and EAR as functions of . As the betting progresses, AR increases monotonically and it almost reaches its maximum at . Afterwards, the increase in AR is very slow and the following bets do not increase the accuracy of the prediction as to which horse wins the race. More interesting behaviour can be found in the time change of EAR. At , EAR is very large and is nearly 0.9. Almost all votes are concentrated on a small number of horses. However, AR at is small and the horses with large bet fractions are not so strong. The truly strong horses are scattered all over the rank of the win bet fraction. Afterwards, EAR decreases rapidly and at , AR and EAR coincide. The necessary condition of the market efficiency is satisfied at . Up to , EAR decreases and almost reaches its minimum. The bets are scattered among many horses and this implies the rich variety of the betters’ predictions as to which horse wins the race. This also means that the strong horses are undervalued and the weak horses are overvalued, that is, the ‘favorite-longshot bias’ state, which can be seen more clearly below. The discrepancy between AR and EAR is the largest at after . After that, the discrepancy decreases monotonically as EAR increases faster than AR. The bets begin to be concentrated on more popular horses. At , AR and EAR coincide again. The necessary condition of the market efficiency is satisfied again. After , the degree of the concentration increases further, but the discrepancy between AR and EAR is small even at .
Figure 6a shows the discrepancies between and at . On the y-axis, we show . As we have explained before, the sign of tells us whether the horse at rank is overestimated or underestimated. If is zero, the strength of the horse is properly estimated by the betters and the racetrack betting market is efficient. On the other hand, if is positive (negative), the strength is underestimated (overestimated).
At , DL is close to the x-axis and we see that for small , . This means that the market is almost efficient, but the top 10% of popular horses are overestimated and the next 10% are underestimated. Bets accumulate more on the top 10% of horses and their win bet fractions are larger than their true winning probabilities. On the other hand, there are fewer bets on the next 10% of horses as compared to their true winning probabilities. The strength of the remaining 80% of horses is properly estimated, because DL is close to the x-axis. At , is positive for all . From the figure, we see that the most popular 30% of horses are underestimated as compared to their winning probabilities. The remaining 70% of unpopular horses are overestimated. The bets are distributed among many horses, including many weak horses, and an inefficient state is realized. Following this, the graph of approaches the x-axis at . The coincidence between the two Lorenz curves is better than at . For small (popular horses), and the strong horses are overestimated. For large (unpopular horses), DL is almost on the x-axis. From the small discrepancy, we see that the top 20% of horses are overestimated and the next 30% are underestimated. However, the slope of DL in the two regions is small and the market is almost efficient. This efficient state persists even at .
Figure 6b shows DL for the data of all JRA races (1986-2006). Contrary to the ‘favorite-longshot bias’, we see some complex behaviour. The top 0.4% of horses are underestimated, the next 10% are overestimated, and so on. However, the degree of the inefficiency is very small.
5 Voting model and scale invariance
We consider a voting model for horses . Betters vote for them one by one, and the result of each voting is announced promptly. The time variable counts the number of the votes. The horses are classified into two categories , and we call them binary horses. There are horses in each category and .
We denote the number of votes of the th horse at time as . At , takes the initial value . If the th candidate gets a vote at , increases by one unit.
A better casts a vote for the candidates at a rate proportional to . The probability that the th candidate gets a vote at is
The problem of determining the probability of the th candidate getting votes up to is equivalent to the famous Pólya’s urn problem [10, 11]. The probability that the th candidate gets votes up to is given by the beta binomial distribution
is defined as .
After infinite counts of voting, i.e. , the share of votes becomes the beta distributed random variable beta on .
Next, we focus on the thermodynamic limit and . The expectation value of is . We introduce a variable . The distribution function in the thermodynamic limit is given as
The share of votes, , of a candidate follows a gamma distribution function with .
After many counts of voting, , the two types of horses are distributed in the space of according to the gamma distribution in the thermodynamic limit . If , a candidate belonging to category has a higher probability of getting many votes than a candidate belonging to category . Even the latter can obtain many votes. It is also possible that the former may get few votes. Thus, there is a mixing of the binary candidates.
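The mixing described here can be explored with a small simulation of the voting rule. This is a sketch under stated assumptions rather than the authors' implementation: each horse starts from an initial weight that depends only on its category, every vote goes to a horse with probability proportional to its current weight, and the weight of the chosen horse increases by one. The names `simulate_votes`, `s_win` and `s_lose` and the parameter values are hypothetical.

```python
import random


def simulate_votes(n_per_category, s_win, s_lose, n_votes, seed=0):
    """Polya-urn style voting with two categories of horses.

    Returns (labels, votes): labels[i] is 1 for the first category and
    0 for the second, votes[i] is the number of votes horse i received.
    """
    rng = random.Random(seed)
    labels = [1] * n_per_category + [0] * n_per_category
    # Current weight = initial value + votes received so far.
    weights = [s_win] * n_per_category + [s_lose] * n_per_category
    votes = [0] * len(labels)
    indices = range(len(labels))
    for _ in range(n_votes):
        i = rng.choices(indices, weights=weights)[0]
        weights[i] += 1.0
        votes[i] += 1
    return labels, votes


labels, votes = simulate_votes(20, s_win=5.0, s_lose=0.5, n_votes=2000, seed=1)
ranking = sorted(zip(votes, labels), reverse=True)
# Category labels ordered from most to fewest votes.
print("".join(str(label) for _, label in ranking))
```

Printing the category labels in order of votes gives a quick view of how strongly the two categories are mixed for a given pair of initial values.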
The cumulative function is given as
Using the incomplete gamma function of the first kind , it is given as
Near the end point, , in other words, in the small region, the incomplete gamma function behaves as
As , the following relation holds:
We see that a scale invariant behaviour appears in the mixing.
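The step from the small-argument behaviour of the incomplete gamma function to the power relation can be written out as follows. The notation is assumed for illustration and may differ from the symbols of the original equations: $s_\mu$ is the shape parameter of the gamma distribution for category $\mu \in \{0,1\}$, $t$ is the rescaled vote share with unit scale, and $x_\mu(t)$ is the cumulative function counted from $t = 0$.

```latex
% Assumed notation, not taken from the source equations.
\begin{align}
x_\mu(t) &= \frac{\gamma(s_\mu, t)}{\Gamma(s_\mu)}
          \simeq \frac{t^{s_\mu}}{\Gamma(s_\mu + 1)}
          \qquad (t \to 0), \\
t &\simeq \left[ \Gamma(s_0 + 1)\, x_0 \right]^{1/s_0}, \\
x_1 &\simeq \frac{\Gamma(s_0 + 1)^{\alpha}}{\Gamma(s_1 + 1)}\, x_0^{\alpha},
\qquad \alpha \equiv \frac{s_1}{s_0}.
\end{align}
```

The first line uses $\gamma(s,t) \simeq t^{s}/s$ for small $t$; eliminating $t$ between the two categories leaves a pure power of $x_0$, with an exponent given by the ratio of the two shape parameters under these assumptions.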
Furthermore, in the limit with fixed , the following relation holds.
The scale-invariant relation holds over the entire range . This feature is remarkable from the viewpoint of statistical physics. Usually, the power-law relation holds only in the tail.
We discuss the limit in the derivation of the exact scale invariance. In the derivation of the gamma distribution, we take the thermodynamic limit . With the gamma distribution, holds in the limit . For (22) to hold, these two limits, and , should go together. approaches zero more slowly than approaches infinity. We call the limit and with fixed the double scaling limit. If we take the limit without the limit , the candidate chosen first gets all the remaining votes and there is no mixing of the binary candidates. The double scaling limit is crucial to the emergence of the exact scale invariance.
6 Exact scale invariant gradation pattern
The voting problem reduces to a random ball removing problem with the relative probability in the double scaling limit . Figure 11 shows the gradation pattern made by the algorithm of the random ball removing problem. We prepare red balls and blue balls. From them, we take one ball at a time and do not return it. The probability that a red (blue) ball is chosen is proportional to . We repeat the procedure until no ball remains and obtain a sequence of balls. In the limit , the exact scale invariance between and holds. In the figure, we change the ratio from 1 (bottom) to 100 (top). Near the bottom, the two types of balls are mixed. Near the top, phase separation occurs.
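The removal procedure can be sketched in a few lines. The weights are an assumption of this sketch, since the proportionality factor is not reproduced above: each remaining red ball is taken to be `ratio` times as likely to be drawn as each remaining blue ball. The function name `remove_balls` and the parameter values are hypothetical.

```python
import random


def remove_balls(n_red, n_blue, ratio, seed=0):
    """Draw all balls one at a time without replacement.

    Assumption: the probability of drawing a red ball is
    ratio * red / (ratio * red + blue), where red and blue are the
    numbers of balls still in the pool. Returns the sequence of drawn
    colours as a list of "R" and "B".
    """
    rng = random.Random(seed)
    red, blue = n_red, n_blue
    sequence = []
    while red + blue > 0:
        p_red = ratio * red / (ratio * red + blue)
        if rng.random() < p_red:
            sequence.append("R")
            red -= 1
        else:
            sequence.append("B")
            blue -= 1
    return sequence


# One row of a gradation pattern for a moderate ratio.
print("".join(remove_balls(30, 30, ratio=5.0, seed=2)))
```

Repeating the call for a range of `ratio` values and stacking the rows gives a picture of the kind described for the figure, with well mixed rows for ratios near 1 and clearly separated rows for large ratios.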
In this paper, we study the time series of the win bet fractions of the 2008 JRA races. We see that the betting process induces scale invariance between the cumulative function of the winning horses and that of the losing horses. We also see the emergence of the market’s efficiency after many rounds of betting. However, the convergence to the efficient state is not monotonic. The dynamics of the distribution of the votes among the horses is complex. At first, the votes accumulate on a small number of horses and then they are distributed among many horses, including weak horses. At this time, the strong horses are underestimated. Afterwards, votes begin to be concentrated on more popular horses, but the ranking of the winning horses does not change much. AR does not change much and only EAR increases, until the two finally coincide.
With regard to the scale invariance, we explain the mechanism based on a simple voting model. The model shows exact scale invariance in the double scaling limit. In addition, the voting model in the limit is equivalent to a random ball removing problem. Using the equivalence, we show how to make an exact gradation pattern of mixed binary objects.
This research was partially supported by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Challenging Exploratory Research, 21654054, 2009.
- D.B.Hausch, V.S.Y.Lo and W.T.Ziemba, Efficiency of Racetrack Betting Markets 2008 Edition, World Scientific, Singapore.
- R.M.Griffith, Amer.J.Psychol. 62,290(1949).
- W.T.Ziemba and D.B.Hausch, Betting at the Racetrack, Dr Z Investment Inc, San Luis Obispo, CA(1987).
- K.Park and E.Domany, Europhys.Lett. 53,419(2001).
- T.Ichinomiya, Physica A368,207(2006).
- S.Mori and M.Hisakado, Exact Scale Invariance in Mixing of Binary Candidates in Voting Model, preprint arXiv:0806.0185.
- M.Hisakado and S.Mori, Phase Transition and Information Cascade in a Voting Model, preprint arXiv:0907.4818.
- More precisely, we calculate before removing the cancelled horses from data. Up to the cancellation, these do not sum up to one. After cancellation, the odds and the win bet fraction become zero. These sum up to one.
- If cancellation occurs in the betting process, the sum of is not before cancellation. In this case, it is necessary to divide by
- G.Pólya, Ann.Inst.Henri Poincaré 1,117(1931).
- M.Hisakado, K.Kitsukawa and S.Mori, J.Phys. A39,15365(2006).
- B.Engelmann, E.Hayden and D.Tasche, Testing Rating Accuracy, www.risk.net (2003).