The Artificial Regression Market
Abstract
The Artificial Prediction Market is a recent machine learning technique for multi-class classification, inspired by financial markets. It involves a number of trained market participants that bet on the possible outcomes and are rewarded if they predict correctly. This paper generalizes the scope of the Artificial Prediction Markets to regression, where there are uncountably many possible outcomes and the error is usually the MSE. For that, we introduce the reward kernel that rewards each participant based on its prediction error and we derive the price equations. Using two reward kernels we obtain two different learning rules, one of which is approximated using Hermite-Gauss quadrature. The market setting makes it easy to aggregate specialized regressors that only predict when an observation falls into their specialization domain. Experiments show that regression markets based on the two learning rules outperform Random Forest Regression on many UCI datasets and are rarely outperformed.
1 Introduction
Prediction markets are forums of trade where contracts on the outcomes of future events are bought and sold. Each contract is a wager that yields payment if its corresponding outcome occurs. Each market participant has an incentive to profit and therefore an incentive to predict accurately. The trading prices of contracts are determined by supply and demand. Highly demanded contracts are more expensive and represent an overall confidence that a corresponding outcome will be realized. On the other hand, less demanded contracts are less expensive and represent an overall lack of confidence that a corresponding outcome will be realized. These trading prices can be interpreted as the market’s prediction of the outcome. Studies have shown that the trading prices even estimate the true probability of the outcome Manski (2006). Prediction markets have found use in predicting elections, decision making in both government and business realms, and even sporting events Arrow et al. (2008). Their reported accuracy and success motivated the development of the Artificial Prediction Market Lay (2009); Lay & Barbu (2010); Barbu & Lay (2011) that attempts to mimic a real prediction market in a machine learning setting. The Artificial Prediction Market has empirically proven to be a competitive classifier aggregation technique and motivates further investigation. It was proved in Barbu & Lay (2011) that the Artificial Prediction Market learns by constrained Maximum Likelihood.
In this paper we generalize the Artificial Prediction Market to regression. While the objective of classification is to predict a label from a finite set of labels, the objective of regression is to predict a real value response. We develop a mathematical analog of the Artificial Prediction Market, the Regression Market, to deal with real values, or uncountably many “labels”. Regression markets are unusual in that contracts are no longer discrete and finite. Each contract corresponds to a real value prediction and consequently there are uncountably many such contracts for trade. While in classification a contract that has not predicted the correct outcome does not win anything, for regression we introduce the reward kernel that rewards contracts based on the distance to the ground truth value.
We further show experiments on UCI Frank & Asuncion (2010) and LIAAD Torgo (2010) data sets that demonstrate that the Regression Market is a viable technique for aggregating regressors, and also works very well with specialized regressors that only predict outcomes for certain instances and not for others.
2 Related Work
To the best of our knowledge, there has been no other work on solving regression tasks with machine learning models of prediction markets. Related work can be found for classification in Lay & Barbu (2010); Barbu & Lay (2011) where Artificial Prediction Markets were developed for classification using betting functions and an equilibrium based on conservation of budget sum.
Another model can be found in Storkey (2011) where machine learning markets are instead derived from utility functions.
In Chen & Vaughan (2010) the authors find a connection between no-regret learning and prediction markets.
3 Overview of the Artificial Prediction Markets
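In Lay & Barbu (2010), the classification market is defined by a betting function $\phi_m^k(x, c)$ that describes the proportion of the budget $\beta_m$ that participant $m$ allots to label $k$ for a given instance $x$ and trading prices $c = (c_1, \dots, c_K)$ for all labels $k = 1, \dots, K$. The equilibrium price is defined such that, for any label $y$, the sum of profits equals the sum of losses
$$\sum_{m=1}^{M} \beta_m \frac{\phi_m^y(x, c)}{c_y} = \sum_{m=1}^{M} \sum_{k=1}^{K} \beta_m \phi_m^k(x, c).$$
This equilibrium system corresponds to the update rule for the classification market
$$\beta_m \leftarrow \beta_m - \sum_{k=1}^{K} \beta_m \phi_m^k(x, c) + \beta_m \frac{\phi_m^y(x, c)}{c_y}$$
for $m = 1, \dots, M$. The net change in budget is the participant's profit. With a little reworking, the above equilibrium is equivalent to solving the following fixed point problem
$$c_k = \frac{\sum_{m=1}^{M} \beta_m \phi_m^k(x, c)}{\sum_{m=1}^{M} \sum_{j=1}^{K} \beta_m \phi_m^j(x, c)}, \qquad k = 1, \dots, K.$$
The trading price $c_k$ is considered to be an estimate of the conditional mass $p(y = k \mid x)$. In fact, Barbu & Lay (2011) demonstrates that the classification market maximizes log likelihood.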
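The classification market summarized above can be illustrated with a minimal sketch, assuming constant betting functions (bets that do not depend on the prices), in which case the fixed point has a closed form. The function names `market_prices` and `update_budgets` and the array layout are hypothetical choices made for this illustration and are not taken from Lay & Barbu (2010).

```python
import numpy as np

def market_prices(budgets, bets):
    """Equilibrium prices for constant bets (an assumption of this sketch).

    budgets: shape (M,), budget of each participant.
    bets:    shape (M, K), proportion of the budget each participant
             places on each label; rows sum to at most 1.
    """
    money_per_label = budgets @ bets            # total money bet on each label
    return money_per_label / money_per_label.sum()

def update_budgets(budgets, bets, prices, true_label):
    """Subtract what was bet, then pay out the contracts on the true label."""
    spent = budgets * bets.sum(axis=1)
    won = budgets * bets[:, true_label] / prices[true_label]
    return budgets - spent + won

# Toy market: three participants, two labels, each betting 10% of its budget.
budgets = np.ones(3)
bets = np.array([[0.08, 0.02],
                 [0.05, 0.05],
                 [0.01, 0.09]])
prices = market_prices(budgets, bets)
new_budgets = update_budgets(budgets, bets, prices, true_label=0)
```

At these prices the total budget is the same before and after the update, whichever label turns out to be correct, which is the conservation property the equilibrium is meant to guarantee.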
4 Regression Markets
The extension of prediction markets to the regression problem proves to be counterintuitive. In classification, the goal is to predict the one correct label for a given instance. What can be said about regression? Assume, for the time being, that the classification market framework generalizes. For the sake of consistency with probability notation, $\phi_m(y \mid x, c)$ will denote a betting functional that allots a proportion of the budget $\beta_m$ for response $y$. This implies that
$$\int \phi_m(y \mid x, c)\, dy \le 1 \qquad (1)$$
since no participant may bet more than the whole of their budget in this market. A curious consequence of this constraint is that it is possible for $\phi_m(y \mid x, c) > 1$ for some $y$. Likewise, the trading prices for $y$ are denoted as the price function $c(y \mid x)$. The trading price $c(y \mid x)$ is a conditional density on the possible responses $y$. The prediction can be computed from, for example, the expectation
$$\hat{y} = \int y\, c(y \mid x)\, dy. \qquad (2)$$
However, the price function can also model ambiguous responses. For example, points along a circle could result in a bimodal price function.
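To make the bimodal case concrete, the following is a minimal sketch that assumes the price function is a budget-weighted mixture of Gaussian participant densities (the form used by the constant market of Section 4.1). The helper names, the toy parameters, and the grid-based integration are assumptions of this illustration.

```python
import numpy as np

def gaussian_pdf(y, mean, std):
    return np.exp(-0.5 * ((y - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def price_function(y, budgets, means, stds):
    """Price c(y|x) as a budget-weighted mixture of participant densities."""
    y = np.atleast_1d(y)[:, None]                       # shape (n, 1)
    dens = gaussian_pdf(y, means[None, :], stds[None, :])  # shape (n, M)
    return dens @ budgets / budgets.sum()

# Two equally funded participants that disagree: one predicts -1, the other +1.
budgets = np.array([1.0, 1.0])
means = np.array([-1.0, 1.0])
stds = np.array([0.2, 0.2])

grid = np.linspace(-4.0, 4.0, 8001)
dy = grid[1] - grid[0]
c = price_function(grid, budgets, means, stds)

mass = np.sum(c) * dy           # Riemann sum: the price integrates to about 1
y_hat = np.sum(grid * c) * dy   # expectation of the price function
```

In this toy example the expectation lies near 0, between the two modes, where the price itself is almost zero. This is one reason the full price function can be more informative than a single point prediction when the response is ambiguous.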
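The equilibrium price function receives similar treatment to that of the classification market. The objective is to find a $c(y \mid x)$ that gives conservation of budget. The ambiguity of the correct label mentioned above is resolved by introducing a reward kernel $K(t; y)$. The reward kernel is a density in $t$ with a single mode centered about the ground truth $y$. The winnings of participant $m$ are subsequently defined as
$$\beta_m \int K(t; y)\, \frac{\phi_m(t \mid x, c)}{c(t \mid x)}\, dt \qquad (3)$$
which bears similarity to the winnings in the classification market. This has the effect of partially rewarding participants for nearby predictions. Likewise, the total expenditures of participant $m$ for contracts are given as
$$\beta_m \int \phi_m(t \mid x, c)\, dt. \qquad (4)$$
Analogous to the classification market, the equilibrium price function is defined such that gains match losses
$$\sum_{m=1}^{M} \beta_m \int K(t; y)\, \frac{\phi_m(t \mid x, c)}{c(t \mid x)}\, dt = \sum_{m=1}^{M} \beta_m \int \phi_m(t \mid x, c)\, dt. \qquad (5)$$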
4.1 Constant Market for Regression
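For simplicity and given the reported empirical performance of the constant classification market, the remainder of this paper assumes $\phi_m(y \mid x, c) = \eta\, h_m(y \mid x)$ where $h_m(y \mid x)$ is a conditional density with mean $f_m(x)$. Here $f_m(x)$ is a regressor. This defines the constant market for regression with
$$c(y \mid x) = \frac{\sum_{m=1}^{M} \beta_m \phi_m(y \mid x, c)}{\sum_{m=1}^{M} \beta_m \int \phi_m(t \mid x, c)\, dt} \qquad (6)$$
$$\phantom{c(y \mid x)} = \frac{1}{\sum_{m=1}^{M} \beta_m} \sum_{m=1}^{M} \beta_m h_m(y \mid x). \qquad (7)$$
The update rule is similar to that of the classification market except for the additional reward kernel
$$\beta_m \leftarrow \beta_m - \eta \beta_m + \eta \beta_m \int K(t; y)\, \frac{h_m(t \mid x)}{c(t \mid x)}\, dt \qquad (8)$$
where $\eta$ is the learning rate and also serves to prevent instantaneous bankruptcy (i.e. $\beta_m = 0$). The choice of $K$ gives different update rules. We examine $K(t; y) = \delta(t - y)$, where $\delta$ is the Dirac delta function, and the Gaussian kernel $K(t; y) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left(-\frac{(t - y)^2}{2\sigma^2}\right)$.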
4.2 Delta Updates
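When $K(t; y) = \delta(t - y)$, this gives an update rule analogous to that of the classification market
$$\beta_m \leftarrow \beta_m - \eta \beta_m + \eta \beta_m \frac{h_m(y \mid x)}{c(y \mid x)}. \qquad (9)$$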
Even though this reward kernel is exacting, it will be shown empirically to work relatively well.
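The delta update can be sketched in a few lines, assuming Gaussian participant densities and a price equal to the budget-weighted mixture of those densities. The function name `delta_update` and the toy values are hypothetical and only serve this illustration.

```python
import numpy as np

def gaussian_pdf(y, mean, std):
    return np.exp(-0.5 * ((y - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def delta_update(budgets, means, stds, y, eta):
    """One delta-kernel budget update for a single training pair (x, y).

    means[m], stds[m] describe participant m's Gaussian density for this x.
    """
    h = gaussian_pdf(y, means, stds)      # participant densities at the truth
    c = budgets @ h / budgets.sum()       # market price at the truth
    return budgets * (1.0 - eta) + eta * budgets * h / c

budgets = np.ones(3)
means = np.array([0.0, 1.0, 3.0])         # participants' predicted means
stds = np.ones(3)
new_budgets = delta_update(budgets, means, stds, y=0.9, eta=0.1)
```

Participants whose density at the ground truth exceeds the market price gain budget and the others lose it, while the total budget stays constant.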
4.3 Gaussian Updates
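When $K(t; y) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left(-\frac{(t - y)^2}{2\sigma^2}\right)$, this gives an update involving an integral
$$\beta_m \leftarrow \beta_m - \eta \beta_m + \eta \beta_m \int \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(t - y)^2}{2\sigma^2}}\, \frac{h_m(t \mid x)}{c(t \mid x)}\, dt. \qquad (10)$$
One way to approximate this integral is with Hermite-Gauss quadrature Press (2007). A change of variables $t = y + \sqrt{2}\, \sigma u$ is required to apply the quadrature rule
$$\int \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(t - y)^2}{2\sigma^2}}\, \frac{h_m(t \mid x)}{c(t \mid x)}\, dt \qquad (11)$$
$$= \frac{1}{\sqrt{\pi}} \int e^{-u^2}\, \frac{h_m(y + \sqrt{2}\, \sigma u \mid x)}{c(y + \sqrt{2}\, \sigma u \mid x)}\, du \qquad (12)$$
$$\approx \frac{1}{\sqrt{\pi}} \sum_{i=1}^{n} w_i\, \frac{h_m(y + \sqrt{2}\, \sigma u_i \mid x)}{c(y + \sqrt{2}\, \sigma u_i \mid x)} \qquad (13)$$
where $(w_i, u_i)$ are the $n$-point Hermite-Gauss weights and nodal points.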
Intuitively, the choice of $\sigma$ should reflect the noise variance of the training data (assuming Gaussian noise). If $\sigma$ is too small, the market is more prone to overfitting. The value of $\sigma$ can be chosen with cross validation by trying a discretized set of candidate values (assuming the noise has mean $0$).
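The quadrature approximation of the Gaussian update can be sketched as follows, assuming Gaussian participant densities, a price equal to the budget-weighted mixture of those densities, and the substitution t = y + sqrt(2) * sigma * u. The nodes and weights come from NumPy's `hermgauss`; the function name `gaussian_update` and the toy values are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def gaussian_pdf(y, mean, std):
    return np.exp(-0.5 * ((y - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def gaussian_update(budgets, means, stds, y, sigma, eta, n_points=5):
    """One Gaussian-kernel budget update, approximated by n-point quadrature."""
    nodes, weights = hermgauss(n_points)        # rule for the weight exp(-u^2)
    t = y + np.sqrt(2.0) * sigma * nodes        # quadrature points around y
    h = gaussian_pdf(t[:, None], means[None, :], stds[None, :])  # (n, M)
    c = h @ budgets / budgets.sum()             # market price at each point
    reward = (weights[:, None] * h / c[:, None]).sum(axis=0) / np.sqrt(np.pi)
    return budgets * (1.0 - eta) + eta * budgets * reward

budgets = np.ones(3)
means = np.array([0.0, 1.0, 3.0])
stds = np.ones(3)
new_budgets = gaussian_update(budgets, means, stds, y=0.9, sigma=0.5, eta=0.1)
```

Because the quadrature weights sum to the square root of pi, this approximate update still conserves the total budget. The sketch does not guard against a price that underflows to zero at a quadrature point far from every participant, which a practical implementation would need to handle.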
4.4 Specialized Regression Markets
Introduced in Lay & Barbu (2010), specialized markets are markets with participants which have local support in the feature space. This type of participant is assumed to perform relatively well in its domain. An example of a specialized market is a market with random tree leaves as participants. These types of markets have been demonstrated to be competitive with random forest. The specialized regression market of tree leaves is similar except that leaves are Gaussian instead of histograms. Each regression tree stores the sample mean and variance of instances that fall in each leaf.
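A specialized market can be sketched under the assumption that a participant bets only when the instance falls inside its domain, and that its budget is left untouched otherwise. Here one-dimensional intervals stand in for tree leaves, each storing a sample mean and standard deviation, and the delta update is used. All names and values are hypothetical, and the treatment of inactive participants is an assumption of this illustration.

```python
import numpy as np

def gaussian_pdf(y, mean, std):
    return np.exp(-0.5 * ((y - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def active(x, lo, hi):
    """Boolean mask of the leaves whose domain [lo, hi) contains x."""
    return (lo <= x) & (x < hi)

def predict(x, lo, hi, means, budgets):
    """Budget-weighted mean over the leaves that contain x."""
    a = active(x, lo, hi)
    return float(budgets[a] @ means[a] / budgets[a].sum())

def specialized_delta_update(x, y, lo, hi, means, stds, budgets, eta):
    """Delta update restricted to the leaves that contain x."""
    a = active(x, lo, hi)
    h = gaussian_pdf(y, means[a], stds[a])
    c = budgets[a] @ h / budgets[a].sum()
    new = budgets.copy()
    new[a] = budgets[a] * (1.0 - eta) + eta * budgets[a] * h / c
    return new

# Three toy leaves; the second overlaps both of the others.
lo = np.array([0.0, 0.0, 5.0])
hi = np.array([5.0, 10.0, 10.0])
means = np.array([1.0, 2.0, 8.0])
stds = np.ones(3)
budgets = np.ones(3)

before = predict(2.0, lo, hi, means, budgets)
updated = specialized_delta_update(2.0, 1.2, lo, hi, means, stds, budgets, eta=0.1)
after = predict(2.0, lo, hi, means, updated)
```

In a forest, each instance lands in exactly one leaf per tree, so with 100 trees an instance would activate 100 participants.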
5 Results
Data | Train size | Test size | Inputs | RFB | RF | DM | GM | |
---|---|---|---|---|---|---|---|---|
abalone | 4177 | – | 8 | 4.600 | 4.571 | 4.571 | 4.571 | |
friedman1 | 200 | 2000 | 10 | 5.700 | 4.343+ | 4.335•+ | 4.193•+ | |
friedman2 | 200 | 2000 | 4 | 19600.0 | 19431.852 | 19232.482• | 18369.546•+ | |
friedman3 | 200 | 2000 | 4 | 0.022 | 0.028– | 0.028•– | 0.026•– | |
housing | 506 | – | 13 | 10.200 | 10.471 | 10.130• | 10.128• | |
ozone | 330 | – | 8 | 16.300 | 16.916 | 16.925 | 16.917 | |
servo | 167 | – | 4 | 0.246 | 0.336 | 0.295 | 0.322 | |
ailerons | 7154 | 6596 | 40 | – | 2.814e-008 | 2.814e-008• | 2.814e-008• | |
auto-mpg | 392 | – | 7 | – | 6.469 | 6.444 | 6.405• | |
auto-price | 159 | – | 15 | – | 3823550.43 | 3723413.430 | 3815863.98 | |
bank | 4500 | 3693 | 32 | – | 7.238e-003 | 7.212e-003• | 7.210e-003• | |
breast cancer | 194 | – | 32 | – | 1112.270 | 1112.509 | 1108.325 | |
cartexample | 40768 | – | 10 | – | 1.233 | 1.233† | 1.232• | |
computeractivity | 8192 | – | 21 | – | 5.414 | 5.398• | 5.414† | |
diabetes | 43 | – | 2 | – | 0.415 | 0.426† | 0.415 | |
elevators | 8752 | 7847 | 18 | – | 9.319e-006 | 9.288e-006• | 9.225e-006• | |
forestfires | 517 | – | 12 | – | 5834.819 | 5844.493† | 5680.131• | |
kinematics | 8192 | – | 8 | – | 0.013 | 0.013• | 0.013• | |
machine | 209 | – | 6 | – | 3154.521 | 2991.798• | 3042.336 | |
poletelecomm | 5000 | 10000 | 48 | – | 29.813 | 28.855• | 29.863† | |
pumadyn | 4499 | 3693 | 32 | – | 9.237e-005 | 8.917e-005• | 8.888e-005• | |
pyrimidines | 74 | – | 27 | – | 0.013 | 0.013 | 0.012 | |
triazines | 186 | – | 60 | – | 0.015 | 0.015 | 0.015 |
We performed two types of experiments with both updates (9) and (10), and compared against Breiman’s original regression results Breiman (2001) as well as on additional data sets from UCI and LIAAD Torgo (2010). To be consistent with Breiman, nearly all experiments were conducted over 100 random splits where each split randomly sets aside 10% of the data set for testing. For abalone, only 10 random splits with 25% of the data set aside for testing were considered. Data sets with provided test sets were not randomly split. Instead, the forest and markets were trained 100 times on the entire training set and tested on the provided test set. These results vary due to the randomness of the regression forest.
All experiments were run on Windows 7 with 8GB of RAM and a Core i7-2630QM processor (max 2.9GHz, 6MB L3 cache). On each training set 100 regression trees were trained. Each regression tree node considered 25 randomized features, each a linear combination of 2 random inputs. Each coefficient of the linear combination was drawn uniformly at random. In our implementation, 1000 of these random features were generated in advance rather than at each node. The split criterion for each node is based on the weighted sample variance. A rule that stops splitting when the node's sample size is too small was enforced. Additionally, our implementation treats categorical inputs as numeric, which differs from Breiman’s implementation. However, most data sets consist of numeric inputs.
Both market types were trained and evaluated over 50 epochs. Each epoch is one complete pass through the training set. The reported errors are those that minimize the MSE of the test set over the 50 epochs (averaged over the 100 runs).
(14) |
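The selection of the reported error can be sketched as follows. The text leaves open whether the minimum over epochs is taken after averaging over runs or within each run before averaging, so this sketch exposes both readings; the function name `reported_error` and its flag are hypothetical.

```python
import numpy as np

def reported_error(test_mse, per_run_min=False):
    """Summarize a (runs, epochs) array of test-set MSEs.

    per_run_min=False: average over runs, then take the best epoch.
    per_run_min=True:  take the best epoch of each run, then average.
    Which reading matches the paper is an assumption either way.
    """
    test_mse = np.asarray(test_mse, dtype=float)
    if per_run_min:
        return float(test_mse.min(axis=1).mean())
    return float(test_mse.mean(axis=0).min())

# Two toy runs over three epochs.
toy = [[3.0, 1.0, 2.0],
       [1.0, 3.0, 2.0]]
avg_then_min = reported_error(toy)                    # 2.0
min_then_avg = reported_error(toy, per_run_min=True)  # 1.0
```

The two readings can differ noticeably when the best epoch varies between runs, as in the toy array above.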
The learning rate $\eta$ was chosen as in Barbu & Lay (2011). On the first run (random split or full training set), the parameter $\sigma$ for the Gaussian Market reward kernel was estimated using 2-fold cross validation on the training set. This value was kept fixed for the remaining runs (fewer for abalone, which used only 10 splits). The Gaussian market used 5-point Hermite-Gauss quadrature. The prediction for an instance $x$ was computed with the expectation
$$\hat{y} = \int y\, c(y \mid x)\, dy = \frac{\sum_{m} \beta_m f_m(x)}{\sum_{m} \beta_m}. \qquad (15)$$
In every result, significance is measured at a fixed significance level in two ways: pairwise t-test Demšar (2006) and t-test on the means. The pairwise t-test was used to compare the 100 market runs with the 100 forest runs, while the t-test on the means was used to compare against Breiman’s reported results.
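The pairwise comparison can be sketched with a paired t statistic computed on the per-run differences in MSE, assuming that the market and the forest are evaluated on the same splits so that the runs pair up. The function name `paired_t_statistic` and the toy numbers are hypothetical, and only NumPy is used. A p-value can be obtained from the Student t distribution with the returned degrees of freedom, for example with `scipy.stats.ttest_rel`.

```python
import numpy as np

def paired_t_statistic(errors_a, errors_b):
    """Paired t statistic and degrees of freedom for matched runs."""
    d = np.asarray(errors_a, dtype=float) - np.asarray(errors_b, dtype=float)
    n = d.size
    std_err = d.std(ddof=1) / np.sqrt(n)    # standard error of the mean difference
    return float(d.mean() / std_err), n - 1

# Toy per-run test MSEs for a forest and a market on the same four splits.
forest_mse = [1.0, 2.0, 3.0, 4.0]
market_mse = [0.5, 1.5, 2.5, 3.0]
t_stat, dof = paired_t_statistic(forest_mse, market_mse)
```

A large positive statistic here indicates that the first method has consistently higher error than the second across the paired runs.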
5.1 Comparison with Random Forest Regression
The first experiment considers aggregation of tree leaves of forests with fully grown trees on UCI and LIAAD data sets. The results of seven of the data sets are compared with Breiman’s reported results. The missing data set Robot Arm is private.
From Table 1, our RF does not perform identically to RFB. This can be attributed to the synthetic nature of some data sets such as friedman1, friedman2, and friedman3 and/or the fact that our implementation of regression forest does not treat categorical inputs the same way. Of the Breiman comparisons, only GM is legitimately significantly better than Breiman’s results for friedman2. Out of all the data sets, DM is significantly better than RF for 12 data sets (in a pairwise sense) while GM is only significantly better than RF for 11 data sets. However, DM is significantly worse than RF for 3 data sets while GM is only significantly worse on 2 data sets. The significantly worse results can be attributed to overfitting and/or a poorly tuned reward kernel in the case of GM.
Data | Train size | Test size | Inputs | RFB | RF | DM | GM | Speedup | |
---|---|---|---|---|---|---|---|---|---|
abalone | 4177 | – | 8 | 4.600 | 4.438 | 4.318•+ | 4.438 | 3.3 | |
friedman1 | 200 | 2000 | 10 | 5.700 | 5.076+ | 4.701•+ | 4.429•+ | 1.8 | |
friedman2 | 200 | 2000 | 4 | 19600.0 | 29343.562– | 23200.438•– | 21183.421•– | 1.9 | |
friedman3 | 200 | 2000 | 4 | 0.022 | 0.034– | 0.029•– | 0.028•– | 2.0 | |
housing | 506 | – | 13 | 10.200 | 12.869– | 12.056•– | 11.947•– | 2.2 | |
ozone | 330 | – | 8 | 16.300 | 16.976 | 16.964 | 16.932 | 2.1 | |
servo | 167 | – | 4 | 0.246 | 0.248 | 0.241 | 0.254 | 1.6 | |
auto-mpg | 392 | – | 7 | – | 8.248 | 7.817• | 7.750• | 2.1 | |
auto-price | 159 | – | 15 | – | 4699789.7 | 4524741.81 | 4431992.3 | 1.4 | |
breast cancer | 194 | – | 32 | – | 1073.319 | 1071.820 | 1072.126 | 2.1 | |
diabetes | 43 | – | 2 | – | 0.400 | 0.426† | 0.393 | 0.7 | |
forestfires | 517 | – | 12 | – | 4945.630 | 5445.001† | 5196.451† | 2.2 | |
machine | 209 | – | 6 | – | 3137.001 | 3127.932 | 2930.506 | 1.8 | |
triazines | 186 | – | 60 | – | 0.016 | 0.015• | 0.015• | 2.0 |
5.2 Fast Regression using Shallow Trees
This experiment examined the aggregation capabilities of the regression market with shallow trees. In many problems, it is prohibitively expensive to train and even evaluate deep trees. In practice this is mitigated by enforcing a maximum tree depth. For example in Criminisi et al. (2011) and R Girshick & Criminisi (2011) the regression trees were constrained to depth 7. However, this strict constraint on tree depth is prone to introduce leaves that do not generalize well due to prematurely halting tree growth. The specialized regression market of tree leaves can be used to weight the leaves. Poorly performing leaves will tend to have less weight thus improving the overall prediction accuracy.
In addition to the previously mentioned experiment details, regression trees were grown with a maximum depth of 10. Using the same depth 10 trees, MSE errors were computed for leaves no deeper than depth 5. Both depth 5 and depth 10 evaluations for training and test sets were recorded. The timings for the larger of the two sets were averaged over the 100 runs and used to compute the speedup. The markets were applied to the depth 5 leaves only. Since the market is just a linear aggregation of 100 leaves per instance, the reported speedup for forest is similar to the speedup of the market.
From Table 2 it can be seen that the depth 5 forest is roughly twice the speed of the depth 10 forest. On diabetes, the small data set, features and forest likely fit in cache, giving the strange 0.7 speedup. DM performs significantly better than RF on seven data sets (in a pairwise sense) while GM only performs significantly better on six data sets. However, DM performs significantly worse on two data sets while GM performs significantly worse on one. No method legitimately performs significantly better than RFB since RF is already better than RFB on those two data sets. The significantly worse results can be attributed to overfitting and/or a poorly tuned reward kernel in the case of GM.
6 Conclusion
This work presented a generalization of the Artificial Prediction Markets from classification to regression with uncountably many outcomes. It introduced two types of update rules and demonstrated their learning ability through experiments on UCI and LIAAD datasets. Furthermore, it showed the capability of the regression market to aggregate shallow tree leaves into much better regressors than those obtained by voting. In future work we plan to use the market for regression with non-uniform noise levels and multi-modal conditional probabilities.
References
- Arrow et al. (2008) Arrow, K. J., Forsythe, R., Gorham, M., Hahn, R., Hanson, R., Ledyard, J. O., Levmore, S., Litan, R., Milgrom, P., and Nelson, F. D. The promise of prediction markets. Science, 320(5878):877, 2008.
- Barbu & Lay (2011) Barbu, A. and Lay, N. An introduction to artificial prediction markets. Arxiv preprint arXiv:1102.1465, 2011.
- Breiman (2001) Breiman, L. Random forests. Machine Learning, 45(1):5–32, 2001.
- Chen & Vaughan (2010) Chen, Yiling and Vaughan, Jennifer Wortman. A new understanding of prediction markets via no-regret learning. In the Eleventh ACM Conference on Electronic Commerce (EC 2010), 2010.
- Criminisi et al. (2011) Criminisi, Antonio, Shotton, Jamie, Robertson, Duncan, and Konukoglu, Ender. Regression forests for efficient anatomy detection and localization in CT studies. In Menze, Bjoern, Langs, Georg, Tu, Zhuowen, and Criminisi, Antonio (eds.), Medical Computer Vision. Recognition Techniques and Applications in Medical Imaging, volume 6533 of Lecture Notes in Computer Science, pp. 106–117. Springer Berlin / Heidelberg, 2011. ISBN 978-3-642-18420-8.
- Demšar (2006) Demšar, J. Statistical comparisons of classifiers over multiple data sets. The Journal of Machine Learning Research, 7:30, 2006.
- Frank & Asuncion (2010) Frank, A. and Asuncion, A. UCI machine learning repository, 2010. URL http://archive.ics.uci.edu/ml.
- Lay & Barbu (2010) Lay, N. and Barbu, A. Supervised Aggregation of Classifiers using Artificial Prediction Markets. In ICML, 2010.
- Lay (2009) Lay, Nathan. Supervised aggregation of classifiers using artificial prediction markets. Master’s thesis, The Florida State University, Tallahassee, Florida, November 2009.
- Manski (2006) Manski, C.F. Interpreting the predictions of prediction markets. Economics Letters, 91(3):425–429, 2006.
- Press (2007) Press, W.H. Numerical recipes: the art of scientific computing. Cambridge University Press, 2007. ISBN 9780521880688. URL http://books.google.com/books?id=1aAOdzK3FegC.
- R Girshick & Criminisi (2011) Girshick, R., Shotton, J., Kohli, P., and Criminisi, A. Efficient Regression of General-Activity Human Poses from Depth Images. In Proceedings of the 13th International Conference on Computer Vision, 2011.
- Storkey (2011) Storkey, Amos. Machine learning markets. arXiv, 2011.
- Torgo (2010) Torgo, Luis. Regression data sets, 2010. URL http://www.liaad.up.pt/~ltorgo/Regression/DataSets.html.