Extreme Learning Tree

Abstract

The paper proposes a new variant of the decision tree, called an Extreme Learning Tree. It consists of an extremely randomized tree built on a non-linear transformation of the data, and a linear observer that makes predictions based on the index of the leaf into which each sample falls. The proposed method outperforms linear models on a benchmark dataset, and may serve as a building block for a future variant of Random Forest.

1 Introduction

Randomized methods are a recent trend in practical machine learning [4]. They enable the high performance of complex non-linear methods without the high computational cost of their optimization. The most prominent current examples are randomized neural networks, in both feed-forward [8] and recurrent [11] forms. For the latter, the randomized approach provided an efficient training method for the first time, and enabled state-of-the-art performance in multiple areas [9].

Random Forest [13] is one of the best methods for Big Data processing due to its adaptive nearest neighbour behavior [10]. The forest predicts an output based only on local data samples. Such an approach works better the more training data is available, making it well suited to supervised learning on Big Data. The k-nearest neighbours algorithm also benefits from more data, as the data itself is the model, but Random Forest avoids its quadratic scaling in the number of data samples, which makes k-nearest neighbours prohibitively slow for large-scale problems.

The decision tree [1] is the building block of Random Forest. A deep decision tree has high variance but low bias. An ensemble of multiple such trees reduces the variance and improves prediction performance. Additional measures are taken to make the trees in an ensemble as different as possible, including random subsets of features and bootstrap aggregation (bagging) [2].

The paper proposes a merger of random methods and the decision tree, called the Extreme Learning Tree (ELT). The method builds a tree from data features expanded by an Extreme Learning Machine [6], splitting nodes on a random feature at a random point. The result is an Extremely Randomized Tree [5]. A linear observer is then added to the leaves of the tree; it learns a linear projection from the leaves to the target outputs. Each tree leaf is represented by its index, in one-hot encoding format.

2 Methodology

Extreme Learning Tree consists of three parts. First, it generates random data features using an Extreme Learning Machine (ELM) [7]. Second, it builds a random tree from these features, similar to Extremely Randomized Trees [5]. Each data sample is then represented by the index of its leaf in the tree, in one-hot encoding. Third, a linear regression model is learned from the dataset in that one-hot encoding to the target outputs.

ELT follows the random methods paradigm as it has an untrained random part (the tree), and a learned linear observer (a linear regression model from leaves of the tree to the target outputs).
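As an illustration, a minimal sketch of this three-part pipeline is given below. It assumes NumPy and scikit-learn: the ELM feature expansion is written directly as a fixed random hidden layer with a tanh activation, the random tree is approximated by scikit-learn's ExtraTreeClassifier restricted to a single random feature per split, and the linear observer is a ridge model on the one-hot leaf encoding. All hyperparameter values are illustrative and are not taken from the paper.

    # Sketch of the ELT pipeline (assumptions: tanh activation, illustrative
    # hyperparameters, ExtraTreeClassifier as a stand-in for the random tree).
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.linear_model import RidgeClassifier
    from sklearn.preprocessing import OneHotEncoder
    from sklearn.tree import ExtraTreeClassifier

    rng = np.random.RandomState(0)
    X, y = load_iris(return_X_y=True)

    # 1) ELM-style random feature expansion: an untrained random hidden layer.
    n_hidden = 100
    W = rng.randn(X.shape[1], n_hidden)
    b = rng.randn(n_hidden)
    H = np.tanh(X @ W + b)                              # non-linear random features

    # 2) Extremely randomized tree: one random feature and one random split
    #    point per node.
    tree = ExtraTreeClassifier(splitter="random", max_features=1,
                               max_depth=10, min_samples_leaf=3,
                               random_state=0).fit(H, y)
    leaves = tree.apply(H).reshape(-1, 1)               # leaf index per sample

    # 3) Linear observer: a ridge model from one-hot leaf indicators to targets.
    encoder = OneHotEncoder(handle_unknown="ignore").fit(leaves)
    observer = RidgeClassifier(alpha=1.0).fit(encoder.transform(leaves), y)

    print("training accuracy:",
          observer.score(encoder.transform(leaves), y))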

An ELT tree has two hyperparameters: the minimum node size and the maximum tree depth. Node data is split on a random feature at a random split point. Split points that would generate nodes smaller than the minimum size are rejected. Nodes that reach the maximum depth, or that contain fewer than twice the minimum number of samples, become leaves. Node splitting continues until no non-leaf terminal nodes remain, as illustrated by the sketch below.
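For concreteness, the splitting rule for a single node can be sketched as follows. The function is a hypothetical illustration: its name, the uniform choice of the split point within the feature range, and the number of attempts before a node is turned into a leaf are assumptions, since they are not specified above.

    import numpy as np

    def try_split(X_node, depth, min_size, max_depth, rng, max_tries=20):
        """Sketch of one ELT node split: a random feature and a random split
        point; splits producing a child below min_size are rejected."""
        n = X_node.shape[0]
        # Leaf conditions from the text: maximum depth reached, or the node
        # holds fewer than twice the minimum node size.
        if depth >= max_depth or n < 2 * min_size:
            return None
        for _ in range(max_tries):                     # retry budget is an assumption
            feature = rng.randint(X_node.shape[1])     # random feature
            lo, hi = X_node[:, feature].min(), X_node[:, feature].max()
            threshold = rng.uniform(lo, hi)            # random split point
            go_left = X_node[:, feature] <= threshold
            n_left = int(go_left.sum())
            if min_size <= n_left <= n - min_size:     # both children >= min_size
                return feature, threshold, go_left
        return None                                    # give up: the node becomes a leaf

    rng = np.random.RandomState(0)
    print(try_split(rng.rand(30, 4), depth=0, min_size=3, max_depth=10, rng=rng))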

3 Experimental results

The Extreme Learning Tree is tested on the well-known Iris flower dataset [3], in comparison with a Decision Tree, an L2-regularized ELM [12], and Ridge regression. The Decision Tree implementation is from the Scikit-Learn library¹.

The random tree in the ELT method splits the data samples into groups of similar ones. The resulting structure in the original data space is shown in Figure 1. The tree works as an adaptive nearest neighbour, grouping together similar samples. The target variable information from these samples is then used by the linear observer to make predictions.

Figure 1: Leaf structure of an ELT, each color represents a different leaf. The random tree works as an approximated nearest neighbour method, joining together similar data samples.

A formal performance comparison is done on the Iris dataset. The data is randomly split into a 70% training set and a 30% test set, and the test accuracy is calculated for all methods. The whole experiment is repeated 100 times. The mean accuracy and its standard deviation are presented in Table 1.

Method                   Accuracy ± std, %
Ridge regression
Extreme Learning Tree
ELM
Decision Tree
Table 1: Average accuracy and its standard deviation on a test subset of the Iris dataset.
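The evaluation protocol can be reproduced with a short loop; the sketch below shows it for the scikit-learn Decision Tree baseline only, with default model settings (an assumption, since the baselines' hyperparameters are not listed above). The 70/30 split and the 100 repetitions follow the description in the text.

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    accuracies = []
    for repetition in range(100):                      # 100 random 70/30 resplits
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.3, random_state=repetition)
        model = DecisionTreeClassifier(random_state=repetition).fit(X_tr, y_tr)
        accuracies.append(model.score(X_te, y_te))     # test accuracy

    print("mean accuracy: %.1f%%" % (100 * np.mean(accuracies)))
    print("std:           %.1f%%" % (100 * np.std(accuracies)))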

In this experiment, the Extreme Learning Tree performs below the ELM and Decision Tree methods. However, it outperforms a linear model (in the form of Ridge regression) by a significant margin. Outperforming a linear model is an achievement for a single ELT, as it represents each data sample by a single number: the index of its leaf in the tree.

The decision surface of the ELT is visualized in Figure 2. The boundaries between classes have a complex shape, but the classes remain unbroken. Class boundaries of the original Decision Tree (shown in Figure 3) break into each other, creating false predictions. They are always parallel to an axis, whereas the ELT learns class boundaries of arbitrary shape.

Figure 2: Decision surface of an ELT on Iris dataset, using different pairs of features. Different colors correspond to the three different classes of Iris flowers.
Figure 3: Decision surface of a Decision Tree on Iris dataset, using different pairs of features. Note that all decision boundaries are parallel to axes.

4 Conclusions

The paper proposes a new variant of the decision tree that follows the random methods paradigm. It consists of an untrained random non-linear tree and a learned linear observer. The method provides decision boundaries of complex shape, with less noise than the original decision tree. It outperforms a purely linear model in accuracy, despite representing each data sample only by the index of its tree leaf.

Future work will examine the application of the Extreme Learning Tree in an ensemble method similar to Random Forest.

Footnotes

  1. http://scikit-learn.org/stable/auto_examples/tree/plot_iris.html

References

  1. L. Breiman, J. Friedman, C. J. Stone and R. A. Olshen (1984) Classification and regression trees. CRC Press. ISBN 0-412-04841-8.
  2. L. Breiman (2001) Random Forests. Machine Learning 45 (1), pp. 5–32.
  3. R. A. Fisher (1936) The use of multiple measurements in taxonomic problems. Annals of Eugenics 7 (2), pp. 179–188.
  4. C. Gallicchio, J. D. Martin-Guerrero, A. Micheli and E. Soria-Olivas (2017) Randomized machine learning approaches: Recent developments and challenges. In ESANN 2017 Proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, pp. 77–86.
  5. P. Geurts, D. Ernst and L. Wehenkel (2006) Extremely randomized trees. Machine Learning 63 (1), pp. 3–42.
  6. G. Huang, H. Zhou, X. Ding and R. Zhang (2012) Extreme learning machine for regression and multiclass classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 42 (2), pp. 513–529.
  7. G. Huang, Q. Zhu and C. Siew (2006) Extreme learning machine: Theory and applications. Neurocomputing 70 (1–3), pp. 489–501.
  8. G. Huang (2015) What are Extreme Learning Machines? Filling the gap between Frank Rosenblatt's dream and John von Neumann's puzzle. Cognitive Computation 7 (3), pp. 263–278.
  9. H. Jaeger and H. Haas (2004) Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science 304 (5667), p. 78.
  10. Y. Lin and Y. Jeon (2006) Random Forests and adaptive nearest neighbors. Journal of the American Statistical Association 101 (474), pp. 578–590.
  11. M. Lukoševičius and H. Jaeger (2009) Reservoir computing approaches to recurrent neural network training. Computer Science Review 3 (3), pp. 127–149.
  12. Y. Miche, M. van Heeswijk, P. Bas, O. Simula and A. Lendasse (2011) TROP-ELM: A double-regularized ELM using LARS and Tikhonov regularization. Neurocomputing 74 (16), pp. 2413–2421.
  13. Tin Kam Ho (1998) The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (8), pp. 832–844.