Extreme Learning Tree
The paper proposes a new variant of a decision tree, called an Extreme Learning Tree. It consists of an extremely random tree with non-linear data transformation, and a linear observer that provides predictions based on the leaf index where the data samples fall. The proposed method outperforms linear models on a benchmark dataset, and may be a building block for a future variant of Random Forest.
Randomized methods are a recent trend in practical machine learning. They deliver the high performance of complex non-linear methods without the high computational cost of optimizing them. The most prominent current examples are randomized neural networks, in both feed-forward and recurrent forms. For the latter, the randomized approach provided the first efficient training method and enabled state-of-the-art performance in multiple areas.
Random Forest is one of the best methods for Big Data processing due to its adaptive nearest neighbour behavior. The forest predicts an output based only on local data samples. Such an approach works better the more training data is available, which makes it a well-suited supervised method for Big Data. The k-nearest neighbors algorithm also benefits from more data, as the data itself is the model, but Random Forest avoids the quadratic scaling of k-nearest neighbors in the number of data samples, which makes the latter prohibitively slow for large-scale problems.
A decision tree is the building block of Random Forest. A deep decision tree has high variance but low bias. An ensemble of multiple such trees reduces the variance and improves the prediction performance. Additional measures are taken to make the trees in an ensemble as different as possible, including random subsets of features and boosting.
The paper proposes a merge between random methods and a decision tree, called an Extreme Learning Tree (ELT). The method builds a tree on expanded data features from an Extreme Learning Machine, splitting each node on a random feature at a random point. The result is an Extremely Randomized Tree. A linear observer is then added to the leaves of the tree; it learns a linear projection from the leaves to the target outputs. Each tree leaf is represented by its index in one-hot encoding.
An Extreme Learning Tree consists of three parts. First, it generates random data features using an Extreme Learning Machine (ELM). Second, it builds a random tree from these features, similar to Extremely Randomized Trees. Each data sample is then represented by the index of its leaf in the tree, in one-hot encoding. Third, a linear regression is learned from the dataset in that one-hot encoding to the target outputs.
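The three parts can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the `tanh` activation, the Gaussian random weights, the toy data, and the helper names `elm_features` and `one_hot` are assumptions made for illustration, and the random tree is reduced to a single random split as a stand-in for the second part.

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_features(X, n_hidden, rng):
    """Part 1: untrained random non-linear expansion H = tanh(X W + b).
    The tanh activation and Gaussian weights are assumptions."""
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    return np.tanh(X @ W + b)

def one_hot(index, n_columns):
    """Encode integer indices as rows of a 0/1 matrix."""
    Z = np.zeros((len(index), n_columns))
    Z[np.arange(len(index)), index] = 1.0
    return Z

# Toy data, made up for illustration: 8 samples, 4 inputs, 3 classes.
X = rng.normal(size=(8, 4))
y = np.array([0, 0, 1, 1, 2, 2, 0, 1])

# Part 1: random features.
H = elm_features(X, n_hidden=10, rng=rng)

# Part 2 (stand-in): a depth-one random tree, i.e. one random feature
# and one random split point, giving two leaves with indices 0 and 1.
j = rng.integers(H.shape[1])
t = rng.uniform(H[:, j].min(), H[:, j].max())
leaf = (H[:, j] > t).astype(int)

# Part 3: linear observer from one-hot leaf indices to one-hot targets,
# fitted by ordinary least squares.
Z = one_hot(leaf, 2)
Y = one_hot(y, 3)
B, *_ = np.linalg.lstsq(Z, Y, rcond=None)
pred = (Z @ B).argmax(axis=1)
```

Because each row of `Z` has exactly one non-zero entry, the unregularized least-squares solution stores in row `k` of `B` the mean target vector of the training samples that fall into leaf `k`. The observer therefore predicts, for every leaf, the class proportions observed in that leaf.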
ELT follows the random methods paradigm as it has an untrained random part (the tree), and a learned linear observer (a linear regression model from leaves of the tree to the target outputs).
An ELT tree has two hyperparameters: the minimum node size and the maximum tree depth. A node's data is split on a random feature at a random split point. Split points that generate nodes under the minimum size are rejected. Nodes that reach the maximum depth, or that hold fewer than twice the minimum number of samples, become leaves. Node splitting continues as long as non-leaf terminal nodes remain.
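The splitting rules above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the paper's code: the nested-dict tree representation, the function names, drawing the split point uniformly between the minimum and maximum feature value in the node, and the `max_tries` cap on rejected splits (after which the node becomes a leaf) are all choices made here.

```python
import random

def build_tree(rows, min_size, max_depth, rng, depth=0, max_tries=20):
    """Build a random tree over rows (a list of feature lists)."""
    # Leaf: depth limit reached, or too few samples for two valid children.
    if depth >= max_depth or len(rows) < 2 * min_size:
        return {"leaf": True, "size": len(rows)}
    n_features = len(rows[0])
    for _ in range(max_tries):
        f = rng.randrange(n_features)            # random feature
        values = [r[f] for r in rows]
        t = rng.uniform(min(values), max(values))  # random split point
        left = [r for r in rows if r[f] <= t]
        right = [r for r in rows if r[f] > t]
        # Reject splits that create a node under the minimum size.
        if len(left) >= min_size and len(right) >= min_size:
            return {
                "leaf": False, "feature": f, "threshold": t,
                "left": build_tree(left, min_size, max_depth, rng,
                                   depth + 1, max_tries),
                "right": build_tree(right, min_size, max_depth, rng,
                                    depth + 1, max_tries),
            }
    # Assumption: give up after max_tries rejected splits.
    return {"leaf": True, "size": len(rows)}

def number_leaves(node, counter=None):
    """Assign consecutive indices to leaves; return the number of leaves."""
    counter = [0] if counter is None else counter
    if node["leaf"]:
        node["index"] = counter[0]
        counter[0] += 1
    else:
        number_leaves(node["left"], counter)
        number_leaves(node["right"], counter)
    return counter[0]

def leaf_index(node, row):
    """Route one sample down the tree and return its leaf index."""
    while not node["leaf"]:
        go_left = row[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["index"]

rng = random.Random(0)
rows = [[rng.random(), rng.random()] for _ in range(200)]  # toy features
tree = build_tree(rows, min_size=5, max_depth=4, rng=rng)
n_leaves = number_leaves(tree)
indices = [leaf_index(tree, r) for r in rows]
```

The list `indices` is the representation that a linear observer would receive after one-hot encoding. With a maximum depth of 4, the tree in this example can have at most 16 leaves.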
3 Experimental results
The Extreme Learning Tree is tested on the well-known Iris flower dataset, in comparison with a Decision Tree, an L2-regularized ELM, and Ridge regression. The Decision Tree implementation is from the Scikit-Learn library.
The random tree in the ELT method splits data samples into groups of similar ones. The resulting structure in the original data space is shown in Figure 1. The tree works as an adaptive nearest neighbour method, grouping similar samples together. The target variable information from these samples is then used by a linear observer to make predictions.
A formal performance comparison is done on the Iris dataset. The data is randomly split into 70% training and 30% test sets, and the test accuracy is calculated for all the methods. The whole experiment is repeated 100 times. The mean accuracy and its standard deviation are presented in Table 1.
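The evaluation protocol (100 random 70/30 splits, reporting the mean and standard deviation of test accuracy) can be sketched as below. The function `repeated_holdout`, the `nearest_centroid` stand-in classifier, and the synthetic data are hypothetical and are not among the methods or data used in the paper; the synthetic set only mirrors the shape of Iris (150 samples, 4 features, 3 classes of 50).

```python
import numpy as np

def repeated_holdout(fit_predict, X, y, n_repeats=100,
                     test_fraction=0.3, seed=0):
    """Mean and std of test accuracy (in %) over random train/test splits."""
    rng = np.random.default_rng(seed)
    n_test = int(round(test_fraction * len(y)))
    scores = []
    for _ in range(n_repeats):
        perm = rng.permutation(len(y))
        test, train = perm[:n_test], perm[n_test:]
        y_pred = fit_predict(X[train], y[train], X[test])
        scores.append(np.mean(y_pred == y[test]))
    return 100 * np.mean(scores), 100 * np.std(scores)

def nearest_centroid(X_train, y_train, X_test):
    """Hypothetical stand-in classifier: assign each test sample to the
    class whose training mean is closest in Euclidean distance."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0)
                          for c in classes])
    d = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]

# Synthetic, well-separated clusters (not the Iris data).
rng = np.random.default_rng(1)
y = np.repeat(np.arange(3), 50)
X = rng.normal(size=(150, 4)) + 10.0 * y[:, None]

mean_acc, std_acc = repeated_holdout(nearest_centroid, X, y)
```

Any method with the same `fit_predict(X_train, y_train, X_test)` signature can be passed in, so all compared models are scored on identical splits when the same `seed` is used.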
|Method|Accuracy ± std, %|
|Extreme Learning Tree| |
In this experiment, an Extreme Learning Tree performs worse than the ELM and Decision Tree methods. However, it outperforms a linear model (in the form of Ridge regression) by a significant margin. Outperforming a linear model is an achievement for a single ELT, as it represents each data sample by a single number: the index of its leaf in the tree.
The decision surface of the ELT is visualized in Figure 3. The boundaries between classes have a complex shape, but the class regions are unbroken. The class regions of the original Decision Tree (shown in Figure 3) break into each other, creating false predictions. Its boundaries are always parallel to an axis, while the ELT learns class boundaries of arbitrary shape.
The paper proposes a new version of the decision tree that follows the random methods paradigm. It consists of an untrained random non-linear tree and a learned linear observer. The method provides decision boundaries of a complex shape and with less noise than the original decision tree. It outperforms a purely linear model in accuracy despite representing each data sample only by the index of its tree leaf.
Future work will examine the application of the Extreme Learning Tree in an ensemble method similar to Random Forest.
- (1984) Classification and regression trees. CRC Press.
- (2001) Random Forests. Machine Learning 45 (1), pp. 5–32.
- (1936) The use of multiple measurements in taxonomic problems. Annals of Eugenics 7 (2), pp. 179–188.
- (26-28 April 2017) Randomized machine learning approaches: Recent developments and challenges. In ESANN 2017 Proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, pp. 77–86.
- (2006) Extremely randomized trees. Machine Learning 63 (1), pp. 3–42.
- (2012-04) Extreme learning machine for regression and multiclass classification. Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on 42 (2), pp. 513–529.
- (2006-12) Extreme learning machine: Theory and applications. Neural Networks Selected Papers from the 7th Brazilian Symposium on Neural Networks (SBRN ’04) 70 (1–3), pp. 489–501.
- (2015) What are Extreme Learning Machines? Filling the Gap Between Frank Rosenblatt’s Dream and John von Neumann’s Puzzle. Cognitive Computation 7 (3), pp. 263–278.
- (2004-04) Harnessing Nonlinearity: Predicting Chaotic Systems and Saving Energy in Wireless Communication. Science 304 (5667), pp. 78.
- (2006-06) Random Forests and Adaptive Nearest Neighbors. Journal of the American Statistical Association 101 (474), pp. 578–590.
- (2009-08) Reservoir computing approaches to recurrent neural network training. Computer Science Review 3 (3), pp. 127–149.
- (2011-09) TROP-ELM: A double-regularized ELM using LARS and Tikhonov regularization. Advances in Extreme Learning Machine: Theory and Applications Biological Inspired Systems. Computational and Ambient Intelligence Selected papers of the 10th International Work-Conference on Artificial Neural Networks (IWANN2009) 74 (16), pp. 2413–2421.
- (1998-08) The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (8), pp. 832–844.