Abstract
Although adversarial examples and model robustness have been extensively studied in the context of linear models and neural networks, research on this issue in tree-based models, and on how to make tree-based models robust against adversarial examples, is still limited. In this paper, we show that tree-based models are also vulnerable to adversarial examples and develop a novel algorithm to learn robust trees. At its core, our method aims to optimize the performance under the worst-case perturbation of input features, which leads to a max-min saddle point problem. Incorporating this saddle point objective into the decision tree building procedure is non-trivial due to the discrete nature of trees: a naive approach to finding the best split according to this saddle point objective will take exponential time. To make our approach practical and scalable, we propose efficient tree building algorithms by approximating the inner minimizer in this saddle point problem, and present efficient implementations for classical information gain based trees as well as state-of-the-art tree boosting models such as XGBoost. Experimental results on real-world datasets demonstrate that the proposed algorithms can substantially improve the robustness of tree-based models against adversarial examples.
Robust Decision Trees Against Adversarial Examples
Hongge Chen, Huan Zhang, Duane Boning, Cho-Jui Hsieh
Proceedings of the International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019. Copyright 2019 by the author(s).
The discovery of adversarial examples in various deep learning models (Szegedy et al., 2013; Kos et al., 2018; Cheng et al., 2018; Chen et al., 2018a; Carlini & Wagner, 2018; Huang et al., 2017) has led to extensive studies of deep neural network (DNN) robustness under such maliciously crafted subtle perturbations. Although deep learning-based model robustness has been well-studied in the recent literature from both attack and defense perspectives, studies on the robustness of tree-based models are quite limited (Papernot et al., 2016a).
In our paper, we shed light on the adversarial robustness of an important class of machine learning models: decision trees. Among machine learning models used in practice, tree-based methods stand out in many applications, with state-of-the-art performance. Tree-based methods have achieved widespread success due to their simplicity, efficiency, interpretability, and scalability on large datasets. They have been suggested as an advantageous alternative to deep learning in some cases (Zhou & Feng, 2017). In this paper, we study the robustness of tree-based models under adversarial attacks, and more importantly, we propose a novel robust training framework for tree-based models. Below we highlight our major contributions:

We study the robustness of decision tree-based machine learning algorithms through the lens of adversarial examples. We study both classical decision trees and state-of-the-art ensemble boosting methods such as XGBoost. We show that, similar to neural networks, tree-based models are also vulnerable to adversarial examples.

We propose a novel robust decision tree training framework to improve robustness against adversarial examples. This method seeks to optimize the worst-case condition by solving a max-min problem. This framework is quite general and can be applied to tree-based models with any score function used to choose splitting thresholds. To the best of our knowledge, this is the first work contributing a general robust decision tree training framework against adversarial examples.

We implement our framework in both classical information gain based classification trees and state-of-the-art large-scale tree boosting systems. To scale up our framework, we make necessary and efficient approximations to handle complex models and real-world datasets. Our experimental results show consistent and substantial improvements in adversarial robustness.
Decision tree learning methods are widely used in machine learning and data mining. As considered here, the goal is to create a tree structure with each interior node corresponding to one of the input features. Each interior node has two children, and edges to child nodes represent the split condition for that feature. Each leaf provides a prediction value of the model, given that the input features satisfy the conditions represented by the path from the root to that leaf. In practice, decision tree learning algorithms are based on greedy search, which builds a tree starting from its root by making locally optimal decisions at each node. Classical decision tree training recursively chooses features, sets thresholds and splits the examples on a node by maximizing a predefined score, such as information gain or Gini impurity.
Decision trees are often used within ensemble methods. A well-known gradient tree boosting method has been developed by Friedman et al. (2000), Friedman (2001) and Friedman (2002) to allow optimization of an arbitrary differentiable loss function. Later, scalable tree boosting systems have been built to handle large datasets. For example, pGBRT (Tyree et al., 2011) parallelizes the training procedure by data partitioning for faster and distributed training. XGBoost (Chen & Guestrin, 2016) is a prominent tree boosting software framework; in data mining contests, 17 out of 29 published winning solutions at Kaggle’s blog in 2015 used XGBoost in their models. LightGBM (Ke et al., 2017; Zhang et al., 2018) is another highly efficient boosting framework that utilizes histograms on data features to significantly speed up training. mGBDT (Feng et al., 2018) learns hierarchical representations by stacking multiple layers of gradient boosted decision trees (GBDTs). Other variants such as extreme multi-label GBDT (Si et al., 2017) and cost-efficient tree boosting approaches (Peter et al., 2017; Xu et al., 2019) have also been proposed recently.
An adversarial attack is a subtle modification of a benign example. In a successful attack, the classifier will misclassify this modified example, while the original example is correctly classified. Such attacks can be roughly divided into two categories, white-box attacks and black-box attacks. White-box attacks assume that the model is fully exposed to the attacker, including parameters and structures, while in black-box attacks, the attacker can query the model but has no (direct) access to any internal information inside the model. FGSM (Goodfellow et al., 2015) is one of the first methods in the white-box attack category. It computes the gradient only once to generate an adversarial example. This method is strengthened as Iterative-FGSM (or I-FGSM) (Kurakin et al., 2017), which applies FGSM multiple times for a higher attack success rate and smaller distortion. C&W attack (Carlini & Wagner, 2017) formulates the attack as an optimization problem with an $\ell_2$ penalization. EAD-L1 attack (Chen et al., 2018b) uses a more general formulation than C&W attack with elastic-net regularization. To bypass some defenses with obfuscated gradients, the BPDA attack introduced by Athalye et al. (2018) is shown to successfully circumvent many defenses.
The white-box setting is often argued as being unrealistic in the literature. In contrast, several recent works have studied ways to fool the model given only model output scores or probabilities. Methods in Chen et al. (2017) and Ilyas et al. (2017) are able to craft adversarial examples by making queries to obtain the corresponding probability outputs of the model. A stronger and more general attack has been developed recently by Cheng et al. (2019), which relies on neither the gradient nor the smoothness of the model output. This enables attackers to successfully attack models that only output hard labels.
It is difficult to defend against adversarial examples, especially under strong and adaptive attacks. Some early methods, including feature squeezing (Xu et al., 2017) and defensive distillation (Papernot et al., 2016b) have been proven ineffective against stronger attacks like C&W. Many recently proposed defense methods are based on obfuscated gradients (Guo et al., 2017; Song et al., 2017; Buckman et al., 2018; Ma et al., 2018; Samangouei et al., 2018) and are already overcome by the aforementioned BPDA attack.
Adversarial training, first introduced in Kurakin et al. (2017), is effective on DNNs against various attacks. In adversarial training, adversarial examples are generated during the training process and are used as training data to increase model robustness. This technique has been formally posed as a min-max robust optimization problem in Madry et al. (2018) and has achieved very good performance under adversarial attacks. Several recent works have tried to improve over the original adversarial training formulation (Liu & Hsieh, 2019; Liu et al., 2019; Zhang et al., 2019). There are some other methods in the literature seeking to give provable guarantees on the robustness performance, such as distributional robust optimization (Sinha et al., 2018), convex relaxations (Wong & Kolter, 2018; Wong et al., 2018; Wang et al., 2018) and semidefinite relaxations (Raghunathan et al., 2018). Some of these methods can be deployed in medium-sized networks and achieve satisfactory robustness.
However, all of the current defense methods assume the model to be differentiable and use gradient-based optimizers, so none of them can be directly applied to decision tree based models, which are discrete and non-differentiable.
Recent developments in machine learning have resulted in the deployment of large-scale tree boosting systems in critical applications such as fraud and malware detection. Unlike deep neural networks (DNNs), tree-based models are non-smooth, non-differentiable and sometimes interpretable, which might lead to the belief that they are more robust than DNNs. However, the experiments in our paper show that, similar to DNNs, tree-based models can also be easily compromised by adversarial examples. In this paper, we focus on untargeted attacks, which are considered to be successful as long as the model misclassifies the adversarial examples.
Unlike for DNNs, algorithms for crafting adversarial examples for tree-based models are poorly studied. The main reason is that tree-based models are discrete and non-differentiable, so we cannot use common gradient descent based methods for white-box attacks. An early attack algorithm designed for single decision trees has been proposed by Papernot et al. (2016a), based on greedy search. To find an adversarial example, this method searches the neighborhood of the leaf which produces the original prediction, finds another leaf labeled as a different class by considering the path from the original leaf to the target leaf, and changes the feature values accordingly to cause misclassification.
A white-box attack against binary classification tree ensembles has been proposed by Kantchelian et al. (2016). This method finds the exact smallest distortion (measured by some $\ell_p$ norm) necessary to mislead the model. However, the algorithm relies on Mixed Integer Linear Programming (MILP) and thus can be very time-consuming when attacking large-scale tree models such as those arising in XGBoost. In this paper, we use the $\ell_\infty$ version of Kantchelian’s attack as one of our methods to evaluate small and mid-size binary classification model robustness. Kantchelian et al. (2016) also introduce a faster approximation to generate adversarial examples using symbolic prediction with $\ell_0$ norm minimization and combine this method into an adversarial training approach. Unfortunately, the demonstrated adversarial training is not very effective; despite increasing model robustness for $\ell_0$ norm perturbations, robustness for $\ell_1$, $\ell_2$ and $\ell_\infty$ norm perturbations is noticeably reduced compared to the naturally (non-robustly) trained model.
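[Figure 1: original and adversarial examples on MNIST and Fashion-MNIST for natural and robust GBDT models; images not recovered.]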
In our paper, in addition to Kantchelian attacks we also use a general attack method proposed in Cheng et al. (2019), which relies on neither the gradient nor the smoothness of the output of a machine learning model. Cheng’s attack method has been used to efficiently evaluate the robustness of complex models on large datasets, even under black-box settings. To deal with the non-smoothness of the model output, this method focuses on the distance between the benign example and the decision boundary, and reformulates the adversarial attack as a minimization problem of this distance. Despite the non-smoothness of the model prediction, the distance to the decision boundary is usually smooth within a local region, and can be found by binary search on vector length given a direction vector. To minimize this distance without gradients, Cheng et al. (2019) used a zeroth-order optimization algorithm with a randomized gradient-free method. In our paper, we use the $\ell_\infty$ version of Cheng’s attack.
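The binary search on vector length mentioned above can be sketched as follows. This is only an illustration of the distance query for one fixed direction, not the full zeroth-order attack of Cheng et al. (2019); the function names, the toy `stump` classifier and the assumption that the initial upper bound `hi` already crosses the decision boundary are all hypothetical.

```python
import numpy as np

def boundary_distance(predict, x, direction, hi=1.0, tol=1e-6):
    """Binary-search the smallest step along `direction` that flips the label.

    Assumes predict(x + hi * u) already differs from predict(x).
    """
    u = direction / np.linalg.norm(direction)
    y0 = predict(x)
    if predict(x + hi * u) == y0:
        raise ValueError("upper bound does not cross the decision boundary")
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if predict(x + mid * u) == y0:
            lo = mid   # still the original label: boundary is farther away
        else:
            hi = mid   # label flipped: boundary is at or before mid
    return hi

# Hypothetical hard-label model: a decision stump on feature 0.
def stump(x):
    return int(x[0] >= 0.5)

dist = boundary_distance(stump, np.array([0.2, 0.7]), np.array([1.0, 0.0]))
```

Only hard-label queries to `predict` are needed, which is why this kind of search applies to non-differentiable models such as tree ensembles.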
Some adversarial examples obtained by this method are shown in Figure 1, where we display results on both MNIST and Fashion-MNIST datasets. The models we test are natural GBDT models trained using XGBoost and our robust GBDT models, each with 200 trees and a tree depth of 8. Cheng’s attack is able to craft adversarial examples with very small distortions on natural models; to human eyes, the adversarial distortion added to the natural model’s adversarial examples appears as imperceptible noise. We also conduct white-box attacks using the MILP formulation (Kantchelian et al., 2016), which takes much longer to solve, but the distortion found by MILP is comparable to Cheng’s method; see the experiments for more details. In contrast, for our robust GBDT model, the required adversarial example distortions are so large that we can even vaguely see a number 8 in subfigure (c). The substantial increase in the distortion required to misclassify, as well as the increased visual impact of such distortions, shows the effectiveness of our robust decision tree training, which we will introduce in detail next. In the main text, we use the $\ell_\infty$ version of Kantchelian’s attack; we present results of $\ell_1$ and $\ell_2$ Kantchelian attacks in the appendix.
As shown in the previous section, tree-based models are vulnerable to adversarial examples. Thus it is necessary to augment the classical natural tree training procedure in order to obtain reliable models robust against adversarial attacks. Our method formulates the process of finding the best split threshold in decision tree training as a robust optimization problem. As a conceptual illustration, Figure 2 presents a special case where the traditional greedy optimal splitting may yield non-robust models. A horizontal split achieving high accuracy or score on original points may be easily compromised by adversarial perturbations. On the other hand, we are able to select a better vertical split by considering possible perturbations in $\ell_\infty$ balls. At a high level, the robust splitting feature and threshold take the distances between data points into account (which are often ignored in most decision tree learning algorithms) and try to optimize the worst-case performance under adversarial perturbations. Some recent works in DNNs (Ilyas et al., 2019; Tsipras et al., 2019) divided features into two categories, robust features and non-robust features. In tree-based models, the effect of this dichotomy on the robustness is straightforward, as seen in the two different splits in Figure 2, one on a robust feature and one on a non-robust feature.
In this section we formally introduce our robust decision tree training framework. For a training set with $N$ examples and $d$ real-valued features, $\mathcal{D} = \{(x_i, y_i)\}$ ($1 \le i \le N$, $y_i \in \mathbb{R}$, $x_i = [x_i^{(1)}, \dots, x_i^{(d)}] \in \mathbb{R}^d$), we first normalize the feature values to $[0, 1]$ such that $x_i \in [0, 1]^d$ (the best feature value for split will also be scaled accordingly, but it is irrelevant to model performance). For a general decision tree based learning model, at a given node, we denote $I \subseteq \mathcal{D}$ as the set of points at that node. For a split on the $j$-th feature with a threshold $\eta$, the sets that will be mentioned in the following sections are summarized in Table 1.


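Table 1: Notation for the sets used in a split on the $j$-th feature with threshold $\eta$.

| Notation | Definition |
|---|---|
| $I$ | set of examples on the current node |
| $I_0$ | $I \cap \{(x_i, y_i) \mid y_i = 0\}$ (for classification) |
| $I_1$ | $I \cap \{(x_i, y_i) \mid y_i = 1\}$ (for classification) |
| $I_L$ | $I \cap \{(x_i, y_i) \mid x_i^{(j)} < \eta\}$ |
| $I_R$ | $I \cap \{(x_i, y_i) \mid x_i^{(j)} \ge \eta\}$ |
| $\Delta I$ | $I \cap \{(x_i, y_i) \mid \eta - \epsilon \le x_i^{(j)} \le \eta + \epsilon\}$ |
| $\Delta I_L$ | $\Delta I \cap I_L$ |
| $\Delta I_R$ | $\Delta I \cap I_R$ |
| $I_L^o$ | $I_L \setminus \Delta I$ |
| $I_R^o$ | $I_R \setminus \Delta I$ |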

In classical tree-based learning algorithms (which we refer to as “natural” trees in this paper), the quality of a split on a node can be gauged by a score function $S(\cdot)$: a function of the splits on left and right child nodes ($I_L$ and $I_R$), or equivalently of the chosen feature $j$ to split and a corresponding threshold value $\eta$. Since $I_L$ and $I_R$ are determined by $j$, $\eta$ and $I$, we abuse the notation and define $S(j, \eta, I) := S(I_L, I_R)$.
Traditionally, people consider different scores for choosing the “best” split, such as information gain used by ID3 (Quinlan, 1986) and C4.5 (Quinlan, 1986), or Gini impurity in CART (Breiman, 1984). Modern software packages (Chen & Guestrin, 2016; Ke et al., 2017; Dorogush et al., 2018) typically find the best split that minimizes a loss function directly, allowing decision trees to be used in a large class of problems (e.g., mean square error loss for regression, logistic loss for classification, and ranking loss for ranking problems). A regular (“natural”) decision tree training process will either exactly or approximately evaluate the score function for all possible features and split thresholds on the leaf to be split, and select the best pair:
$$j^*, \eta^* = \arg\max_{j, \eta} S(I_L, I_R) = \arg\max_{j, \eta} S(j, \eta, I). \tag{1}$$
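The natural split search in (1) amounts to an exhaustive scan over features and candidate thresholds. The sketch below assumes binary labels, information gain as the score, and midpoints between consecutive distinct feature values as candidate thresholds; the helper names are illustrative and are not taken from any particular package.

```python
import numpy as np

def entropy(labels):
    """Empirical binary entropy in bits; zero for empty or pure sets."""
    if len(labels) == 0:
        return 0.0
    p = np.mean(labels)
    if p == 0.0 or p == 1.0:
        return 0.0
    return float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

def info_gain(y, left_mask):
    """Information gain of splitting labels y into left_mask / ~left_mask."""
    n = len(y)
    n_left = int(left_mask.sum())
    cond = (n_left / n) * entropy(y[left_mask]) \
        + ((n - n_left) / n) * entropy(y[~left_mask])
    return entropy(y) - cond

def best_natural_split(X, y):
    """Return (j, eta, score) maximizing the score over features and thresholds."""
    best = (None, None, -np.inf)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive distinct values.
        for eta in (values[:-1] + values[1:]) / 2.0:
            score = info_gain(y, X[:, j] < eta)
            if score > best[2]:
                best = (j, float(eta), score)
    return best

X = np.array([[0.1, 0.9], [0.2, 0.1], [0.8, 0.8], [0.9, 0.2]])
y = np.array([0, 0, 1, 1])
j_star, eta_star, s_star = best_natural_split(X, y)
```

On this toy data the scan selects feature 0 with a threshold halfway between 0.2 and 0.8, which separates the two classes perfectly.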
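In our setting, we consider the case where features of examples in $I_L$ and $I_R$ can be perturbed by an adversary. Since a typical decision tree can only split on a single feature at one time, it is natural to consider adversarial perturbations within an $\ell_\infty$ ball of radius $\epsilon$ around each example $x_i$:
$$B_\epsilon^\infty(x_i) := [x_i^{(1)} - \epsilon, x_i^{(1)} + \epsilon] \times \cdots \times [x_i^{(d)} - \epsilon, x_i^{(d)} + \epsilon].$$
Such perturbations enable the adversary to minimize the score obtained by our split. So instead of finding a split with the highest score, an intuitive approach for robust training is to maximize the minimum score value obtained by all possible perturbations in an $\ell_\infty$ ball with radius $\epsilon$,
$$j^*, \eta^* = \arg\max_{j, \eta} RS(j, \eta, I), \tag{2}$$
where $RS(\cdot)$ is a robust score function defined as
$$RS(j, \eta, I) := \min_{I' = \{(x_i', y_i)\}} S(j, \eta, I') \quad \text{s.t. } x_i' \in B_\epsilon^\infty(x_i) \text{ for all } x_i' \in I'. \tag{3}$$
In other words, each $x_i \in I$ can be perturbed individually under an $\ell_\infty$ norm bounded perturbation to form a new set of training examples $I'$. We consider the worst-case perturbation, such that the set $I'$ triggers the worst-case score after the split with feature $j$ and threshold $\eta$. The training objective (2) becomes a max-min optimization problem.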
Note that there is an intrinsic consistency between boundaries of the $\ell_\infty$ balls and the decision boundary of a decision tree. For the split on the $j$-th feature, perturbations along features other than $j$ do not affect the split. So we only need to consider perturbations within $\pm\epsilon$ along the $j$-th feature. We define $\Delta I$ as the ambiguity set, containing examples with feature $j$ inside the $[\eta - \epsilon, \eta + \epsilon]$ region (see Table 1). Only examples in $\Delta I$ may be perturbed from $I_L$ to $I_R$ or from $I_R$ to $I_L$ to reduce the score. Perturbing points in $I \setminus \Delta I$ will not change the score or the leaves they are assigned to. We denote $I_L^o$ and $I_R^o$ as the sets of examples that are certainly on the left and right child leaves under perturbations (see Table 1 for definitions). Then we introduce 0-1 variables $s_i \in \{0, 1\}$ denoting whether an example in the ambiguity set $\Delta I$ is assigned to $I_L$ or $I_R$, respectively. Then the $RS$ can be formulated as a 0-1 integer optimization problem with $|\Delta I|$ variables, which is NP-hard in general. Additionally, we need to scan through all $d$ features of all examples and solve $O(|I| d)$ minimization problems for a single split at a single node. This large number of problems to solve makes this computation intractable. Therefore, we need to find an approximation for the $RS(j, \eta, I)$. In the next two sections, we present two different approximations and corresponding implementations of our robust decision tree framework, first for classical decision trees with information gain score, and then for modern tree boosting systems which can minimize any loss function.
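To make the ambiguity set and the cost of the exact inner minimization concrete, the following sketch builds the sets of Table 1 for one candidate split and evaluates the robust score by enumerating all $2^{|\Delta I|}$ assignments. The toy score `majority_accuracy`, the closed-interval test for the ambiguity set and all function names are assumptions made for illustration; the enumeration is only feasible for tiny ambiguity sets.

```python
from itertools import product
import numpy as np

def split_sets(x_j, eta, eps):
    """Index sets for a split on one feature with threshold eta and radius eps."""
    idx = np.arange(len(x_j))
    ambiguous = np.abs(x_j - eta) <= eps          # Delta I
    left = x_j < eta                              # I_L
    return {
        "delta": idx[ambiguous],
        "left_sure": idx[left & ~ambiguous],      # I_L^o
        "right_sure": idx[~left & ~ambiguous],    # I_R^o
    }

def majority_accuracy(y_left, y_right):
    """Toy score: fraction of points matching the majority label of their leaf."""
    correct = 0
    for part in (y_left, y_right):
        if len(part) > 0:
            ones = int(np.sum(part))
            correct += max(ones, len(part) - ones)
    return correct / (len(y_left) + len(y_right))

def exact_robust_score(x_j, y, eta, eps, score=majority_accuracy):
    """Minimize the score over all 2^|Delta I| left/right assignments."""
    sets = split_sets(x_j, eta, eps)
    worst = np.inf
    for assign in product((0, 1), repeat=len(sets["delta"])):
        assign = np.array(assign, dtype=bool)     # True means "send right"
        left = np.concatenate([sets["left_sure"], sets["delta"][~assign]])
        right = np.concatenate([sets["right_sure"], sets["delta"][assign]])
        worst = min(worst, score(y[left], y[right]))
    return worst

x_j = np.array([0.10, 0.45, 0.55, 0.90])
y = np.array([0, 0, 1, 1])
natural = majority_accuracy(y[x_j < 0.5], y[x_j >= 0.5])
robust = exact_robust_score(x_j, y, eta=0.5, eps=0.1)
```

Here the unperturbed split is perfect, but the two points inside the ambiguity set can be swapped across the threshold, so the worst-case score drops to one half.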
It is worth mentioning that we normalize features to $[0, 1]^d$ for the sake of simplicity in this paper. One can also define $\epsilon_j$ for each feature $j$, and then the adversary is allowed to perturb $x_i$ within $[x_i^{(1)} - \epsilon_1, x_i^{(1)} + \epsilon_1] \times \cdots \times [x_i^{(d)} - \epsilon_d, x_i^{(d)} + \epsilon_d]$. In this case, we would not need to normalize the features. Also, $\epsilon$ is a hyperparameter in our robust model. Models trained with larger $\epsilon$ are expected to be more robust, and when $\epsilon = 0$, the robust model is the same as a natural model.
Here we consider a decision tree for binary classification, $y_i \in \{0, 1\}$, with information gain as the metric for node splitting. The information gain score is
$$S(j, \eta, I) := IG(j, \eta) = H(y) - H(y \mid x^{(j)} < \eta),$$
where $H(\cdot)$ and $H(\cdot \mid \cdot)$ are entropy and conditional entropy on the empirical distribution. For simplicity, we denote $N_0 := |I_0|$, $N_1 := |I_1|$, $n_0 := |I_L \cap I_0|$ and $n_1 := |I_L \cap I_1|$. The following theorem shows the adversary’s perturbation direction to minimize the information gain.
Theorem 1.
If $\frac{n_0}{N_0} < \frac{n_1}{N_1}$ and $\frac{n_0 + 1}{N_0} \le \frac{n_1}{N_1}$, perturbing one example in $\Delta I_R$ with label 0 to $I_L$ will decrease the information gain.
Similarly, if $\frac{n_1}{N_1} < \frac{n_0}{N_0}$ and $\frac{n_1 + 1}{N_1} \le \frac{n_0}{N_0}$, perturbing one example in $\Delta I_R$ with label 1 to $I_L$ will decrease the information gain. The proof of this theorem is presented in the appendix. Note that we also have a similar conclusion for the Gini impurity score, which is also shown in the appendix. Therefore, to decrease the information gain score, the adversary needs to perturb examples in $\Delta I$ such that $\frac{n_0}{N_0}$ and $\frac{n_1}{N_1}$ are close to each other (the ideal case $\frac{n_0}{N_0} = \frac{n_1}{N_1}$ may not be achieved because $n_0$, $n_1$, $N_0$ and $N_1$ are integers). The robust split finding algorithm is shown in Algorithm 1. In this algorithm we find a perturbation that minimizes $\left| \frac{n_0}{N_0} - \frac{n_1}{N_1} \right|$ as an approximation and upper bound to the optimal solution. Algorithm 3 in the appendix shows an $O(|\Delta I|)$ procedure to find such a perturbation to approximately minimize the information gain. Since the algorithm scans through $\{x_1^{(j)}, \dots, x_N^{(j)}\}$ in sorted order, the sets $\Delta I$, $I_L^o$ and $I_R^o$ can be maintained in amortized $O(1)$ time in the inner loop. Therefore, the computational complexity of the robust training algorithm is $O(d |I|^2)$ per split.
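The idea of driving $\frac{n_0}{N_0}$ and $\frac{n_1}{N_1}$ towards each other can be sketched with counts only. The code below is a hedged illustration and not the authors' Algorithm 1 or Algorithm 3: the function names and arguments are hypothetical, and it assumes both classes are present at the node ($N_0 > 0$, $N_1 > 0$). For each number of ambiguous label-0 points sent left, it tries the integer numbers of label-1 points closest to equalizing the two ratios.

```python
import math

def binary_entropy(pos, total):
    """Entropy in bits of a set with `pos` positives out of `total` points."""
    if total == 0 or pos == 0 or pos == total:
        return 0.0
    p = pos / total
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def info_gain_counts(n0, n1, N0, N1):
    """Information gain when the left child holds n0 label-0 and n1 label-1 points."""
    N = N0 + N1
    nl = n0 + n1
    nr = N - nl
    cond = (nl / N) * binary_entropy(n1, nl) \
        + (nr / N) * binary_entropy(N1 - n1, nr)
    return binary_entropy(N1, N) - cond

def approx_robust_info_gain(n0_sure, n1_sure, d0, d1, N0, N1):
    """Approximate worst-case information gain under perturbation.

    n0_sure, n1_sure: label-0 / label-1 counts certainly in the left child.
    d0, d1: label-0 / label-1 counts in the ambiguity set.
    Picks how many ambiguous points go left so |n0/N0 - n1/N1| is smallest.
    """
    best_gap, best_counts = math.inf, None
    for a0 in range(d0 + 1):
        n0 = n0_sure + a0
        # Real-valued n1 that would equalize the ratios, then nearby integers.
        target = N1 * n0 / N0 - n1_sure
        for a1 in {math.floor(target), math.ceil(target)}:
            a1 = min(max(a1, 0), d1)
            n1 = n1_sure + a1
            gap = abs(n0 / N0 - n1 / N1)
            if gap < best_gap:
                best_gap, best_counts = gap, (n0, n1)
    return info_gain_counts(*best_counts, N0, N1), best_counts

# Node with 4 examples per class; one example of each class is ambiguous.
natural_gain = info_gain_counts(4, 0, 4, 4)
robust_gain, counts = approx_robust_info_gain(3, 0, 1, 1, 4, 4)
```

Because the returned perturbation is feasible for the adversary, the resulting gain is an upper bound on the true minimum, in line with the remark above.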
Although it is possible to extend our conclusion to other traditional scores of classification trees, in the next section we focus on the modern scenario where we use a regression tree to fit any loss function.
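We now introduce the regression tree training process used in many modern tree boosting packages including XGBoost (Chen & Guestrin, 2016), LightGBM (Ke et al., 2017) and CatBoost (Dorogush et al., 2018). Specifically, we focus on the formulation of gradient boosted decision tree (GBDT), which is one of the most successful ensemble models and has been widely used in industry. GBDT is an additive tree ensemble model combining outputs of $K$ trees
$$\hat{y}_i = \phi_K(x_i) = \sum_{k=1}^{K} f_k(x_i),$$
where each $f_k$ is a decision tree and $\hat{y}_i$ is the final output for $x_i$. Here we only focus on regression trees where $\hat{y}_i \in \mathbb{R}$. Note that even for a classification problem, the modern treatment in GBDT is to consider the data with logistic loss, and use a regression tree to minimize this loss.
During GBDT training, the trees are generated in an additive manner: when we consider the tree $f_K$, all previous trees $f_k$, $k \in \{1, \dots, K-1\}$, are kept unchanged. For a general convex loss function $l$ (such as MSE or logistic loss), we desire to minimize the following objective
$$\mathcal{L}(\phi, \mathcal{D}) = \sum_{i=1}^{N} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k) = \sum_{i=1}^{N} l\big(y_i, \phi_{K-1}(x_i) + f_K(x_i)\big) + \sum_{k=1}^{K-1} \Omega(f_k) + \Omega(f_K),$$
where $\Omega(f)$ is a regularization term to penalize complex trees; for example, in XGBoost, $\Omega(f) = \gamma T + \frac{1}{2} \lambda \|\omega\|^2$, where $T$ is the number of leaves, $\omega$ is a vector of all leaf predictions and $\lambda, \gamma \ge 0$ are regularization constants. Importantly, when we consider $f_K$, $\phi_{K-1}$ is a constant. The impact of $f_K(x_i)$ on $l(y_i, \hat{y}_i)$ can be approximated using a second-order Taylor expansion:
$$l\big(y_i, \phi_K(x_i)\big) \approx \hat{l}_i(f_K) := l\big(y_i, \phi_{K-1}(x_i)\big) + g_i f_K(x_i) + \frac{1}{2} h_i \big(f_K(x_i)\big)^2,$$
where $g_i$ and $h_i$ are the first and second order derivatives of the loss function with respect to the prediction of decision tree $f_K$ on point $x_i$. Conceptually, ignoring the regularization terms, the score function can be given as:
$$S(I_L, I_R) = \sum_{i \in I} \hat{l}_i(f_K)\Big|_{f_K(x_i) = \omega_P} - \sum_{i \in I_L} \hat{l}_i(f_K)\Big|_{f_K(x_i) = \omega_L} - \sum_{i \in I_R} \hat{l}_i(f_K)\Big|_{f_K(x_i) = \omega_R},$$
where $\omega_L$, $\omega_R$ and $\omega_P$ are the prediction values of the left, right and parent nodes. The score represents the improvement in reducing the loss function for all data examples in $I$. The exact form of the score used in XGBoost with regularization terms is given in (Chen & Guestrin, 2016):
$$S(j, \eta, I) = S(I_L, I_R) := \frac{1}{2} \left[ \frac{\big(\sum_{i \in I_L} g_i\big)^2}{\sum_{i \in I_L} h_i + \lambda} + \frac{\big(\sum_{i \in I_R} g_i\big)^2}{\sum_{i \in I_R} h_i + \lambda} - \frac{\big(\sum_{i \in I} g_i\big)^2}{\sum_{i \in I} h_i + \lambda} \right] - \gamma,$$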
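As a small numerical illustration of the regularized split score, the sketch below computes first and second derivatives of the logistic loss with respect to the raw margin and plugs the per-child sums into the gain formula. The function names and the default values `lam=1.0` and `gamma=0.0` are assumptions for this example, not settings taken from the experiments.

```python
import numpy as np

def logistic_grad_hess(y, margin):
    """First and second derivatives of the logistic loss w.r.t. the raw margin."""
    p = 1.0 / (1.0 + np.exp(-margin))
    return p - y, p * (1.0 - p)

def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Regularized split score from summed gradients/Hessians of each child."""
    def leaf_term(g, h):
        return g * g / (h + lam)
    parent = leaf_term(g_left + g_right, h_left + h_right)
    children = leaf_term(g_left, h_left) + leaf_term(g_right, h_right)
    return 0.5 * (children - parent) - gamma

y = np.array([0.0, 0.0, 1.0, 1.0])
g, h = logistic_grad_hess(y, np.zeros(4))   # first boosting round, margin 0
left = np.array([True, True, False, False])
gain = split_gain(g[left].sum(), h[left].sum(), g[~left].sum(), h[~left].sum())
```

With all margins at zero, each example has $|g_i| = 0.5$ and $h_i = 0.25$, so a split that separates the two classes yields a gain of $2/3$ under these regularization values.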
where $\gamma$ is a regularization constant. Again, to minimize the score by perturbing points in $\Delta I$, the adversary needs to solve an intractable 0-1 integer optimization at each possible splitting position. Since GBDT is often deployed in large-scale data mining tasks with a large amount of training data to scan through at each node, and we need to solve $RS$ $O(|I| d)$ times, we cannot afford any expensive computation. For efficiency, our robust splitting procedure for boosted decision trees, as detailed in Algorithm 2, approximates the minimization by considering only four representative cases: (1) no perturbations: $S_1 = S(I_L, I_R)$; (2) perturb all points in $\Delta I$ to the right: $S_2 = S(I_L^o, I_R^o \cup \Delta I)$; (3) perturb all points in $\Delta I$ to the left: $S_3 = S(I_L^o \cup \Delta I, I_R^o)$; (4) swap the points in $\Delta I$: $S_4 = S(I_L^o \cup \Delta I_R, I_R^o \cup \Delta I_L)$. We take the minimum among the four representative cases as an approximation of the $RS$:
$$RS(j, \eta, I) \approx \min\{S_1, S_2, S_3, S_4\}. \tag{4}$$
Though this method only takes $O(1)$ time to give a rough approximation of the $RS$ at each possible split position, it is effective empirically, as demonstrated in the experiments below.
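The four representative cases in (4) can be sketched as follows. This is an illustrative version that recomputes the gradient and Hessian sums from boolean masks for readability; an implementation aiming for the constant-time cost described above would instead maintain running sums while scanning sorted feature values. All names and the regularization defaults are assumptions.

```python
import numpy as np

def gain(G_l, H_l, G_r, H_r, lam=1.0, gamma=0.0):
    """Regularized split score from per-child gradient and Hessian sums."""
    def term(G, H):
        return G * G / (H + lam)
    return 0.5 * (term(G_l, H_l) + term(G_r, H_r)
                  - term(G_l + G_r, H_l + H_r)) - gamma

def approx_robust_gain(x_j, g, h, eta, eps, lam=1.0, gamma=0.0):
    """Minimum split gain over the four representative perturbation cases."""
    amb = np.abs(x_j - eta) <= eps       # ambiguity set Delta I
    left = x_j < eta                     # unperturbed left child I_L
    left_masks = [
        left,                            # (1) no perturbation
        left & ~amb,                     # (2) all ambiguous points moved right
        left | amb,                      # (3) all ambiguous points moved left
        (left & ~amb) | (~left & amb),   # (4) ambiguous points swap sides
    ]
    scores = [
        gain(g[m].sum(), h[m].sum(), g[~m].sum(), h[~m].sum(), lam, gamma)
        for m in left_masks
    ]
    return min(scores), scores

x_j = np.array([0.10, 0.45, 0.55, 0.90])
g = np.array([0.5, 0.5, -0.5, -0.5])     # logistic-loss gradients at margin 0
h = np.full(4, 0.25)                     # matching second derivatives
rs, scores = approx_robust_gain(x_j, g, h, eta=0.5, eps=0.1)
```

In this toy example the unperturbed gain is $2/3$, while swapping the two ambiguous points leaves both children with zero summed gradient, so the approximate robust score is zero.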
Our code is at https://github.com/chenhongge/RobustTrees.
We present results on three small datasets with robust information gain based decision trees using Algorithm 1. We focus on untargeted adversarial attacks. For each dataset we test on 100 examples (or the whole test set), and we only attack correctly classified examples. Attacks proceed until the attack success rate is 100%; the differences in robustness are reflected in the distortion of the adversarial examples required to achieve a successful attack. In Table 2, we present the average $\ell_\infty$ distortion of the adversarial examples of both classical natural decision trees and our robust decision trees trained on different datasets. We use Papernot’s attack as well as $\ell_\infty$ versions of Cheng’s and Kantchelian’s attacks. The $\ell_1$ and $\ell_2$ distortions found by Kantchelian’s $\ell_1$ and $\ell_2$ attacks are presented in Table 4 in the appendix. The adversarial examples found by Cheng’s, Papernot’s and Kantchelian’s attacks have much larger $\ell_\infty$ norm for our robust trees compared to those for the natural trees, demonstrating that our robust training algorithm improves the decision tree robustness substantially. In some cases our robust decision trees also have higher test accuracy than the natural trees. This may be due to the fact that the robust score tends to encourage the tree to split at thresholds where fewer examples are in the ambiguity set, and thus the split is also robust against random noise in the training set. Another possible reason is the implicit regularization in the robust splitting. The robust score is never higher than the regular score and thus our splitting is more conservative. Also, from the results in Table 2 we see that most of the adversarial examples found by Papernot’s attack have larger $\ell_\infty$ norm than those found by Cheng’s attack. This suggests that the straightforward greedy search attack is not as good as a sophisticated general attack for attacking decision trees. Cheng’s attack is able to achieve similar $\ell_\infty$ distortion to Kantchelian’s attack, without solving expensive MILPs.
While not scalable to large datasets, Kantchelian’s attack can find the minimum-distortion adversarial examples, reflecting the true robustness of a tree-based model.


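Table 2: Test accuracy and average $\ell_\infty$ distortion of adversarial examples for natural and robust information gain based decision trees.

| Dataset | Training set size | Test set size | # of features | # of classes | Robust $\epsilon$ | Depth (robust) | Depth (natural) | Test acc. (robust) | Test acc. (natural) | Avg. $\ell_\infty$ dist., Cheng’s attack (robust) | Avg. $\ell_\infty$ dist., Cheng’s attack (natural) | Avg. $\ell_\infty$ dist., Papernot’s attack (robust) | Avg. $\ell_\infty$ dist., Papernot’s attack (natural) | Avg. $\ell_\infty$ dist., Kantchelian’s attack (robust) | Avg. $\ell_\infty$ dist., Kantchelian’s attack (natural) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| breast-cancer | 546 | 137 | 10 | 2 | 0.3 | 5 | 5 | .948 | .942 | .531 | .189 | .501 | .368 | .463 | .173 |
| diabetes | 614 | 154 | 8 | 2 | 0.2 | 5 | 5 | .688 | .747 | .206 | .065 | .397 | .206 | .203 | .060 |
| ionosphere | 281 | 70 | 34 | 2 | 0.2 | 4 | 4 | .986 | .929 | .388 | .109 | .408 | .113 | .358 | .096 |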



In this subsection, we evaluate our algorithm in the tree boosting setting, where multiple robust decision trees are created in an ensemble to improve model accuracy. We implement Algorithm 2 by slightly modifying the node splitting procedure in XGBoost. Our modification is only relevant to computing the scores for selecting the best split, and is compatible with other existing features of XGBoost. We also use XGBoost to train natural (undefended) GBDT models. Again, we focus on untargeted adversarial attacks. We consider nine real-world large or medium-sized datasets and two small datasets (Chang & Lin, 2011), spanning a variety of data types (including both tabular and image data). For small datasets we use 100 examples and for large or medium-sized datasets we use 5000 examples for robustness evaluation, except for MNIST 2 vs. 6, where we use 100 examples. MNIST 2 vs. 6 is a subset of MNIST used only to distinguish between 2 and 6. This is the dataset tested in Kantchelian et al. (2016). We use the same number of trees, depth and step size shrinkage as in Kantchelian et al. (2016) to train our robust and natural models. As in Kantchelian et al. (2016), we only test 100 examples for MNIST 2 vs. 6 since the model is relatively large. In Table 3, we present the average $\ell_\infty$ distortion of adversarial examples found by Cheng’s attack for both natural GBDT and robust GBDT models trained on those datasets. For small and medium binary classification models, we also present results of Kantchelian’s attack, which finds the minimum adversarial example in $\ell_\infty$ norm. The $\ell_1$ and $\ell_2$ distortions found by Kantchelian’s $\ell_1$ and $\ell_2$ attacks are presented in Table 5 in the appendix. Kantchelian’s attack can only handle binary classification problems and small-scale models due to its time-consuming MILP formulation. Papernot’s attack is inapplicable here because it is for attacking a single tree only. The natural and robust models have the same number of trees for comparison.
We only attack correctly classified examples and all examples are successfully attacked. We see that our robust GBDT models consistently outperform the natural GBDT models in terms of $\ell_\infty$ robustness.
For some datasets, we need to increase tree depth in robust GBDT models in order to obtain accuracy comparable to the natural GBDT models. The requirement of larger model capacity is common in the adversarial training literature: in the state-of-the-art defense for DNNs, Madry et al. (2018) argue that increasing the model capacity is essential for adversarial training to obtain good accuracy.


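Table 3: Test accuracy and average $\ell_\infty$ distortion of adversarial examples for natural and robust GBDT models.

| Dataset | Training set size | Test set size | # of features | # of classes | # of trees | Robust $\epsilon$ | Depth (robust) | Depth (natural) | Test acc. (robust) | Test acc. (natural) | Avg. $\ell_\infty$ dist., Cheng’s attack (robust) | Avg. $\ell_\infty$ dist., Cheng’s attack (natural) | Cheng’s attack dist. improv. | Avg. $\ell_\infty$ dist., Kantchelian’s attack (robust) | Avg. $\ell_\infty$ dist., Kantchelian’s attack (natural) | Kantchelian’s attack dist. improv. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| breast-cancer | 546 | 137 | 10 | 2 | 4 | 0.3 | 8 | 6 | .978 | .964 | .411 | .215 | 1.91X | .406 | .201 | 2.02X |
| covtype | 400,000 | 181,000 | 54 | 7 | 80 | 0.2 | 8 | 8 | .847 | .877 | .081 | .061 | 1.31X | not binary | not binary | - |
| cod-rna | 59,535 | 271,617 | 8 | 2 | 80 | 0.2 | 5 | 4 | .880 | .965 | .062 | .053 | 1.16X | .054 | .034 | 1.59X |
| diabetes | 614 | 154 | 8 | 2 | 20 | 0.2 | 5 | 5 | .786 | .773 | .139 | .060 | 2.32X | .114 | .047 | 2.42X |
| Fashion-MNIST | 60,000 | 10,000 | 784 | 10 | 200 | 0.1 | 8 | 8 | .903 | .903 | .156 | .049 | 3.18X | not binary | not binary | - |
| HIGGS | 10,500,000 | 500,000 | 28 | 2 | 300 | 0.05 | 8 | 8 | .709 | .760 | .022 | .014 | 1.57X | time out | time out | - |
| ijcnn1 | 49,990 | 91,701 | 22 | 2 | 60 | 0.1 | 8 | 8 | .959 | .980 | .054 | .047 | 1.15X | .037 | .031 | 1.19X |
| MNIST | 60,000 | 10,000 | 784 | 10 | 200 | 0.3 | 8 | 8 | .980 | .980 | .373 | .072 | 5.18X | not binary | not binary | - |
| Sensorless | 48,509 | 10,000 | 48 | 11 | 30 | 0.05 | 6 | 6 | .987 | .997 | .035 | .023 | 1.52X | not binary | not binary | - |
| webspam | 300,000 | 50,000 | 254 | 2 | 100 | 0.05 | 8 | 8 | .983 | .992 | .049 | .024 | 2.04X | time out | time out | - |
| MNIST 2 vs. 6 | 11,876 | 1,990 | 784 | 2 | 1000 | 0.3 | 6 | 4 | .997 | .998 | .406 | .168 | 2.42X | .315 | .064 | 4.92X |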



Figure 3 and Figure 5 in the appendix show the distortion and accuracy of MNIST and FashionMNIST models with different number of trees. The adversarial examples are found by Cheng’s attack. Models with trees are the first trees during a single boosting run of () trees. The distortion of robust models are consistently much larger than those of the natural models. For MNIST dataset, our robust GBDT model loses accuracy slightly when the model has only 20 trees. This loss is gradually compensated as more trees are added to the model; regardless of the number of trees in the model, the robustness improvement is consistently observed, as our robust training is embedded in each tree’s building process and we create robust trees beginning from the very first step of boosting. Adversarial training in Kantchelian et al. (2016), in contrast, adds adversarial examples with respect to the current model at each boosting round so adversarial examples produced in the later stages of boosting are only learned by part of the model. The nonrobust trees in the first few rounds of boosting still exist in the final model and they may be the weakness of the ensemble. Similar problems are not present in DNN adversarial training since the whole model is exposed to new adversarial examples throughout the training process. This may explain why adversarial training in Kantchelian et al. (2016) failed to improve , , or robustness on the MNIST 2 vs. 6 model, while our method achieves significant robustness improvement with the same training parameters and evaluation metrics, as shown in Tables 3 and 5. Additionally, we also evaluate the robustness of natural and robust models with different number of trees on a variety of datasets using Cheng’s attack, presented in Table 7 in the appendix.
We also test our framework on random forest models; the results are shown in the appendix (Table 6).
In this paper, we study the robustness of tree-based machine learning models under adversarial attacks. Our experiments show that, just as DNNs, tree-based models are also vulnerable to adversarial attacks. To address this issue, we propose a novel robust decision tree training framework. We make necessary approximations to ensure scalability and implement our framework in both classical decision tree and tree boosting settings. Extensive experiments on a variety of datasets show that our method substantially improves model robustness. Our framework can be extended to other tree-based models such as Gini impurity based classification trees, random forest, and CART.
Acknowledgements The authors thank Aleksander Mądry for fruitful discussions. The authors also acknowledge the support of NSF via IIS-1719097, Intel, Google Cloud, Nvidia, SenseTime and IBM.
References
 Athalye et al. (2018) Athalye, A., Carlini, N., and Wagner, D. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In International Conference on Machine Learning, 2018.
 Breiman (1984) Breiman, L. Classification and Regression Trees. Routledge, 1984.
 Buckman et al. (2018) Buckman, J., Roy, A., Raffel, C., and Goodfellow, I. Thermometer encoding: One hot way to resist adversarial examples. International Conference on Learning Representations, 2018.
 Carlini & Wagner (2017) Carlini, N. and Wagner, D. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57. IEEE, 2017.
 Carlini & Wagner (2018) Carlini, N. and Wagner, D. Audio adversarial examples: Targeted attacks on speech-to-text. arXiv preprint arXiv:1801.01944, 2018.
 Chang & Lin (2011) Chang, C.-C. and Lin, C.-J. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
 Chen et al. (2018a) Chen, H., Zhang, H., Chen, P.-Y., Yi, J., and Hsieh, C.-J. Attacking visual language grounding with adversarial examples: A case study on neural image captioning. In Annual Meeting of the Association for Computational Linguistics, 2018a.
 Chen et al. (2017) Chen, P.-Y., Zhang, H., Sharma, Y., Yi, J., and Hsieh, C.-J. Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 15–26. ACM, 2017.
 Chen et al. (2018b) Chen, P.-Y., Sharma, Y., Zhang, H., Yi, J., and Hsieh, C.-J. Ead: elastic-net attacks to deep neural networks via adversarial examples. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018b.
 Chen & Guestrin (2016) Chen, T. and Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794. ACM, 2016.
 Cheng et al. (2018) Cheng, M., Yi, J., Zhang, H., Chen, P.-Y., and Hsieh, C.-J. Seq2sick: Evaluating the robustness of sequence-to-sequence models with adversarial examples. arXiv preprint arXiv:1803.01128, 2018.
 Cheng et al. (2019) Cheng, M., Le, T., Chen, P.-Y., Zhang, H., Yi, J., and Hsieh, C.-J. Query-efficient hard-label black-box attack: An optimization-based approach. In International Conference on Learning Representations, 2019.
 Dorogush et al. (2018) Dorogush, A. V., Ershov, V., and Gulin, A. Catboost: gradient boosting with categorical features support. arXiv preprint arXiv:1810.11363, 2018.
 Feng et al. (2018) Feng, J., Yu, Y., and Zhou, Z.-H. Multi-layered gradient boosting decision trees. In Neural Information Processing Systems, 2018.
 Friedman et al. (2000) Friedman, J., Hastie, T., Tibshirani, R., et al. Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). The Annals of Statistics, 28(2):337–407, 2000.
 Friedman (2001) Friedman, J. H. Greedy function approximation: a gradient boosting machine. The Annals of Statistics, 29(5):1189–1232, 2001.
 Friedman (2002) Friedman, J. H. Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4):367–378, 2002.
 Goodfellow et al. (2015) Goodfellow, I., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. In International Conference on Learning Representations, 2015.
 Guo et al. (2017) Guo, C., Rana, M., Cisse, M., and van der Maaten, L. Countering adversarial images using input transformations. arXiv preprint arXiv:1711.00117, 2017.
 Huang et al. (2017) Huang, S., Papernot, N., Goodfellow, I., Duan, Y., and Abbeel, P. Adversarial attacks on neural network policies. arXiv preprint arXiv:1702.02284, 2017.
 Ilyas et al. (2017) Ilyas, A., Engstrom, L., Athalye, A., and Lin, J. Query-efficient black-box adversarial examples. arXiv preprint arXiv:1712.07113, 2017.
 Ilyas et al. (2019) Ilyas, A., Santurkar, S., Tsipras, D., Engstrom, L., Tran, B., and Madry, A. Adversarial examples are not bugs, they are features. arXiv preprint arXiv:1905.02175, 2019.
 Kantchelian et al. (2016) Kantchelian, A., Tygar, J., and Joseph, A. Evasion and hardening of tree ensemble classifiers. In International Conference on Machine Learning, pp. 2387–2396, 2016.
 Ke et al. (2017) Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., and Liu, T.-Y. LightGBM: A highly efficient gradient boosting decision tree. In Neural Information Processing Systems, pp. 3146–3154, 2017.
 Kos et al. (2018) Kos, J., Fischer, I., and Song, D. Adversarial examples for generative models. In 2018 IEEE Security and Privacy Workshops (SPW), pp. 36–42. IEEE, 2018.
 Kurakin et al. (2017) Kurakin, A., Goodfellow, I., and Bengio, S. Adversarial machine learning at scale. In International Conference on Learning Representations, 2017.
 Liu & Hsieh (2019) Liu, X. and Hsieh, C.-J. Rob-GAN: Generator, discriminator, and adversarial attacker. In IEEE Conference on Computer Vision and Pattern Recognition, 2019.
 Liu et al. (2019) Liu, X., Li, Y., Wu, C., and Hsieh, C.-J. Adv-BNN: Improved adversarial defense through robust Bayesian neural network. In International Conference on Learning Representations, 2019.
 Ma et al. (2018) Ma, X., Li, B., Wang, Y., Erfani, S. M., Wijewickrema, S., Houle, M. E., Schoenebeck, G., Song, D., and Bailey, J. Characterizing adversarial subspaces using local intrinsic dimensionality. arXiv preprint arXiv:1801.02613, 2018.
 Madry et al. (2018) Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018.
 Papernot et al. (2016a) Papernot, N., McDaniel, P., and Goodfellow, I. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277, 2016a.
 Papernot et al. (2016b) Papernot, N., McDaniel, P., Wu, X., Jha, S., and Swami, A. Distillation as a defense to adversarial perturbations against deep neural networks. In 2016 IEEE Symposium on Security and Privacy (SP), pp. 582–597. IEEE, 2016b.
 Peter et al. (2017) Peter, S., Diego, F., Hamprecht, F. A., and Nadler, B. Cost efficient gradient boosting. In Neural Information Processing Systems, 2017.
 Quinlan (1986) Quinlan, J. R. Induction of decision trees. Machine Learning, 1(1):81–106, 1986.
 Raghunathan et al. (2018) Raghunathan, A., Steinhardt, J., and Liang, P. Certified defenses against adversarial examples. arXiv preprint arXiv:1801.09344, 2018.
 Samangouei et al. (2018) Samangouei, P., Kabkab, M., and Chellappa, R. Defense-GAN: Protecting classifiers against adversarial attacks using generative models. arXiv preprint arXiv:1805.06605, 2018.
 Si et al. (2017) Si, S., Zhang, H., Keerthi, S. S., Mahajan, D., Dhillon, I. S., and Hsieh, C.-J. Gradient boosted decision trees for high dimensional sparse output. In International Conference on Machine Learning, 2017.
 Sinha et al. (2018) Sinha, A., Namkoong, H., and Duchi, J. Certifying some distributional robustness with principled adversarial training. In International Conference on Learning Representations, 2018.
 Song et al. (2017) Song, Y., Kim, T., Nowozin, S., Ermon, S., and Kushman, N. Pixeldefend: Leveraging generative models to understand and defend against adversarial examples. arXiv preprint arXiv:1710.10766, 2017.
 Szegedy et al. (2013) Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
 Tsipras et al. (2019) Tsipras, D., Santurkar, S., Engstrom, L., Turner, A., and Madry, A. Robustness may be at odds with accuracy. International Conference on Learning Representations, 2019.
 Tyree et al. (2011) Tyree, S., Weinberger, K. Q., Agrawal, K., and Paykin, J. Parallel boosted regression trees for web search ranking. In Proceedings of the 20th International Conference on World Wide Web, pp. 387–396. ACM, 2011.
 Wang et al. (2018) Wang, S., Chen, Y., Abdou, A., and Jana, S. Mixtrain: Scalable training of formally robust neural networks. arXiv preprint arXiv:1811.02625, 2018.
 Wong & Kolter (2018) Wong, E. and Kolter, J. Z. Provable defenses against adversarial examples via the convex outer adversarial polytope. In International Conference on Machine Learning, 2018.
 Wong et al. (2018) Wong, E., Schmidt, F., Metzen, J. H., and Kolter, J. Z. Scaling provable adversarial defenses. In Neural Information Processing Systems, 2018.
 Xu et al. (2017) Xu, W., Evans, D., and Qi, Y. Feature squeezing: Detecting adversarial examples in deep neural networks. arXiv preprint arXiv:1704.01155, 2017.
 Xu et al. (2019) Xu, Z. E., Kusner, M. J., Weinberger, K. Q., and Zheng, A. X. Gradient regularized budgeted boosting. arXiv preprint arXiv:1901.04065, 2019.
 Zhang et al. (2018) Zhang, H., Si, S., and Hsieh, C.-J. GPU-acceleration for large-scale tree boosting. SysML Conference, 2018.
 Zhang et al. (2019) Zhang, H., Yu, Y., Jiao, J., Xing, E. P., Ghaoui, L. E., and Jordan, M. I. Theoretically principled tradeoff between robustness and accuracy. arXiv preprint arXiv:1901.08573, 2019.
 Zhou & Feng (2017) Zhou, Z.-H. and Feng, J. Deep forest: Towards an alternative to deep neural networks. In International Joint Conferences on Artificial Intelligence, 2017.
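Here we prove Theorem 1 for information gain score.
Proof.
Let $n_0$ and $n_1$ ($m_0$ and $m_1$) denote the numbers of points with label 0 and label 1 on the left (right) side of a split with threshold $\eta$ on feature $j$. $H(y)$ and $H(y|x^{(j)}<\eta)$ are defined as
$$H(y) = -\frac{N_0}{N}\log\frac{N_0}{N} - \frac{N_1}{N}\log\frac{N_1}{N},$$
and
$$H(y|x^{(j)}<\eta) = -\frac{n_0}{N}\log\frac{n_0}{n_0+n_1} - \frac{n_1}{N}\log\frac{n_1}{n_0+n_1} - \frac{m_0}{N}\log\frac{m_0}{m_0+m_1} - \frac{m_1}{N}\log\frac{m_1}{m_0+m_1}.$$
For simplicity, we denote $N_0 := n_0 + m_0$, $N_1 := n_1 + m_1$, and $N := N_0 + N_1$. The information gain of this split can be written as a function of $n_0$ and $n_1$:
$$IG = H(y) - H(y|x^{(j)}<\eta) = C_1\left[n_0\log\frac{n_0}{n_0+n_1} + n_1\log\frac{n_1}{n_0+n_1} + (N_0-n_0)\log\frac{N_0-n_0}{N-n_0-n_1} + (N_1-n_1)\log\frac{N_1-n_1}{N-n_0-n_1}\right] + C_2, \tag{5}$$
where $C_1 > 0$ and $C_2$ are constants with respect to $n_0$. Taking $n_0$ as a continuous variable, we have
$$\frac{\partial IG}{\partial n_0} = C_1\log\left(1 + \frac{n_0 N_1 - n_1 N_0}{(n_0+n_1)(N_0-n_0)}\right). \tag{6}$$
When $\frac{\partial IG}{\partial n_0} < 0$, perturbing one example in $\Delta I_R$ with label 0 to $I_L$ will increase $n_0$ and decrease the information gain. It is easy to see that $\frac{\partial IG}{\partial n_0} < 0$ if and only if $\frac{n_0}{n_1} < \frac{N_0}{N_1}$. This indicates that when $\frac{n_0}{n_1} < \frac{N_0}{N_1}$ and $\frac{n_0+1}{n_1} \leq \frac{N_0}{N_1}$, perturbing one example with label 0 to $I_L$ will always decrease the information gain. ∎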
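Similarly, if $\frac{n_0}{n_1} > \frac{N_0}{N_1}$ and $\frac{n_0}{n_1+1} \geq \frac{N_0}{N_1}$, perturbing one example in $\Delta I_R$ with label 1 to $I_L$ will decrease the information gain. As mentioned in the main text, to decrease the information gain score in Algorithm 1, the adversary needs to perturb examples in $\Delta I$ such that $\frac{n_0}{n_1}$ and $\frac{N_0}{N_1}$ are close to each other. Algorithm 3 gives an $O(|\Delta I|)$ method to find $\Delta n_0^*$ and $\Delta n_1^*$, the optimal numbers of points in $\Delta I$ with label 0 and 1 to be added to the left.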
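The adversary's goal described above can be checked numerically with a brute-force sketch. This is not Algorithm 3: it simply enumerates every way of sending ambiguous points to the left and keeps the assignment with the smallest information gain. The notation is an assumption made for this example: `n0`, `n1` are the label-0 and label-1 counts on the left, `N0`, `N1` are the totals in the node, `l0`, `l1` are points that stay on the left regardless of perturbation, and `a0`, `a1` are ambiguous points that the adversary may place on either side.

```python
import math


def entropy(c0, c1):
    """Binary entropy (in nats) of a node holding c0 label-0 and c1 label-1 points."""
    total = c0 + c1
    if total == 0:
        return 0.0
    h = 0.0
    for c in (c0, c1):
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


def information_gain(n0, n1, N0, N1):
    """Information gain of a split sending n0 label-0 and n1 label-1 points left."""
    N = N0 + N1
    m0, m1 = N0 - n0, N1 - n1  # counts remaining on the right
    left, right = n0 + n1, m0 + m1
    return (entropy(N0, N1)
            - (left / N) * entropy(n0, n1)
            - (right / N) * entropy(m0, m1))


def worst_case_split(l0, l1, a0, a1, N0, N1):
    """Exhaustively pick how many ambiguous points the adversary moves left.

    Returns (min_gain, d0, d1), where d0 and d1 are the numbers of ambiguous
    label-0 and label-1 points placed on the left side of the split.
    """
    best = None
    for d0 in range(a0 + 1):
        for d1 in range(a1 + 1):
            gain = information_gain(l0 + d0, l1 + d1, N0, N1)
            if best is None or gain < best[0]:
                best = (gain, d0, d1)
    return best


# With 10 points per class, 2 label-0 and 6 label-1 points fixed on the left,
# and 4 ambiguous label-0 points, the adversary moves all 4 to the left so
# that the left ratio matches the overall class ratio.
result = worst_case_split(2, 6, 4, 0, 10, 10)
```

The exhaustive search costs one gain evaluation per pair `(d0, d1)`, so it is only meant as a reference for small cases, for instance to sanity-check a faster procedure.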
We also have a theorem for Gini impurity score similar to Theorem 1.
Theorem B.1.
If $\frac{n_0}{n_1} < \frac{N_0}{N_1}$ and $\frac{n_0+1}{n_1} \leq \frac{N_0}{N_1}$, perturbing one example in $\Delta I_R$ with label 0 to $I_L$ will decrease the Gini impurity.
Proof.
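The Gini impurity score of a split with threshold $\eta$ on feature $j$ is
$$\mathrm{Gini} = \left(1 - \frac{N_0^2}{N^2} - \frac{N_1^2}{N^2}\right) - \frac{n_0+n_1}{N}\left(1 - \frac{n_0^2+n_1^2}{(n_0+n_1)^2}\right) - \frac{m_0+m_1}{N}\left(1 - \frac{m_0^2+m_1^2}{(m_0+m_1)^2}\right) = C_3\left[\frac{n_0^2+n_1^2}{n_0+n_1} + \frac{m_0^2+m_1^2}{m_0+m_1}\right] + C_4, \tag{7}$$
where we use the same notation as in (5). $C_3 > 0$ and $C_4$ are constants with respect to $n_0$. Taking $n_0$ as a continuous variable, we have
$$\frac{\partial\,\mathrm{Gini}}{\partial n_0} = 2C_3(A+B)(A-B), \tag{8}$$
where $A := \frac{m_1}{m_0+m_1}$ and $B := \frac{n_1}{n_0+n_1}$. Then $\frac{\partial\,\mathrm{Gini}}{\partial n_0} < 0$ holds if $A < B$, which is equivalent to $\frac{n_0}{n_1} < \frac{N_0}{N_1}$. ∎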
Since the conditions of Theorem 1 and Theorem B.1 are the same, Algorithm 1 and Algorithm 3 also work for tree-based models using Gini impurity score.
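The Gini statement can be sanity-checked by exhaustive enumeration on small counts. The sketch below assumes the condition reads $n_0/n_1 < N_0/N_1$ and $(n_0+1)/n_1 \leq N_0/N_1$, with `n0`, `n1` the label-0 and label-1 counts on the left and `N0`, `N1` the totals in the node; these names, and the helper functions, are illustrative assumptions. Exact rational arithmetic is used so that the strict inequality is not affected by floating-point rounding.

```python
from fractions import Fraction


def gini(c0, c1):
    """Gini impurity of a node with c0 label-0 and c1 label-1 points."""
    total = c0 + c1
    if total == 0:
        return Fraction(0)
    return 1 - Fraction(c0, total) ** 2 - Fraction(c1, total) ** 2


def gini_gain(n0, n1, N0, N1):
    """Impurity reduction of a split sending n0 label-0 and n1 label-1 points left."""
    N = N0 + N1
    m0, m1 = N0 - n0, N1 - n1  # counts remaining on the right
    return (gini(N0, N1)
            - Fraction(n0 + n1, N) * gini(n0, n1)
            - Fraction(m0 + m1, N) * gini(m0, m1))


def condition_holds(n0, n1, N0, N1):
    """Assumed condition, cross-multiplied to avoid division by zero."""
    return n0 * N1 < N0 * n1 and (n0 + 1) * N1 <= N0 * n1


def count_violations(max_count=8):
    """Count cases where the condition holds but moving one label-0 point
    to the left does not strictly decrease the Gini gain."""
    violations = 0
    for N0 in range(1, max_count + 1):
        for N1 in range(1, max_count + 1):
            for n0 in range(N0):  # at least one label-0 point is on the right
                for n1 in range(1, N1 + 1):
                    if not condition_holds(n0, n1, N0, N1):
                        continue
                    before = gini_gain(n0, n1, N0, N1)
                    after = gini_gain(n0 + 1, n1, N0, N1)
                    if not after < before:
                        violations += 1
    return violations
```

Calling `count_violations()` enumerates all class counts up to 8 per label; a result of zero is consistent with the theorem on that range, although it is of course not a proof.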
Figure 4 shows the decision boundaries and test accuracy of natural trees as well as robust trees with different $\epsilon$ values on two-dimensional synthetic datasets. All trees have depth 5 and we plot training examples in the figure. The results show that the decision boundaries of our robust decision trees are simpler than those of natural decision trees, agreeing with the regularization argument in the main text.
In Tables 4 and 5 we present the $\ell_1$ and $\ell_2$ distortions of vanilla (information gain based) decision trees and GBDT models obtained by Kantchelian’s $\ell_1$ and $\ell_2$ attacks. Again, only small or medium sized binary classification models can be evaluated by Kantchelian’s attack. From the results we can see that although our robust decision tree training algorithm is designed for $\ell_\infty$ perturbations, it can also significantly improve the models’ $\ell_1$ and $\ell_2$ robustness.
Figure 5 shows the distortion and accuracy of Fashion-MNIST GBDT models with different numbers of trees. In Table 7 we present the test accuracy and distortion of models with different numbers of trees obtained by Cheng’s attack. For each dataset, models are generated during a single boosting run. We can see that robustly trained models are consistently more robust than natural models with the same number of trees. Another interesting finding is that for the MNIST and Fashion-MNIST datasets in Figures 3 (in the main text) and 5, models with more trees are generally more robust. This may not be true on other datasets; for example, results from Table 7 in the appendix show that on some other datasets, the natural GBDT models lose robustness when more trees are added.
One might hope to improve robustness simply by reducing the depth of trees, since shallower trees provide stronger regularization effects. Unfortunately, this is not true. As demonstrated in Figure 6, the robustness of naturally trained GBDT models is much worse than that of robust models, no matter how shallow they are or how many trees are in the ensemble. Also, when the number of trees in the ensemble model is limited, reducing tree depth will significantly lower the model accuracy.
We test our robust training framework on random forest (RF) models; the results are in Table 6. In these experiments we build random forest models with a 0.5 data sampling rate and a 0.5 feature sampling rate. We test the robust and natural random forest models on three datasets, and on each dataset we test 100 points using Cheng’s and Kantchelian’s attacks. From the results we can see that our robust decision tree training framework can also significantly improve random forest model robustness.
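The sampling setup for the random forest experiments can be sketched as follows. The text only states the two 0.5 rates; sampling without replacement, drawing the feature subset once per tree, and the function name `sample_tree_inputs` are all assumptions made for this illustration.

```python
import random


def sample_tree_inputs(n_rows, n_features, row_rate=0.5, feature_rate=0.5, seed=0):
    """Pick the training rows and features seen by one tree of the forest.

    Assumption: both subsets are drawn uniformly without replacement.
    """
    rng = random.Random(seed)
    n_row_sample = max(1, int(n_rows * row_rate))
    n_feature_sample = max(1, int(n_features * feature_rate))
    rows = sorted(rng.sample(range(n_rows), n_row_sample))
    features = sorted(rng.sample(range(n_features), n_feature_sample))
    return rows, features


# One tree on a toy dataset with 100 examples and 784 features (the MNIST
# feature count): it is trained on 50 rows and 392 features.
rows, features = sample_tree_inputs(100, 784)
```

Each tree of the forest would call this with a different seed and then be grown on the selected rows and features by the same robust or natural tree-building procedure used elsewhere.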
In Figure 7 we present more adversarial examples for the MNIST and Fashion-MNIST datasets using GBDT models.


Dataset 
training  test  # of  # of  robust  depth  test acc. 



set size  set size  features  classes  robust  natural  robust  natural  robust  natural  robust  natural  

