Supervising Feature Influence
Abstract
Causal influence measures for machine learnt classifiers shed light on the reasons behind classification, and aid in identifying influential input features and revealing their biases. However, such analyses involve evaluating the classifier using datapoints that may be atypical of its training distribution. Standard methods for training classifiers that minimize empirical risk do not constrain the behavior of the classifier on such datapoints. As a result, training to minimize empirical risk does not distinguish among classifiers that agree on predictions in the training distribution but have wildly different causal influences. We term this problem covariate shift in causal testing and formally characterize conditions under which it arises. As a solution to this problem, we propose a novel active learning algorithm that constrains the influence measures of the trained model. We prove that any two predictors whose errors are close on both the original training distribution and the distribution of atypical points are guaranteed to have causal influences that are also close. Further, we empirically demonstrate with synthetic labelers that our algorithm trains models that (i) have causal influences similar to those of the labeler’s model, (ii) generalize better to out-of-distribution points, and (iii) retain their accuracy on in-distribution points.
Shayak Sen, Piotr Mardziel, Anupam Datta, Matthew Fredrikson
Carnegie Mellon University
1 Introduction
Data processors employing machine learning algorithms are increasingly being required to provide and account for reasons behind their predictions due to regulations such as the EU GDPR [?]. This call for reasoning tools has intensified with the increasing use of machine learning systems in domains like criminal justice [?], credit [?], and hiring [?]. Understanding the reasons behind prediction by measuring the importance or influence of attributes for predictors has been an important area of study in machine learning. Traditionally, influence measures were used to inform feature selection [?]. Recently, influence measures have received renewed interest as part of a toolbox to explain operations and reveal biases of inscrutable machine learning systems [?; ?; ?; ?; ?].
Causal influence measures are a particularly important constituent of this toolbox [?; ?]. By identifying attributes that directly affect decisions, they provide insight about the operation of complex machine learning systems. In particular, they enable identification of principal reasons for decisions (e.g., credit denials) by evaluating counterfactual queries that ask whether changing input attributes would produce a change in the decision. This determination is used to explain and guard against unjust biases. For example, the use of a correlate of age like income to make credit decisions may be justified even if it causes applicants of one age group to be approved at a higher rate than another, whereas the direct use of age or a correlate like zipcode may not be justified. (Footnote 1: This is an example of a “business necessity defense” under US law on disparate impact [?].)
Causal analyses of natural systems often involve observing outcomes of specially created units, e.g., mice with altered genes. Such units may be atypical in natural populations. However, while performing causal analysis over machine learnt systems, a similar approach encounters an important challenge: machine learning systems are not expected to be evaluated on atypical or out-of-distribution units (datapoints), since they have not been exposed to such units during training. Standard methods for training classifiers that minimize empirical risk do not constrain the behavior of the classifier on such datapoints. As a result, training to minimize empirical risk does not distinguish among classifiers that agree on predictions in the training distribution but have unintended causal influences. We term this problem covariate shift in causal testing. In other words, typical machine learning algorithms are designed to make the right predictions but not necessarily for justifiable reasons.
Returning to the example of credit decisions using age and income, consider a situation where the two are strongly correlated: young individuals have low income and older individuals have a higher income. This situation is illustrated in Figure 1(a), where all three predictors have low predictive error, but they make similar predictions for very different reasons. Since the three predictors have nearly identical predictions on the distribution, points from the distribution are not useful in distinguishing the causal influence of the two features. As a result, causal testing requires the creation of atypical units that break the correlations between features. For example, evaluating the predictor on the points on the red bar (Figure 1(b)), where age is fixed and income is varied, informs whether income is used by a given predictor or not. However, since the atypical points are irrelevant from an empirical risk minimization perspective, an algorithm optimizing just for predictive accuracy is free to choose any of the three predictors.
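This situation can be reproduced in a few lines. The following minimal sketch is illustrative only: the synthetic relationship between age and income, the thresholds, and the names `uses_age` and `uses_income` are assumptions and are not taken from the figure. It compares two predictors that agree on almost all points drawn from the correlated distribution but disagree on atypical points where age is held fixed and income varies.

```python
import random

random.seed(0)

def sample_point():
    """Hypothetical data distribution: income is a noisy linear function of age."""
    age = random.uniform(20, 70)
    income = 1000 * age + random.gauss(0, 2000)  # assumed relationship
    return age, income

def uses_age(age, income):
    """Predictor that decides on age alone (assumed threshold)."""
    return age >= 45

def uses_income(age, income):
    """Predictor that decides on income alone (assumed threshold)."""
    return income >= 45000

def agreement(points):
    """Fraction of points on which the two predictors make the same decision."""
    return sum(uses_age(a, i) == uses_income(a, i) for a, i in points) / len(points)

# Points from the data distribution: the predictors are nearly indistinguishable.
data = [sample_point() for _ in range(10000)]
agree_on_data = agreement(data)

# Atypical points: age is fixed at 30 while income varies independently of age.
probe = [(30.0, random.uniform(20000, 70000)) for _ in range(10000)]
agree_on_probe = agreement(probe)

print(f"agreement on data distribution: {agree_on_data:.3f}")
print(f"agreement on atypical points:   {agree_on_probe:.3f}")
```

Only the atypical points separate the two predictors, which is why samples from the data distribution alone cannot reveal which feature a predictor relies on.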
We formally characterize conditions that give rise to covariate shift in causal testing (Theorem 1). Intuitively, this result states that if the units used for measuring causal influence are sufficiently outside the data distribution, constraining the behavior of a predictor on the data distribution does not constrain the causal influences of the predictor.
In order to address this issue, we introduce an active learning algorithm in Section 4. This algorithm provides an accountability mechanism for data processors to examine important features, and if their influences are suspicious, to collect additional information that constrains the feature influences. This additional information could steer the influences toward more acceptable values (e.g., by reducing the influence of age for a predictor in Figure 1(a)). Alternatively, it could provide additional evidence that the influence values convey appropriate predictive power and that the suspicions are unfounded (e.g., by preserving the influences of a predictor in Figure 1(a)).
The active learning process is assisted by two oracles. The first is a feature selection oracle that examines the causal influences of different features, and chooses the feature for which counterfactual queries should be answered. We envision this oracle to be an auditor who can identify problematic causal influences based on background knowledge of causal factors or ethical norms governing classification. The second is similar to a standard active learning oracle, and labels atypical points to answer counterfactual queries. For example, for a predictor in our running example, the feature selection oracle might notice that age has an unduly high influence, and can instruct the algorithm to focus on instances that vary age while keeping income fixed. While the direct use of age may be obviously problematic, in common applications the system designer may not have a priori knowledge of which attribute uses are problematic. The feature selection oracle may be able to spot suspiciously high or low influences and guide the counterfactual queries that get sent to the labeler to better inform the learning.
We evaluate the counterfactual active learning algorithm for linear, decision tree, and random forest models on a number of datasets, using a synthetic labeler. In particular, we demonstrate that after counterfactual active learning, the trained classifier has similar causal influence measures to the labeler. We also show that the classifier can generalize better to out-of-distribution points. This is an important consequence of having causal behavior similar to the labeler. Finally, we demonstrate that the accuracy on the data distribution does not degrade as a result of this additional training.
Related Work.
Prior work on causal learning learns the structure of causal models [?; ?], or given the structure of models, the functional relationship between variables. In this context, active learning has been used to aid both the discovery of causal structures [?; ?] and their functional relationships [?]. In this work we don’t attempt to learn true causal models. Instead, our work focuses on constraining the causal behavior of learnt models. In doing so, we provide an accountability mechanism for data processors to collect additional data that guides the causal influences of their models to more acceptable values or justifies the causal influences of the learnt model.
Contributions.
In summary, the contributions of this paper are as follows.

A formal articulation of the covariate shift in causal testing problem.

A novel active learning algorithm that addresses the problem.

An empirical evaluation of the algorithm for standard machine learning predictors on a number of real-world datasets.
2 Background
A predictor is a function that maps an input space to a space of predictions. The input space has an associated probability distribution, which gives the frequency of drawing a particular instance.
2.1 Risk Minimization
Given a random input, its label, and a loss function, the risk associated with a predictor is its expected loss over the joint distribution of inputs and labels.
The goal of supervised learning algorithms under a risk minimization paradigm is to minimize this risk. In general, the distributions over inputs and labels are unknown. As a result, learning algorithms instead minimize the empirical risk, i.e., the average loss over a finite sample.
Note that the risk minimization paradigm only constrains the behavior of a predictor on points from the distribution and treats any two predictors that have identical behavior on points from the distribution interchangeably.
For ease of presentation, we focus on binary classification tasks where is binary, and use the loss function .
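For reference, the risk and the empirical risk described above can be written as follows. The symbols $c$ (predictor), $X$ and $Y$ (random input and label), $L$ (loss), $R$ and $\hat{R}$ (risk and empirical risk), and the sample $(x_k, y_k)_{k=1}^{n}$ are notation assumed here for illustration. The 0-1 loss in the last line is one loss that fits the binary setting and is likewise an assumption.

```latex
R(c) = \mathbb{E}_{(X, Y)}\big[ L(c(X), Y) \big],
\qquad
\hat{R}(c) = \frac{1}{n} \sum_{k=1}^{n} L\big(c(x_k), y_k\big),
\qquad
L(y, y') = \mathbb{1}[y \neq y'] .
```

Both quantities depend on $c$ only through its values on points that the data distribution can produce, which is the observation made in the note above.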
2.2 Counterfactual Influence
The influence of a feature for a predictor is measured by comparing the outcomes of the predictor on the data distribution to its outcomes on a counterfactual distribution that changes the value of that feature. We refer to these as the data distribution over features and the counterfactual distribution with respect to the feature, respectively.
A number of influence measures proposed in prior work can be viewed as instances of this general idea. For example, Permutation Importance [?] measures the difference in accuracy between the data distribution and the counterfactual distribution, where the counterfactual distribution is obtained by randomly permuting the values of the feature. In [?], the counterfactual distribution is chosen as the minimal perturbation of the data such that the feature cannot be predicted. In this paper, we use Average Unary QII (auQII), an instance of Quantitative Input Influence [?], as our causal influence measure. The counterfactual distribution for auQII keeps all features except the one under study and samples that feature from its marginal distribution, independently of the rest of the features.
Definition 1.
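Given a model, the Average Unary QII (auQII) of an input is defined as the expected discrepancy between the model's outcome on a point from the data distribution and its outcome on the corresponding point from the counterfactual distribution for that input.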
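As an illustration, the sketch below estimates such an influence by Monte Carlo sampling. It assumes one concrete form of the measure, namely the probability that the prediction changes when feature $i$ is replaced by an independent draw from its marginal, $\Pr[c(X) \neq c(X_{-i}U_i)]$. This form, the function name `au_qii`, and the toy data are assumptions made for the example rather than the paper's exact definition.

```python
import random

def au_qii(predict, data, feature, n_samples=2000, rng=None):
    """Monte Carlo estimate of how often resampling `feature` from its
    marginal changes the prediction (assumed form of auQII)."""
    rng = rng or random.Random(0)
    changed = 0
    for _ in range(n_samples):
        x = rng.choice(data)           # X drawn from the data distribution
        u = rng.choice(data)[feature]  # U_i drawn independently from the marginal of the feature
        x_cf = list(x)
        x_cf[feature] = u              # counterfactual point: all features of x except `feature`
        changed += predict(x) != predict(x_cf)
    return changed / n_samples

# Toy data with two perfectly correlated binary features.
data = [(0, 0), (1, 1)] * 50

def first_feature_only(x):
    """Hypothetical predictor that reads only feature 0."""
    return x[0]

print(au_qii(first_feature_only, data, feature=0))  # close to 0.5
print(au_qii(first_feature_only, data, feature=1))  # exactly 0.0
```

Although the two features are perfectly correlated in the data, the estimate attributes influence only to the feature the predictor actually reads, because resampling breaks the correlation.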
3 Covariate shift in Causal Testing
In this section, we discuss some of the theoretical implications of covariate shift in causal testing. First, we show in Theorem 1 that risk minimization does not constrain influences when the data distribution diverges significantly from the counterfactual distribution. In other words, predictors trained under an empirical risk minimization (ERM) regime are free to choose influential factors. Further, in Theorem 2, we demonstrate that predictors that agree on predictions on both the data distribution and the counterfactual distribution have similar influences. This theorem forms the motivation for our counterfactual active learning algorithm presented in Section 4, which attempts to minimize errors on both the data and the counterfactual distribution by adding points from the counterfactual distribution to the training set.
3.1 Counterfactual divergence
We first define what it means for an influence measure to be unconstrained by its behavior on the data distribution. An influence measure is unconstrained for a predictor and a data distribution if it is possible to find another predictor that has similar predictions on the data distribution but very different influences. More specifically, if the influence is high, then it can be reduced to a lower value, and vice versa.
Definition 2.
An influence measure is said to be unconstrained, for , for a predictor , if there exists predictors such that for , , and and .
The following theorem shows that if there exist regions in the input space with low probability weight in the data distribution and high weight in the counterfactual distribution, i.e. the data distribution and counterfactual distribution diverge significantly, then any model will have unconstrained causal influences. As a result, predictors trained under an ERM regime are free to choose influential causal factors.
Theorem 1.
If there exists a predicate , such that and , then for any , is unconstrained.
Proof.
The proof proceeds via an averaging argument. Let be the set of all functions from to . For consider sampled uniformly from the set of deterministic functions that map values satisfying to and according to some otherwise: . Notice that is therefore uniform in .
As when , any such classifier satisfies . Computing the expected influence over all , we have
Let . Then,  
By an averaging argument, there exists an such that . Similarly, computing the expected influence over all , we have
Let . Then,  
Again, by an averaging argument, there exists an such that ∎
3.2 Relating counterfactual and true accuracies
We now show that if the two models agree on both the true and the counterfactual distributions, then they have similar influences.
Definition 3.
Given a loss function and predictors and , the expected loss of the with respect to , written , is
Theorem 2.
If , and , then
Proof.
by triangle inequality  
by triangle inequality  
∎
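The two applications of the triangle inequality can be sketched as follows under assumed notation. Here $X' = X_{-i}U_i$ denotes the counterfactual point, the influence is taken to be $\iota_c(i) = \Pr[c(X) \neq c(X')]$, and $\epsilon_1$ and $\epsilon_2$ bound the disagreement between predictors $c$ and $c'$ on $X$ and on $X'$ respectively. These symbols, and the resulting bound $\epsilon_1 + \epsilon_2$, are what this sketch yields under its own assumptions and may differ from the constants in the theorem statement.

```latex
\begin{align*}
\mathbb{1}[c(X) \neq c(X')]
  &\le \mathbb{1}[c(X) \neq c'(X)] + \mathbb{1}[c'(X) \neq c(X')] \\
  &\le \mathbb{1}[c(X) \neq c'(X)] + \mathbb{1}[c'(X) \neq c'(X')]
       + \mathbb{1}[c'(X') \neq c(X')] .
\end{align*}
% Taking expectations on both sides:
\begin{align*}
\iota_c(i) &\le \Pr[c(X) \neq c'(X)] + \iota_{c'}(i) + \Pr[c(X') \neq c'(X')]
            \le \epsilon_1 + \iota_{c'}(i) + \epsilon_2 .
\end{align*}
% Exchanging the roles of c and c' gives the reverse inequality, hence
\[
  \big| \iota_c(i) - \iota_{c'}(i) \big| \le \epsilon_1 + \epsilon_2 .
\]
```

The argument uses only that disagreement behaves like a metric, so agreement on both distributions is what ties the influences of the two predictors together.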
4 Counterfactual Active Learning
In this section, we describe an active learning algorithm for training a model that pushes the model towards the desired causal influences. The learning is assisted by two oracles. The first is a feature selection oracle that examines the causal influences of input features, and chooses the feature for which counterfactuals should be labeled. We envision this oracle to be a domain expert that can identify problematic causal influences based on background knowledge of causal factors or ethical norms governing the classification task. The second oracle, similar to a standard active learning oracle, labels counterfactual points with their intended label.
The active learning process (Algorithm 1), on every iteration, computes the influences of the features of a classifier trained on the dataset. The feature selection oracle then picks a feature, and a batch of points is drawn from the counterfactual distribution for that feature, where the batch size is a prespecified parameter. The batch size can also be thought of as a learning rate for the algorithm. The points in the batch are then labeled by the labeling oracle and added to the training set. A new classifier is trained on the augmented dataset, and this process is repeated until the stopping condition is reached. The stopping condition can either be a prespecified number of iterations or a convergence condition that is met when the learnt classifier no longer shows a significant change in influences.
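The loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's Algorithm 1: the learner `train_stump` is a toy stand-in for a real classifier, the influence estimate assumes the disagreement form of auQII, the feature selection oracle simply flags the most influential feature, and all function names and parameter values are hypothetical.

```python
import random

def train_stump(points, labels):
    """Toy learner (an assumption): predict with the single binary feature
    that has the lowest training error; ties go to the last feature."""
    def err(j):
        return sum(x[j] != y for x, y in zip(points, labels))
    best = min(reversed(range(len(points[0]))), key=err)
    return lambda x: x[best]

def resample_feature(points, feature, k, rng):
    """Draw k points whose `feature` is replaced by an independent draw
    from its marginal, i.e. points from the counterfactual distribution."""
    batch = []
    for _ in range(k):
        x = list(rng.choice(points))
        x[feature] = rng.choice(points)[feature]
        batch.append(tuple(x))
    return batch

def influence(predict, points, feature, rng, n=500):
    """Fraction of resampled points on which the prediction changes."""
    changed = 0
    for _ in range(n):
        x = rng.choice(points)
        x_cf = list(x)
        x_cf[feature] = rng.choice(points)[feature]
        changed += predict(x) != predict(tuple(x_cf))
    return changed / n

def counterfactual_active_learning(points, labels, labeler, pick_feature,
                                   batch_size=20, iterations=5, seed=0):
    rng = random.Random(seed)
    base = list(points)  # original sample; defines the marginals used for resampling
    points, labels = list(points), list(labels)
    for _ in range(iterations):  # stopping condition: fixed number of iterations
        model = train_stump(points, labels)
        infl = [influence(model, base, j, rng) for j in range(len(base[0]))]
        feature = pick_feature(infl)                    # feature selection oracle
        batch = resample_feature(base, feature, batch_size, rng)
        points += batch
        labels += [labeler(x) for x in batch]           # labeling oracle
    return train_stump(points, labels)

# Toy setup: two perfectly correlated binary features; the labeler uses feature 0.
data = [(0, 0), (1, 1)] * 50
labeler = lambda x: x[0]
labels = [labeler(x) for x in data]
most_influential = lambda infl: max(range(len(infl)), key=infl.__getitem__)

before = train_stump(data, labels)
after = counterfactual_active_learning(data, labels, labeler, most_influential)
print(before((0, 1)), after((0, 1)))  # the labeler assigns 0 to this atypical point
```

In this toy setup both features fit the original data equally well, so the initial learner is free to rely on the wrong one. Labeled counterfactual points remove that freedom.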
The choice of the feature selection oracle affects the speed of convergence of the algorithm. In our experiments, we consider two feature selection oracles: (i) a baseline oracle that picks features at random for generating counterfactual queries, and (ii) an oracle that picks the feature that has the highest difference in influence from the true influence. In Section 5.2, we demonstrate that an oracle that deterministically picks the feature with the highest difference in influence converges faster than an oracle that picks a feature at random.
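The two oracles can be sketched as small functions over influence vectors. The names `random_oracle` and `greedy_oracle`, their signatures, and the example influence values are hypothetical and chosen for this illustration.

```python
import random

def random_oracle(model_influences, true_influences, rng):
    """Baseline: pick a feature index uniformly at random."""
    return rng.randrange(len(model_influences))

def greedy_oracle(model_influences, true_influences, rng=None):
    """Pick the feature whose influence in the trained model differs
    most from its influence in the labeler's model."""
    gaps = [abs(m - t) for m, t in zip(model_influences, true_influences)]
    return max(range(len(gaps)), key=gaps.__getitem__)

# Made-up influence values for three features.
model_influences = [0.1, 0.6, 0.2]
true_influences = [0.1, 0.1, 0.3]

print(greedy_oracle(model_influences, true_influences))                    # index 1 has the largest gap
print(random_oracle(model_influences, true_influences, random.Random(0)))  # some index in {0, 1, 2}
```

The second oracle needs access to the true influences, which is available in experiments with a synthetic labeler. A human auditor would instead rely on background knowledge to spot suspicious influences.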
The rationale for training the classifier on points from the counterfactual distribution is twofold. First, by adding points from the counterfactual distribution, the algorithm reduces the divergence between the training distribution and the counterfactual distribution, thereby constraining the feature influences of the learnt classifier (Theorem 1). Second, by increasing the accuracy of the classifier with respect to the labeler on the counterfactual distribution, the influences of the trained classifier are pushed closer to the influences of the labeler (Theorem 2).
5 Evaluation
In this section, we evaluate the counterfactual active learning algorithm for linear, decision tree, and random forest models using a synthetic labeler as ground truth. In particular, we demonstrate that after counterfactual active learning, the trained classifier has similar causal influence measures to the labeler. We also show that the classifier can generalize better to out-of-distribution points. This is an important consequence of having causal behavior similar to the labeler. Finally, we demonstrate that the accuracy on the data distribution does not degrade as a result of this additional training.
5.1 Methodology
We evaluate our algorithm by training two predictors. The first predictor provides the ground truth to be used by the oracles. The second predictor is trained on a biased version of the dataset used to train the first predictor. This approach induces a difference in influence in the two predictors. In more detail, the following steps comprise our experimental methodology.

Given: a dataset that is a sample from the original distribution.

Train a ground truth model on this dataset. This model is used by the labeling oracle to respond to counterfactual queries.

Select a random predicate.

Construct a biased dataset by excluding the points from the original dataset that satisfy the predicate.

Train a predictor on a random subset of the biased dataset, leaving the rest for testing and for use by the baseline described below.

Perform counterfactual active learning on this predictor.
The random predicate is chosen by training a short decision tree on the dataset with random labels. The predicate is intended to simulate a biased data collection mechanism in order to induce artificial correlations in the dataset. The induced artificial correlations create a gap between the counterfactual distribution and the data distribution, thus making the feature influences unconstrained.
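The construction of the biased dataset can be sketched as follows. To keep the example dependency-free, a depth-one tree (a stump) fitted to random labels stands in for the short decision tree. The synthetic two-feature data, the even train/held-out split, and the names `random_predicate`, `biased`, `train`, and `held_out` are assumptions made for this illustration.

```python
import random

def random_predicate(points, rng):
    """Fit a depth-one tree to random labels and return it as a predicate.
    A stump stands in for the short decision tree described in the text."""
    labels = [rng.random() < 0.5 for _ in points]
    best = None
    for j in range(len(points[0])):
        for t in sorted({x[j] for x in points}):
            # Number of points for which the rule "x[j] <= t" matches the random label.
            hits = sum((x[j] <= t) == y for x, y in zip(points, labels))
            acc = max(hits, len(points) - hits)  # allow the flipped rule as well
            if best is None or acc > best[0]:
                best = (acc, j, t, hits >= len(points) - hits)
    _, j, t, positive_side = best
    return lambda x: (x[j] <= t) == positive_side

rng = random.Random(0)

# Hypothetical sample from the original distribution (two numeric features).
data = [(rng.uniform(20, 70), rng.uniform(0, 100)) for _ in range(200)]

# Select a random predicate and exclude the points that satisfy it.
phi = random_predicate(data, rng)
biased = [x for x in data if not phi(x)]

# Train on a random subset of the biased data; keep the rest for testing.
rng.shuffle(biased)
split = len(biased) // 2
train, held_out = biased[:split], biased[split:]

print(len(data), len(biased), len(train), len(held_out))
```

Because the labels used to fit the predicate are random, the excluded region is unrelated to the true labels, so any change in influence is induced purely by the biased collection.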
We run the active learning algorithm under the following settings:

At each iteration, the feature selection oracle selects the feature with the highest difference in auQII with respect to the base model.

As a baseline, the feature selection oracle selects a feature at random.

As another baseline, the labeling oracle labels fresh points from the biased data distribution as opposed to from the counterfactual distribution.
All experiments presented here are run on the benchmark adult dataset [?] with a batch size of for epochs, and averaged over runs of the algorithm.
5.2 Results
Figure 2 shows the evolution of the active learning algorithm on the adult dataset with a logistic regression model. In particular, Figure 2(a) shows the change in the mean squared error of auQII between the trained predictor and the ground truth model. This figure shows that, with the oracle that selects the feature with the highest difference in influence, the feature influences converge to values close to those of the ground truth model. The random oracle also converges, but at a slower rate. This result is useful as it indicates that the process does not require the feature selection oracle to pick optimally. The baseline that labels fresh points from the data distribution does not affect the influence measures significantly. This is to be expected since it retrains using labeled points from within the biased distribution.
In Figures 2(b) and 2(c), we show the error of the classifier on holdout sets from the biased dataset and the original, unbiased dataset, respectively. Figure 2(b) shows that the error on the data distribution does not increase due to this additional training. Further, Figure 2(c) shows that the error on the unbiased dataset also decreases, even though parts of the unbiased dataset are not in the training set. This is a side effect of the model becoming causally closer to the ground truth model.
6 Conclusion and Future Work
We articulate the problem of covariate shift in causal testing and formally characterize conditions under which it arises. We present an algorithm for counterfactual active learning that addresses this problem. We empirically demonstrate with synthetic labelers that our algorithm trains models that (i) have causal influences similar to those of the labeler’s model, (ii) generalize better to out-of-distribution points, and (iii) retain their accuracy on in-distribution points.
In this paper, we assume that the labeling oracle can label all points with equal certainty and cost. However, for points further from the distribution, the labeler might need to perform real experiments in order to label the points. This suggests two interesting directions for future work. The first studies mechanisms for answering counterfactual queries for points far away from the distribution. The second involves designing an algorithm that takes into account the cost of a labeler in the learning process.
Acknowledgments
This work was developed with the support of NSF grant CNS-1704845 as well as by the Air Force Research Laboratory under agreement number FA9550-17-1-0600. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views, opinions, and/or findings expressed are those of the author(s) and should not be interpreted as representing the official views or policies of the Air Force Research Laboratory, the National Science Foundation, or the U.S. Government.
References
 [Adler et al., 2018] Philip Adler, Casey Falk, Sorelle A. Friedler, Tionney Nix, Gabriel Rybeck, Carlos Scheidegger, Brandon Smith, and Suresh Venkatasubramanian. Auditing black-box models for indirect influence. Knowledge and Information Systems, 54(1):95–122, Jan 2018.
 [Angwin et al., 2016] Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. Machine bias: There’s software used across the country to predict future criminals. And it’s biased against blacks. ProPublica, May 2016.
 [Breiman, 2001] Leo Breiman. Random forests. Mach. Learn., 45(1):5–32, October 2001.
 [Burger, 1971] Warren Burger. Griggs v. Duke Power Company. Opinion of the United States Supreme Court, March 1971.
 [Byrnes, 2017] Nanette Byrnes. An AI-fueled credit formula might help you get a loan. ProPublica, 2017.
 [Datta et al., 2016] Anupam Datta, Shayak Sen, and Yair Zick. Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems. In Proceedings of IEEE Symposium on Security & Privacy 2016, 2016.
 [Datta et al., 2017] Anupam Datta, Matthew Fredrikson, Gihyuk Ko, Piotr Mardziel, and Shayak Sen. Proxy non-discrimination in data-driven systems. Technical report, arXiv, July 2017.
 [European Commission, 2016] European Commission. General data protection regulation (GDPR). Regulation (EU) 2016/679, L119, May 2016.
 [He and Geng, 2008] Yangbo He and Zhi Geng. Active learning of causal networks with intervention experiments and optimal designs. 9:2523–2547, 11 2008.
 [Hyttinen et al., 2013] Antti Hyttinen, Frederick Eberhardt, and Patrik O. Hoyer. Experiment selection for causal discovery. Journal of Machine Learning Research, 14:3041–3071, 2013.
 [Ideal Inc., 2017] Ideal Inc. How AI can stop unconscious bias in recruiting. https://ideal.com/unconsciousbias/, 2017. Accessed Nov. 22, 2017.
 [Kilbertus et al., 2017] N. Kilbertus, M. Rojas-Carulla, G. Parascandolo, M. Hardt, D. Janzing, and B. Schölkopf. Avoiding discrimination through causal reasoning. In Advances in Neural Information Processing Systems 30, pages 656–666. Curran Associates, Inc., 2017.
 [Lichman, 2013] M. Lichman. UCI machine learning repository, 2013.
 [Ribeiro et al., 2016] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. “Why should I trust you?”: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, pages 1135–1144, New York, NY, USA, 2016. ACM.
 [Rubenstein et al., 2017] P. K. Rubenstein, I. Tolstikhin, P. Hennig, and B. Schölkopf. Probabilistic active learning of functions in structural causal models. 2017.
 [Spirtes et al., 2000] P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search. MIT press, 2nd edition, 2000.
 [Tong and Koller, 2001] Simon Tong and Daphne Koller. Active learning for structure in Bayesian networks. In Proceedings of the 17th International Joint Conference on Artificial Intelligence - Volume 2, IJCAI’01, pages 863–869, San Francisco, CA, USA, 2001. Morgan Kaufmann Publishers Inc.