Abstract
Predictive models are increasingly used to support consequential decision making at the individual level in contexts such as pretrial bail and loan approval. As a result, there is increasing social and legal pressure to provide explanations that help the affected individuals not only to understand why a prediction was output, but also how to act to obtain a desired outcome. To this end, several works have proposed optimization-based methods to generate nearest counterfactual explanations. However, these methods are often restricted to a particular subset of models (e.g., decision trees or linear models) and differentiable distance functions. In contrast, we build on standard theory and tools from formal verification and propose a novel algorithm that solves a sequence of satisfiability problems, where both the distance function (objective) and predictive model (constraints) are represented as logic formulae. As shown by our experiments on real-world data, our algorithm is: i) model-agnostic ((non-)linear, (non-)differentiable, (non-)convex); ii) data-type-agnostic (heterogeneous features); iii) distance-agnostic ($\ell_0$, $\ell_1$, $\ell_\infty$, and combinations thereof); iv) able to generate plausible and diverse counterfactuals for any sample (i.e., 100% coverage); and v) able to do so at provably optimal distances.
Model-Agnostic Counterfactual Explanations for Consequential Decisions
Amir-Hossein Karimi (MPI-IS), Gilles Barthe (MPI-SP / IMDEA Software Institute), Borja Balle, Isabel Valera (MPI-IS)
1 Introduction
Data-driven predictive models are ubiquitously used to support or even substitute humans in decision making in a wide variety of real-world contexts including, e.g., selection processes for hiring, loan approval, or pretrial bail. However, as algorithmic methods are increasingly used to make consequential decisions at the individual level – i.e., decisions that may have significant consequences for the individuals they decide about – the debate about their lack of transparency and explainability becomes more heated. To make things worse, while the verdict is still out as to what constitutes a good explanation [doshi2017towards, freitas2014comprehensible, kodratoff1994comprehensibility, lipton2018mythos, rudin2018please, ruping2006learning], there already exist clearly defined legal requirements for explanations in the context of consequential decision making. For example, the EU General Data Protection Regulation (“GDPR”) grants individuals the right-to-explanation [voigt2017eu, wachter2017right], by requiring institutions to provide explanations to individuals who are subject to their (semi-)automated decision making systems.
A growing number of works on interpretable machine learning have recently focused on the definitions of, and mechanisms for providing, good explanations for predictor-based decision making systems. In the context of consequential decision making, it is widely agreed that a good explanation should provide answers to the following two questions [doshi2017towards, gunning2019darpa, wachter2017counterfactual]: (i) “why does the model output a certain prediction for a given individual?”; and (ii) “what features describing the individual would need to change to achieve the desired output?”
Here, we focus on answering the second question, or equivalently, on generating counterfactual explanations. Of specific importance is the problem of finding the nearest counterfactual explanation – i.e., identifying the set of features resulting in the desired prediction while remaining at minimum distance from the original set of features describing the individual. Existing approaches tackling this problem suffer from various limitations: they either propose solutions that are tailored to particular models, e.g., decision trees [tolomei2017interpretable]; rely on classical optimization tools, thus being restricted to convex predictive models and distances [russell2019efficient, ustun2019actionable]; or solve a relaxed version of the original optimization problem using gradient-based approaches, thus being restricted to differentiable models and distance functions [wachter2017counterfactual] and lacking optimality guarantees. Additionally, it is important to consider that in the context of consequential decision making, the features describing individuals are semantically meaningful and heterogeneous (i.e., mixed continuous & discrete); and can either be acted upon (e.g., bank account balance), or be immutable and safeguarded from change (e.g., sex, race). A good explanation should account for these semantics (i.e., be plausible^{1}) to be useful for the individual, a requirement that most existing approaches fail to address.
^{1} We emphasize that while our formulation for generating counterfactuals seems similar to that of adversarial perturbations (image domain), the goals are different: while our goal is to provide actionable and plausible counterfactuals, the goal of adversarial examples is to be imperceptible to humans and hence plausible in the human-perception space, but not in the data space.
Our contributions. In this paper, we propose a model-agnostic approach to generate nearest counterfactual explanations, namely MACE, under any given distance function (or convex combinations thereof), while at the same time easily supporting additional plausibility constraints. Moreover, our approach readily encodes natural notions of distance for heterogeneous feature spaces, which are common in consequential decision making systems (e.g., loan approval) and consist of mixed numerical (e.g., age and income) and nominal features (e.g., gender and education level). To this end, in MACE we map the nearest counterfactual problem into a sequence of satisfiability (SAT) problems, by expressing both the predictive model and the distance function (as well as the plausibility and diversity constraints) as logic formulae. Each of these satisfiability problems aims to verify whether there exists a counterfactual explanation at a distance smaller than a given threshold, and can be solved using standard SMT (satisfiability modulo theories) solvers. Moreover, we rely on a binary search strategy on the distance threshold to find an approximation to the nearest (plausible) counterfactual with an arbitrary degree of accuracy, and a lower bound on distance such that no counterfactual provably exists at a smaller distance. Finally, once nearest counterfactuals are found, diversity constraints may be added to the satisfiability problems to find alternative counterfactuals. The overall architecture of MACE is illustrated in Figure 1.
Our experimental validation on real-world datasets shows that MACE not only achieves 100% coverage by design, but also generates explanations that are significantly closer than those of previous approaches [tolomei2017interpretable, ustun2019actionable]. We also provide qualitative examples showcasing the flexibility of our approach to generate actionable counterfactuals by extending our plausibility constraints to restrict changes to a subset of (non-immutable) features. The Python implementation of our algorithms and the datasets used in our experiments are available at https://github.com/???.
2 First-order predicate logic
In this section, we briefly recall basic concepts of first-order predicate logic, which MACE builds upon. We distinguish between function symbols (for instance, addition $+$ and multiplication $\times$) and predicate symbols (for instance, equality $=$ or less than $<$). Function symbols are used to build expressions, and predicate symbols are used to build atomic formulae. Examples of valid expressions are $x$, $x + y$, and $x \times y + z$. Examples of valid atomic formulae are $x < y$ or $x + y = z$. A (quantifier-free) formula is a Boolean combination of atomic formulae. That is, a formula is built from atomic formulae using conjunction $\wedge$, disjunction $\vee$, and negation $\neg$. Formulae have an interpretation over their intended domain. For instance, a formula about real-valued expressions has a natural interpretation as a subset of $\mathbb{R}^n$, where $n$ denotes the number of variables that appear in the formula. The interpretation is obtained by mapping every variable into a value, e.g., a real number. For example, $(1, 2)$ belongs in the interpretation of $x < y$ since the mapping $x \mapsto 1, y \mapsto 2$ assigns true because $1 < 2$. We say that a formula is satisfiable if its interpretation as a subset of $\mathbb{R}^n$ is nonempty.
The satisfiability problem consists in checking whether or not a formula is satisfiable. Satisfiability problems can be checked automatically using satisfiability modulo theories (SMT) solvers like Z3 [DBLP:conf/tacas/MouraB08] or CVC4 [BCD+11]. We refer to [Kroening:2008] for an exposition of the basic algorithms used by SMT solvers. For the purpose of the next sections, it suffices to assume a given satisfiability oracle $\mathrm{SAT}(\cdot)$. For our experiments, we use off-the-shelf SMT solvers to realize the oracle. We use SMT solvers as black boxes, but it is interesting to note that our formulae fall in the linear fragment of the theory of reals (i.e., all formulae that only contain expressions of degree 1 when viewed as multivariate polynomials over variables), which can be decided efficiently using the Fourier-Motzkin algorithm.
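To make the notions of interpretation and satisfiability concrete, the following minimal sketch represents quantifier-free formulae as Python predicates and realizes a toy oracle by exhaustive search over a small finite grid. The helper names (`conj`, `neg`, `sat_oracle`) and the grid are assumptions made for illustration only; an actual SMT solver reasons symbolically over the reals instead of enumerating candidates.

```python
from itertools import product

def conj(*formulae):
    """Conjunction of formulae, each a predicate over an assignment dict."""
    return lambda a: all(f(a) for f in formulae)

def neg(formula):
    """Negation of a formula."""
    return lambda a: not formula(a)

def sat_oracle(formula, variables, domain):
    """Toy oracle: return a satisfying assignment drawn from a finite
    domain, or None when no assignment in the domain satisfies the formula."""
    for values in product(domain, repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if formula(assignment):
            return assignment
    return None

# Two atomic formulae over the variables x and y.
x_lt_y = lambda a: a["x"] < a["y"]
sum_eq_3 = lambda a: a["x"] + a["y"] == 3

grid = range(0, 4)
model = sat_oracle(conj(x_lt_y, sum_eq_3), ["x", "y"], grid)
unsat = sat_oracle(conj(x_lt_y, neg(x_lt_y)), ["x", "y"], grid)
```

Here `model` is an element of the interpretation of the conjunction, while `unsat` is `None` because a formula conjoined with its own negation has an empty interpretation.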
3 Counterfactual spaces for predictive models
This section defines a logical representation of counterfactual explanations for predictive models, which are functions $f: \mathcal{X} \to \mathcal{Y}$ mapping input feature vectors $x \in \mathcal{X}$ into decisions $y \in \mathcal{Y} = \{0, 1\}$.^{2} Given a predictive model $f$, we can define the set of counterfactual explanations for a (factual) input $\hat{x} \in \mathcal{X}$ as $CF_f(\hat{x}) = \{x \in \mathcal{X} \mid f(x) \neq f(\hat{x})\}$. In words, $CF_f(\hat{x})$ contains all the inputs $x$ for which the model $f$ returns a prediction different from $f(\hat{x})$. We also remark that $CF_f(\hat{x})$ is the set of preimages of $1 - f(\hat{x})$ under $f$.
^{2} While here we assume binary predictor models, i.e., classifiers, our approach generalizes to regression problems where $\mathcal{Y} = \mathbb{R}$ and more generally any other output domain.
For a broad class of predictive models, it is possible to construct counterfactual formulae capturing membership in $CF_f(\hat{x})$. We do so by computing the characteristic formula of the model. For a predictive model $f$, and pair of input and output values $x$ and $y$, the characteristic formula $\phi_f(x, y)$ is valid if and only if $f(x) = y$. Thus, given a factual input $\hat{x}$ with prediction $\hat{y} = f(\hat{x}) \in \{0, 1\}$, we define the counterfactual formula as
$$\phi_{CF_f(\hat{x})}(x) = \phi_f(x, 1 - \hat{y}) \qquad (1)$$
Intuitively, the formula on the right hand side of (1) says that “$x$ is a counterfactual for $\hat{x}$ if either $f(\hat{x}) = 0$ and $f(x) = 1$, or $f(\hat{x}) = 1$ and $f(x) = 0$”. It is thus clear from the definition that an input $x$ satisfies $\phi_{CF_f(\hat{x})}$ if and only if $x \in CF_f(\hat{x})$. Moreover, (1) shows that, to construct counterfactual formulae $\phi_{CF_f(\hat{x})}$, we only require the characteristic formulae of the corresponding predictive models, $\phi_f$, and the value of $\hat{y}$. To obtain such characteristic formulae we assume that predictive models are represented by programs in a core programming language with assignments, conditionals, sequential composition, syntactically bounded loops and return statements. This allows us to use techniques from the program verification literature. Specifically, we use the so-called predicate transformers [Dijkstra1968, hoare1969, Floyd1993, flanagan2001avoiding]. The description of the general procedure is provided in Appendix A. For ease of exposition, we illustrate the construction of characteristic formulae through two examples, a decision tree and a multilayer perceptron.
As a first example, consider the decision tree from Figure 1(a), which takes as input a feature vector $x$ and returns a binary output $y \in \{0, 1\}$. Figure 1(b) provides the programming language description of this decision tree. To construct a formula representing the function computed by this tree we first build a clause for each leaf in the tree by taking the conjunction of all the conditions encountered in the path from the root to the leaf, together with the value returned at that leaf. For example, the clause corresponding to the leftmost leaf of the tree in Figure 1(a) conjoins the conditions on the leftmost root-to-leaf path. Since exactly one root-to-leaf path is followed for any given input, the characteristic formula corresponding to the full tree is obtained by taking the disjunction of all said clauses, as shown in Figure 1(c).
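The construction can be sketched in a few lines of Python. The tree below is hypothetical (it is not the tree of Figure 1), as are the tuple encoding and the helper names; the sketch evaluates the characteristic formula as a disjunction over leaves of path conditions conjoined with the leaf label, and derives the counterfactual formula by asking for the opposite label.

```python
# Hypothetical tree: an internal node is (feature_index, threshold, left, right)
# and a leaf is an integer label. The left branch is taken when
# x[feature_index] <= threshold.
TREE = (0, 0.5, (1, 2.0, 0, 1), 1)

def predict(tree, x):
    """Evaluate the tree as a program: follow one root-to-leaf path."""
    while not isinstance(tree, int):
        feature, threshold, left, right = tree
        tree = left if x[feature] <= threshold else right
    return tree

def leaf_clauses(tree, path=()):
    """Yield (conditions, label) for every leaf, where conditions lists the
    (feature, operator, threshold) tests met on the path from the root."""
    if isinstance(tree, int):
        yield path, tree
        return
    feature, threshold, left, right = tree
    yield from leaf_clauses(left, path + ((feature, "<=", threshold),))
    yield from leaf_clauses(right, path + ((feature, ">", threshold),))

def holds(condition, x):
    feature, operator, threshold = condition
    return x[feature] <= threshold if operator == "<=" else x[feature] > threshold

def characteristic(tree, x, y):
    """phi_f(x, y): disjunction over leaves of (path conditions AND y == label)."""
    return any(all(holds(c, x) for c in conditions) and y == label
               for conditions, label in leaf_clauses(tree))

def counterfactual_formula(tree, x, x_hat):
    """phi_CF(x): x must receive the label opposite to that of the factual x_hat."""
    return characteristic(tree, x, 1 - predict(tree, x_hat))
```

By construction, `characteristic(TREE, x, y)` holds exactly when `predict(TREE, x) == y`, which is the defining property of a characteristic formula.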

As a second example we consider a feed-forward neural network with one hidden layer followed by a ReLU activation function, as depicted in Figure 2(a). This model implements a function $f$ from real-valued inputs to $\{0, 1\}$, where the binary decision is taken by thresholding the value of the last hidden node. The programming language representation of this model is given in Figure 2(b). In this case, the characteristic formula predicates over inputs $x$, output $y$ and program variables $h_j$ and $h'_j$ for each hidden node $j$, representing the values on that node before and after the non-linear ReLU transformation, respectively. The characteristic formula is a conjunction, and each conjunct corresponds to one instruction of the program. For example, for the leftmost hidden node in the first layer of the network in Figure 2(a), the variable $h_1$ is associated with the clause $h_1 = w_1^\top x + b_1$, with $w_1$ and $b_1$ the weights and bias of that node; and the variable $h'_1$ corresponds to the value of $h_1$ after the ReLU, which can be written as the disjunction $(h_1 \leq 0 \wedge h'_1 = 0) \vee (h_1 > 0 \wedge h'_1 = h_1)$. For the output node we introduce a pair of clauses representing the thresholding operation, i.e., one clause setting $y = 1$ when the value of the node exceeds the threshold and one setting $y = 0$ otherwise. Taking the conjunction of the formulas for each node we obtain the characteristic formula in Figure 2(c).
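As a rough illustration of this encoding, the sketch below checks the conjunction of per-instruction clauses for a tiny network with two hidden ReLU nodes. The weights, the threshold of 0 on the output node, and all function names are assumptions chosen for the example and are not taken from Figure 2.

```python
# Hypothetical parameters: two inputs, two hidden ReLU nodes, one output node.
W1 = [[1.0, -1.0], [0.5, 0.5]]
B1 = [0.0, -1.0]
W2 = [1.0, -2.0]
B2 = 0.25

def affine(weights, bias, inputs):
    return sum(w * v for w, v in zip(weights, inputs)) + bias

def relu_clause(pre, post):
    """(pre <= 0 AND post == 0) OR (pre > 0 AND post == pre)."""
    return (pre <= 0 and post == 0) or (pre > 0 and post == pre)

def threshold_clause(value, y, tau=0.0):
    """(value <= tau AND y == 0) OR (value > tau AND y == 1); tau is assumed."""
    return (value <= tau and y == 0) or (value > tau and y == 1)

def forward(x):
    """Run the program and record every intermediate program variable."""
    pre = [affine(w, b, x) for w, b in zip(W1, B1)]
    post = [max(0.0, p) for p in pre]
    out = affine(W2, B2, post)
    return pre, post, out, int(out > 0.0)

def characteristic(x, y, pre, post, out):
    """Conjunction with one conjunct per instruction of the program."""
    clauses = [pre[j] == affine(W1[j], B1[j], x) for j in range(len(W1))]
    clauses += [relu_clause(pre[j], post[j]) for j in range(len(W1))]
    clauses.append(out == affine(W2, B2, post))
    clauses.append(threshold_clause(out, y))
    return all(clauses)

pre, post, out, y = forward([2.0, 1.0])
```

The values recorded by `forward` satisfy the formula for the predicted label and violate it for the other label, or whenever an intermediate variable is tampered with.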

4 Finding the nearest counterfactual
Based on the counterfactual space defined in the previous section, we would like to produce counterfactual explanations for the output of a model $f$ on a given input $\hat{x}$ by trying to find a nearest counterfactual, which is defined as:
$$x^* \in \arg\min_{x \in CF_f(\hat{x})} d(x, \hat{x}) \qquad (2)$$
For the time being, we assume that a notion of distance between instances, $d$, is given. For convenience, and without loss of generality, we also assume that $d$ takes values in the interval $[0, 1]$.
4.1 Main algorithm
Our goal now is to leverage the representation of $CF_f(\hat{x})$ in terms of a logic formula to solve (2). To this end, we map the optimization problem in (2) into a sequence of satisfiability problems, which can be verified or refuted by standard SMT solvers. We do so by first converting the expression $d(x, \hat{x}) \leq \delta$, where $\delta \in [0, 1]$, into a logic formula $\phi_{d,\hat{x}}(x, \delta)$, which is valid if and only if $d(x, \hat{x}) \leq \delta$. We assume here that the distance function is expressed by a program in the same language that we used to represent the models in Section 3. In particular, we can leverage the procedure detailed in Appendix A to automatically construct $\phi_{d,\hat{x}}$. Then, both the counterfactual formula and the distance formula are combined into the logic formula:
$$\phi_{\hat{x},\delta}(x) = \phi_{CF_f(\hat{x})}(x) \wedge \phi_{d,\hat{x}}(x, \delta)$$
which is satisfiable if and only if there exists a counterfactual $x \in CF_f(\hat{x})$ such that $d(x, \hat{x}) \leq \delta$. To check whether the above formula is satisfiable we use the satisfiability oracle $\mathrm{SAT}(\phi_{\hat{x},\delta})$, which returns either an instance $x$ such that $\phi_{\hat{x},\delta}(x)$ is valid, or “unsatisfiable” if no such $x$ exists.
Note that, while the oracle allows us to verify whether there exist counterfactual explanations at distance smaller than or equal to a given threshold $\delta$, solving optimization (2) requires finding a nearest counterfactual. To do so, we apply a binary search strategy on the distance threshold $\delta$ that allows us to find approximately nearest counterfactuals with a pre-specified degree of accuracy. This is implemented in Algorithm 1, which for an accuracy parameter $\epsilon$ makes at most $O(\log(1/\epsilon))$ calls to $\mathrm{SAT}$ and returns a counterfactual $x_\epsilon$ such that $d(x_\epsilon, \hat{x}) \leq d(x^*, \hat{x}) + \epsilon$, where $x^*$ is some solution of the optimization problem in (2). This mild dependence on the accuracy allows Algorithm 1 to trade off finding arbitrarily accurate solutions of (2) with the number of calls made to the satisfiability oracle. Note that Algorithm 1 may also account for potential plausibility or diversity constraints (refer to next section for further details).
We remark here that our approach to finding nearest counterfactuals is agnostic to the details of the model and distance being used; the only requirement is that they must be expressible in a fairly general programming language. As a consequence, we can handle a wide variety of predictive models, including both differentiable models (such as logistic regression and multilayer perceptron) and non-differentiable models (e.g., decision trees and random forest), as well as a wide variety of distance functions (refer to next section for further details). Moreover, the lower bound $\delta_{\mathrm{lb}}$ returned by Algorithm 1 provides a certificate that any solution $x^*$ to (2) must satisfy $d(x^*, \hat{x}) \geq \delta_{\mathrm{lb}}$. This is because whenever $\mathrm{SAT}$ returns “unsatisfiable” it does so by internally constructing a proof that the formula is not satisfiable.
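The binary search can be sketched as follows. This is a minimal illustration rather than a transcription of Algorithm 1: the oracle interface (`sat_oracle(delta)` returning a counterfactual within distance `delta`, or `None`), the one-dimensional toy model and the grid-based oracle are all assumptions made so that the example runs without an SMT solver.

```python
def nearest_counterfactual(sat_oracle, epsilon):
    """Binary search on the distance threshold in [0, 1].

    sat_oracle(delta) must return some counterfactual at distance <= delta,
    or None when no such counterfactual exists. Returns the best
    counterfactual found and a lower bound on the optimal distance.
    """
    lower, upper = 0.0, 1.0
    best = sat_oracle(upper)
    if best is None:               # no counterfactual within the whole range
        return None, upper
    while upper - lower > epsilon:
        threshold = (lower + upper) / 2
        candidate = sat_oracle(threshold)
        if candidate is None:      # refuted: nothing at distance <= threshold
            lower = threshold
        else:                      # witness found: tighten the upper bound
            best, upper = candidate, threshold
    return best, lower

# Toy setting (hypothetical): inputs are grid points i/100, the model
# predicts 1 when i >= 62, and the factual input is 0.30.
FACTUAL = 30

def model(i):
    return int(i >= 62)

def toy_oracle(delta):
    hits = [i for i in range(101)
            if model(i) != model(FACTUAL) and abs(i - FACTUAL) / 100 <= delta]
    # Deliberately return the farthest witness: the search only relies on
    # the oracle returning some witness, not the nearest one.
    return max(hits) / 100 if hits else None

counterfactual, lower_bound = nearest_counterfactual(toy_oracle, epsilon=0.001)
```

In this toy setting the nearest counterfactual lies at distance 0.32, and the returned lower bound is within `epsilon` of that value after ten halvings of the interval.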
4.2 Distance, Plausibility, and Diversity
Next we discuss additional criteria in the form of logic clauses that guide the satisfiability problem towards generating a counterfactual explanation with desired properties.
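Distance. We first discuss several forms for the distance function that can be used to define the notion of nearest counterfactual. To this end, we first remark that in consequential decision making the input feature space is often heterogeneous – for example, gender is categorical, education level is ordinal, and income is a numerical variable. We define an appropriate distance metric for every kind of variable in the input feature space of the model as:
$$\delta_k(x_k, \hat{x}_k) = \begin{cases} \mathbb{I}[x_k \neq \hat{x}_k] & \text{if } x_k \text{ is categorical} \\ \frac{1}{R_k} |x_k - \hat{x}_k| & \text{if } x_k \text{ is ordinal or numerical} \end{cases}$$
where $R_k$ corresponds to the range of the feature $x_k$ and is used to normalize the distances for all input features, such that $\delta_k \in [0, 1]$ for all $k$, independently of the feature type. By defining the distance vector $\vec{\delta} = [\delta_1, \ldots, \delta_J]^\top$ ($J$ being the total number of input features), one can now write the distance between instances as:
$$d(x, \hat{x}) = \alpha \|\vec{\delta}\|_0 + \beta \|\vec{\delta}\|_1 + \gamma \|\vec{\delta}\|_\infty \qquad (3)$$
where $\|\cdot\|_p$ is the $p$-norm of a vector, and $\alpha, \beta, \gamma \geq 0$ are hyperparameters constrained so that $d(x, \hat{x}) \in [0, 1]$.^{3} Intuitively, the $\ell_0$ norm is used to restrict the number of features that change between the initial instance $\hat{x}$ and the generated counterfactual $x$; the $\ell_1$ norm is used to restrict the average change distance between $\hat{x}$ and $x$; and the $\ell_\infty$ norm is used to restrict the maximum change across features. Any distance of this type can easily be expressed as a program.
^{3} Constraints on the distance hyperparameters ensure that the overall distance $d(x, \hat{x}) \in [0, 1]$. To this end, since $\|\vec{\delta}\|_0, \|\vec{\delta}\|_1 \in [0, J]$ and $\|\vec{\delta}\|_\infty \in [0, 1]$, the hyperparameters must satisfy $\alpha J + \beta J + \gamma \leq 1$.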
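As an illustration that such a distance is easily written as a program, the sketch below computes normalized per-feature distances and a weighted combination of the three norms. The feature names, ranges and weights are hypothetical, and the check on the weights encodes the assumption that the combined distance must stay within [0, 1].

```python
def feature_distance(kind, value, factual, value_range=1.0):
    """Normalized per-feature distance in [0, 1]."""
    if kind == "categorical":
        return 0.0 if value == factual else 1.0
    # Ordinal and numerical features: absolute change divided by the range.
    return abs(value - factual) / value_range

def distance(deltas, alpha, beta, gamma):
    """Weighted combination of the l0, l1 and l-infinity norms of deltas."""
    n_features = len(deltas)
    if min(alpha, beta, gamma) < 0:
        raise ValueError("weights must be non-negative")
    # Assumed constraint keeping the combined distance within [0, 1].
    if alpha * n_features + beta * n_features + gamma > 1 + 1e-12:
        raise ValueError("weights would allow a distance above 1")
    l0 = sum(1 for d in deltas if d != 0)
    l1 = sum(deltas)
    linf = max(deltas)
    return alpha * l0 + beta * l1 + gamma * linf

# Hypothetical individual: (gender, education level, income).
factual = ("male", 2, 30000)
counterfactual = ("male", 3, 55000)
deltas = [
    feature_distance("categorical", counterfactual[0], factual[0]),
    feature_distance("ordinal", counterfactual[1], factual[1], value_range=4),
    feature_distance("numerical", counterfactual[2], factual[2], value_range=100000),
]
combined = distance(deltas, alpha=0.125, beta=0.125, gamma=0.25)
```

Setting two of the three weights to zero recovers a single norm, which is how one would compare counterfactuals under one notion of distance at a time.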
Table 1: Main properties of the considered approaches to generate counterfactuals.
Approach | Models | Data types | Distances | Plausibility | Optimal Distance
Proposed (MACE) | tree, forest, lr, mlp | heterogeneous | | ✓ | ✓
Minimum Observable (MO) | | heterogeneous | | ✓ | x
Feature Tweaking (FT) [tolomei2017interpretable] | tree, forest | heterogeneous | | x | x
Actionable Recourse (AR) [ustun2019actionable] | lr | numeric, binary | | x | x
Plausibility. Up to this point, we have considered minimum distance as the only requirement for generating a counterfactual. However, this might result in unrealistic counterfactuals, such as decreasing the age or changing the gender of a loan applicant. To avoid unrealistic counterfactuals, one may introduce additional plausibility constraints in the optimization problem in Eq. (2). This is equivalent to conjoining the constraint formula in Algorithm 1 with additional plausibility formulae $\phi_{\mathrm{pl}}$, which ensure that: i) each feature in the counterfactual is data-type and data-range consistent with the training data; and ii) only actionable features [ustun2019actionable] are changed in the resulting counterfactual.
First, since here we are working with heterogeneous feature spaces, we require all the features in the counterfactual to be consistent in both the data-types (categorical, ordinal, etc.) and the data-ranges with the training data. In particular, if a categorical (ordinal) feature is one-hot (thermometer) encoded to be used as input to the predictive model, e.g., a logistic regression classifier, we make sure that the generated counterfactual provides a valid one-hot (thermometer) vector for such a feature. Likewise, for any numerical feature we ensure that its value in the counterfactual falls within the range observed in the original data used to train the predictive model.
Moreover, to account for a non-actionable/immutable feature $x_k$, i.e., a feature whose value in the counterfactual explanation should match its initial value, we set $\phi_{\mathrm{pl}}$ to be $(x_k = \hat{x}_k)$. Similarly, we account for variables that only allow for increasing values by setting $\phi_{\mathrm{pl}} = (x_k \geq \hat{x}_k)$.
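The plausibility clauses described above can be sketched as simple predicates that are conjoined before being handed to the oracle. All feature names, ranges and helper names below are hypothetical, and the thermometer check assumes an encoding in which set bits precede unset bits.

```python
def immutable(key):
    """The counterfactual must keep the factual value of this feature."""
    return lambda x, x_hat: x[key] == x_hat[key]

def non_decreasing(key):
    """The feature may only stay the same or increase."""
    return lambda x, x_hat: x[key] >= x_hat[key]

def in_range(key, low, high):
    """The value must fall within the range observed in the training data."""
    return lambda x, x_hat: low <= x[key] <= high

def one_hot(keys):
    """Exactly one of the binary indicator features is set."""
    return lambda x, x_hat: (all(x[k] in (0, 1) for k in keys)
                             and sum(x[k] for k in keys) == 1)

def thermometer(keys):
    """Binary indicators form a run of ones followed by zeros (assumed encoding)."""
    def clause(x, x_hat):
        bits = [x[k] for k in keys]
        return (all(b in (0, 1) for b in bits)
                and all(a >= b for a, b in zip(bits, bits[1:])))
    return clause

def plausible(x, x_hat, clauses):
    """Conjunction of all plausibility clauses."""
    return all(clause(x, x_hat) for clause in clauses)

x_hat = {"age": 40, "sex": "m", "balance": 100.0,
         "edu_hs": 1, "edu_ba": 1, "edu_ms": 0, "job_a": 1, "job_b": 0}
clauses = [
    non_decreasing("age"),
    immutable("sex"),
    in_range("balance", 0.0, 5000.0),
    one_hot(["job_a", "job_b"]),
    thermometer(["edu_hs", "edu_ba", "edu_ms"]),
]
ok = plausible(dict(x_hat, balance=900.0, edu_ms=1), x_hat, clauses)
younger = plausible(dict(x_hat, age=35), x_hat, clauses)
```

In an SMT encoding each of these predicates would become a clause over the symbolic counterfactual variables rather than a check on a concrete candidate.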
Diversity. Finally, one might be interested in generating a (small) set of diverse counterfactual explanations for the same instance $\hat{x}$. To this end, we iteratively call Algorithm 1 with a constraints formula that includes diversity clauses to ensure that the newly generated explanation is substantially different from all the previous ones. We can encode diversity by forcing the distance between every pair of counterfactual explanations to be greater than a given value. For example, we can take $\phi_{\mathrm{div}}(x) = \bigwedge_{i} \bigvee_{k} (x_k \neq x^{(i)}_k)$^{4} to restrict repetitive counterfactuals by enforcing subsequent counterfactuals to have $\ell_0$-norm distance at least 1 from all previous counterfactuals.
^{4} Here $x^{(i)}_k$ is the $k$-th dimension of the $i$-th counterfactual already generated.
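The iterative use of diversity clauses can be sketched as below. Selecting the nearest feasible candidate from a fixed pool stands in for a full run of Algorithm 1, which is a simplification; the candidate pool, the distance and the function names are hypothetical.

```python
def l0_distance(x, other):
    """Number of features in which two instances differ."""
    return sum(1 for a, b in zip(x, other) if a != b)

def diversity_clause(previous, min_changes=1):
    """Holds when x differs from every earlier counterfactual in at
    least min_changes features."""
    return lambda x: all(l0_distance(x, p) >= min_changes for p in previous)

def diverse_counterfactuals(candidates, distance_to_factual, k, min_changes=1):
    """Pick up to k counterfactuals, re-running the search with one more
    diversity constraint after each one is found."""
    found = []
    for _ in range(k):
        is_diverse = diversity_clause(found, min_changes)
        feasible = [c for c in candidates if is_diverse(c)]
        if not feasible:
            break
        found.append(min(feasible, key=distance_to_factual))
    return found

factual = (0, 0)
pool = [(1, 0), (1, 1), (0, 1), (2, 2)]
l1_to_factual = lambda c: sum(abs(a - b) for a, b in zip(c, factual))
selected = diverse_counterfactuals(pool, l1_to_factual, k=3, min_changes=2)
```

Raising `min_changes` trades proximity for variety: later counterfactuals are pushed farther from the earlier ones and, typically, farther from the factual instance.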
5 Experiments
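Table 2: Coverage (%) per model, approach and dataset; entries within a cell correspond to the distance norms considered.
Model | Approach | Adult | Credit | COMPAS
tree | PFT | 0, 0, 0 | 68, 68, 68 | 74, 74, 74
forest | PFT | 0, 0, 0 | 99, 99, 99 | 100, 100, 100
lr | AR | 18, 0.4 | 100, 100 | 100, 100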
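Table 3: Relative distance reduction (%) achieved by MACE with respect to each baseline; entries within a cell correspond to the distance norms considered.
Model | Comparison | Adult | Credit | COMPAS
tree | MACE () vs MO | 47, 80, 70 | 67, 66, 47 | 1, 5, 5
tree | MACE () vs MO | 47, 81, 72 | 67, 96, 94 | 1, 5, 5
tree | MACE () vs PFT | | 53, 87, 85 | 14, 56, 54
tree | MACE () vs PFT | | 53, 97, 96 | 15, 55, 54
forest | MACE () vs MO | 51, 81, 69 | 68, 61, 38 | 1, 6, 6
forest | MACE () vs MO | 51, 82, 71 | 68, 97, 96 | 1, 6, 6
forest | MACE () vs PFT | | 53, 84, 81 | 4, 28, 27
forest | MACE () vs PFT | | 53, 96, 96 | 4, 28, 27
lr | MACE () vs MO | 62, 92, 86 | 80, 82, 80 | 3, 8, 6
lr | MACE () vs MO | 62, 93, 88 | 80, 82, 81 | 3, 6, 6
lr | MACE () vs AR | 3, 89 | 39, 67 | 10, 38
lr | MACE () vs AR | 5, 91 | 42, 71 | 10, 38
mlp | MACE () vs MO | 60, 92, 91 | 77, 85, 91 | 1, 3, 3
mlp | MACE () vs MO | 60, 93, 93 | 77, 96, 96 | 1, 3, 3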
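Table 4: Effect of the no-age-change constraint for each distance norm (random forest trained on Adult).
Approach | $\ell_0$: age-change | $\ell_0$: rel. dist. increase | $\ell_1$: age-change | $\ell_1$: rel. dist. increase | $\ell_\infty$: age-change | $\ell_\infty$: rel. dist. increase
MACE () | 13.2 | 9.0 | 20.4 | 100.3 | 84.4 | 32.8
MO | 78.8 | 50.9 | 92.0 | 245.7 | 95.6 | 193.3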
In this section, we empirically demonstrate the main properties of MACE compared to existing approaches.
Datasets. We evaluate MACE at generating counterfactual explanations on three real-world datasets in the context of loan approval (Adult [adult_dataset] and Credit [yeh2009comparisons] datasets) and pretrial bail (COMPAS dataset [propublica_compas]). All three datasets present heterogeneous input spaces.
Baselines. We compare the performance of MACE at generating the nearest counterfactual explanations with: the Minimum Observable (MO) approach,^{5} which searches in the dataset for the closest sample that flips the prediction; the Feature Tweaking (FT) approach [tolomei2017interpretable], which searches for the nearest counterfactual lying close to the decision boundary of a Random Forest; and the Actionable Recourse (AR) approach [ustun2019actionable], which solves a mixed integer linear program to obtain counterfactual explanations for logistic regression models. Table 1 summarizes the main properties of all the considered approaches to generate counterfactuals.
^{5} Identical to the deployed What If tool by Google’s PAIR team, as reported here [accessed: Sept. 2019]: https://paircode.github.io/whatiftool/walkthrough.html
Metrics. To assess and compare the performance of the different approaches, we recall the criteria of good explanations for consequential decisions: i) the returned counterfactual should be as near as possible to the factual sample corresponding to the individual’s features; ii) the returned counterfactual must be plausible (refer to Section 4.2). Hence, we quantitatively compare the performance of MACE with the above approaches in terms of i) the normalized distance $d$; and ii) coverage, indicating the percentage of factual samples for which the approach generates plausible (in type and range) counterfactuals.
Experimental setup. We consider as predictive models decision trees, random forest, logistic regression, and multilayer perceptron, which we train on the three datasets using the Python library scikit-learn [pedregosa2011scikit], with default parameters.^{6} Furthermore, to demonstrate the off-the-shelf flexibility in the various setups described, we build MACE atop the open-source PySMT library [pysmt2015] with the Z3 [DBLP:conf/tacas/MouraB08] backend. In Appendix C.1, we provide a thorough empirical evaluation of the computational cost of the off-the-shelf PySMT solver, including runtime comparisons between MACE and other baselines, as well as a discussion on the choice of trading off arbitrarily accurate solutions of (2) with the number of calls made to the satisfiability oracle.
^{6} For the multilayer perceptron, we used two hidden layers with 10 neurons each to avoid overfitting. See Appendix B.1 for model selection details.
For each combination of approach, model, dataset, and distance, we generate the nearest counterfactual explanations for a held-out set of instances classified as negative by the corresponding model. Here we consider the $\ell_0$, $\ell_1$, $\ell_\infty$ norms as a measure of distance to identify the nearest counterfactuals. Unfortunately, we found that FT never returned a plausible counterfactual. As a consequence, we modified the original implementation of FT to ensure that the generated counterfactuals are plausible. The resulting Plausible Feature Tweaking (PFT) projects the set of candidate counterfactuals into a plausible domain before selecting the nearest counterfactual amongst them. This was not possible for AR because the approach only returns a single counterfactual, leaving no alternative if it is not plausible.^{7}
^{7} Importantly, Actionable Recourse does support actionability and data-range plausibility; however, it lacks support for data-type plausibility. Appendix B.3 describes the failure points of AR, as reported by the authors.
Coverage and distance results. Table 2 shows the coverage of all the approaches based only on data-range and data-type plausibility. Note that, since by definition both MACE and MO have 100% coverage, we have not depicted these values in the table. In contrast, PFT fails to return counterfactuals for part of the Credit and COMPAS datasets, while both PFT and AR achieve minimal coverage on the Adult dataset.^{8} Focusing on those factual samples for which PFT and AR return plausible counterfactuals, we are able to compute the relative distance reductions achieved when using MACE as compared to other approaches, as shown in Table 3 (additionally, Figure 4 in Appendix B shows the distribution of the distance of the generated plausible counterfactual for all models, datasets, distances, and approaches). Here, we observe that MACE results in significantly closer counterfactual explanations than competing approaches on Adult, Credit, and COMPAS (Table 3). As a consequence, the counterfactuals generated by MACE would require significantly less effort on behalf of the affected individual in order to achieve the desired prediction.
^{8} The Adult dataset comprises a realistic mix of integer, real-valued, categorical, and ordinal variables common to consequential scenarios; further details in Appendix B.2.
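For concreteness, the two metrics can be computed as in the sketch below. Coverage follows the definition given above; the formula used for the relative distance reduction, (baseline − MACE) / baseline averaged over the samples covered by the baseline, is an assumption about how such a reduction would typically be computed, and the numbers are made up for the example.

```python
def coverage(counterfactuals):
    """Percentage of factual samples with a plausible counterfactual;
    None marks a sample for which none was returned."""
    covered = sum(cf is not None for cf in counterfactuals)
    return 100.0 * covered / len(counterfactuals)

def mean_relative_reduction(mace_distances, baseline_distances):
    """Average percentage reduction in distance, computed only over the
    samples for which the baseline returned a counterfactual (assumed)."""
    reductions = [100.0 * (b - m) / b
                  for m, b in zip(mace_distances, baseline_distances)
                  if b is not None and b > 0]
    return sum(reductions) / len(reductions)

# Made-up distances for four factual samples.
mace = [0.125, 0.25, 0.25, 0.5]
baseline = [0.25, None, 0.5, 0.5]
baseline_coverage = coverage(baseline)
reduction = mean_relative_reduction(mace, baseline)
```

Restricting the comparison to samples covered by the baseline mirrors the text above, where reductions are reported only for samples on which PFT or AR return plausible counterfactuals.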
Plausibility constraints. While performing a qualitative analysis of generated counterfactuals we observed that many of them require changes in features that are often protected by law, such as age, race, and gender [barocas2016big]. As an example, for a trained random forest, the counterfactuals generated by both the MACE and MO approaches required individuals to change their age. Worse yet, for a substantial portion of the counterfactuals, a reduction in age was required, which is not even possible. To further study this effect, we regenerate counterfactual explanations for those samples for which age-change was required, with an additional plausibility constraint ensuring that the age shall not change (results with constraints to ensure non-decreasing age are shown in Appendix B). The results are presented in Table 4. First, we observe that the additional plausibility constraint for the age incurs significant increases in the distance of the nearest counterfactual, which are, as expected, more pronounced for the $\ell_1$ and the $\ell_\infty$ norms, since the $\ell_0$ norm only accounts for the number of features that change in the counterfactual but not for how much they change. For the $\ell_0$ norm, as expected, we find that for the 66 factual samples (i.e., 13.2%) for which the unrestricted MACE required age-change, the addition of the no-age-change constraint results in counterfactuals at very similar distance. In fact, a portion of the newly generated counterfactuals only require a change in Occupation, and another portion only require a change in Capital Gains, therefore remaining at the same distance as the original counterfactual. In contrast, for the $\ell_1$ and the $\ell_\infty$ norms we find that the restricted counterfactual incurs a significant increase in the distance (cost) with respect to the unrestricted counterfactual. These results suggest that the predictions of the random forest trained on the Adult data are strongly correlated with age, a dependence that is often legally and socially considered unfair.
This suggests that counterfactuals found with MACE may assist in qualitatively ascertaining if other desiderata, such as fairness, are met [doshi2017towards, weller2017challenges].
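Table 5: Diverse counterfactuals generated by MACE for the loan borrower example.
 | Latest Bill | Latest Payment | University Degree | Will default next month?
Factual | $370 | $40 | some | yes
CF #1 | $368 | $1448 | some | no
CF #2 | $0 | $1241 | some | no
CF #3 | $0 | $390 | graduate | no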
Diversity constraints. Finally, we present a situation where MACE can be used to generate counterfactuals under both plausibility and diversity constraints. Consider a loan borrower from the Credit dataset identified with the following features^{9}: John is a married male between 40 and 59 years of age with “some” university degree. Financially, over the last 6 months, John has been struggling to make payments on his bank loan. Given his circumstances, a logistic regression model trained on the historical dataset has predicted that John will default on his loan next month. To prevent this default, the bank uses MACE to generate the diverse suggestions in Table 5, via successive runs of Algorithm 1. Each new run augments the constraints formula (already including plausibility constraints on his age, sex, and marital status) with an additional clause enforcing diversity as discussed in Section 4.2. The returned counterfactuals (of which only 3 are shown) present John with diverse courses of action: either reduce spending and make a lump-sum payment on the debt (CF #2) or continue spending the same as before, but make an even larger payment to account for continued expenditures (CF #1). Alternatively, providing documents confirming a graduate degree would put John in a low-risk (no default) bracket (CF #3). We invite the reader to imagine parallels to the above situation for the Adult and COMPAS datasets.
^{9} Complete feature list in Appendix C.4.
6 Conclusions
In this work, we have presented a novel approach for generating counterfactual explanations in the context of consequential decisions. Building on theory and tools from formal verification, we demonstrated that a large class of predictive models can be compiled to formulae which can be verified by standard SMT solvers. By conjoining the model formula with formulae corresponding to distance, plausibility, and diversity constraints, we demonstrated on three real-world datasets and four popular predictive models that the proposed method not only achieves perfect coverage, but also generates counterfactuals at more favorable distances than existing optimization-based approaches. Furthermore, we showed that the proposed method can not only provide explanations for individuals subject to automated decision making systems, but also inform system administrators regarding the potentially unfair reliance of the model on protected attributes.
There are a number of interesting directions for future work. First, MACE can naturally be extended to support counterfactual explanations for multi-class classification models, as well as regression scenarios. Second, extending the multi-faceted notion of plausibility defined in Section 4.2 (actionability, data type/range consistency, which focus on individual features), it would be interesting to account for statistical correlations and unmeasured confounding factors among the features when generating counterfactual explanations (i.e., realizability). Third, we would also like to explore how different notions of diversity may help generate meaningful and useful counterfactuals. Finally, in our experiments we noticed that the running time of MACE directly depends on the efficiency of the SMT solver. As future work we aim to make the proposed method more scalable on large models by investigating recent ideas that have been developed in the context of formal verification of deep neural networks [cav/HuangKWW17, cav/KatzBDJK17, pacmpl/SinghGPV19] and optimization modulo theories [NieuwenhuisO06, SebastianiT12].
References
Appendix A Background on programming language and program verification
Programs
We assume a given set of function symbols with their arity. For simplicity, we consider the case where operators are untyped and have arity 0 (constants), 1 (unary functions), and 2 (binary functions), and we use a separate metavariable to range over each of these three kinds of symbol. Expressions are built from function symbols and variables. The set of expressions is defined inductively by the following grammar:
We next assume a given set of atomic predicates. For simplicity, we also consider that predicates have arity 1 or 2, and use separate metavariables to range over unary and binary predicates. We define guards using the following grammar:
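One way to write the expression grammar described above is sketched here. The metavariable names are assumptions chosen for illustration: $x$ for variables, $k$ for constants, $f$ for unary functions, and $g$ for binary functions.

```latex
\[
e \;::=\; x \;\mid\; k \;\mid\; f(e) \;\mid\; g(e_1, e_2)
\]
```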
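A possible form of the guard grammar is sketched below. The names $P$ (unary predicate) and $Q$ (binary predicate) are illustrative, and closure of guards under the usual Boolean connectives is an assumption rather than something stated in the text.

```latex
\[
b \;::=\; P(e) \;\mid\; Q(e_1, e_2) \;\mid\; \neg b \;\mid\; b_1 \wedge b_2 \;\mid\; b_1 \vee b_2
\]
```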
We next define commands. These include assignments, conditionals, bounded loops and return expressions. The set of commands is defined inductively by the following grammar:
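The four kinds of command listed above, together with sequencing, could be written as follows. This is a sketch: the concrete syntax, the loop bound $n$, and the inclusion of sequential composition as a separate production are assumptions.

```latex
\[
c \;::=\; x := e
  \;\mid\; c_1 ; c_2
  \;\mid\; \mathsf{if}\ b\ \mathsf{then}\ c_1\ \mathsf{else}\ c_2
  \;\mid\; \mathsf{for}\ i = 1\ \mathsf{to}\ n\ \mathsf{do}\ c
  \;\mid\; \mathsf{return}\ e
\]
```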
We assume that programs satisfy a well-formedness condition. The condition requires that return expressions have no successor instruction, i.e., we do not allow commands in which a return expression is followed by a further instruction. This is without loss of generality, since commands can always be transformed into functionally equivalent programs which satisfy the well-formedness condition.
Single assignment form
Our first step in constructing characteristic formulae is to transform programs into an intermediate form that is closer to logic. Without loss of generality, we consider loop-free commands, since bounded loops can be fully unrolled. The intermediate form is a variant of the well-known SSA form [rosen1988global, cytron1991efficiently] from compiler optimization. Concretely, we transform programs into a weak form of single assignment. This form requires that every non-input variable is defined before being used, and assigned at most once during execution for any fixed input. The main difference with SSA form is that we do not use so-called φ-nodes, as we only require that variables are assigned at most once for any fixed input. More technically, our transformation can be seen as a composition of the SSA transform with a naive de-SSA transform in which φ-nodes are turned into assignments in the branches of the conditionals.
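To illustrate the weak single-assignment form, the sketch below shows a small hypothetical program (not taken from the paper) before and after renaming. Both branches of the conditional assign `x2`, which is allowed because only one branch runs for any fixed input, so no φ-node is needed.

```python
def original(a):
    # The variable x is assigned up to twice along one execution.
    x = a + 1
    if x > 0:
        x = x * 2
    else:
        x = -x
    return x


def single_assignment(a):
    # Each definition of x gets a fresh name.
    x1 = a + 1
    if x1 > 0:
        x2 = x1 * 2   # x2 is defined in both branches,
    else:
        x2 = -x1      # but only one of them executes per input
    return x2
```

The two functions compute the same result on every input; the second one can be read directly as a set of equations guarded by the branch condition.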
Path formulae and characteristic formulae
Our second step is to define the set of path formulae. Informally, a path formula represents a possible execution of the program. Fix a distinguished variable for return values. Then the path formulae of a command are defined inductively by the clauses:
The characteristic formula of a command is then defined as:
One can prove that, for all inputs and every candidate return value, the characteristic formula instantiated with those values is valid iff executing the command on those inputs returns that value. Note that, strictly speaking, the formula contains as free variables the distinguished return variable, the inputs of the program, and all the program variables. However, the latter are fully determined by the characteristic formula, so its validity is equivalent to the validity of the same formula with the program variables quantified away.
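The construction above can be sketched in a few lines of Python, under two assumptions that are not confirmed by the text: each path formula is a conjunction of the guards and defining equations met along one execution path, and the characteristic formula is the disjunction of all path formulae. The tuple encoding and the names `paths`, `path_holds`, `characteristic`, and `ret` are hypothetical, and the sketch evaluates formulae on concrete values rather than building terms for an SMT solver.

```python
# Loop-free commands in single-assignment form, encoded as nested tuples:
#   ("assign", var, expr)  ("seq", c1, c2)  ("if", guard, c1, c2)  ("return", expr)
# expr and guard are callables that read an environment dict.

def paths(cmd):
    """Enumerate execution paths; each path is a list of atoms."""
    kind = cmd[0]
    if kind == "assign":
        return [[("def", cmd[1], cmd[2])]]
    if kind == "return":
        return [[("def", "ret", cmd[1])]]          # "ret" is the return variable
    if kind == "seq":
        return [p + q for p in paths(cmd[1]) for q in paths(cmd[2])]
    if kind == "if":
        _, guard, then_c, else_c = cmd
        return ([[("guard", guard, True)] + p for p in paths(then_c)] +
                [[("guard", guard, False)] + p for p in paths(else_c)])
    raise ValueError(f"unknown command: {kind}")


def path_holds(path, inputs, ret_value):
    """Check one path formula for concrete inputs and a candidate return value."""
    env = dict(inputs)
    for atom in path:
        if atom[0] == "def":
            env[atom[1]] = atom[2](env)            # variable fixed by its equation
        elif bool(atom[1](env)) != atom[2]:
            return False                           # guard contradicts this path
    return env.get("ret") == ret_value


def characteristic(cmd, inputs, ret_value):
    """Disjunction over all path formulae."""
    return any(path_holds(p, inputs, ret_value) for p in paths(cmd))


# x1 := a + 1; if x1 > 0 then x2 := x1 * 2 else x2 := -x1; return x2
prog = ("seq", ("assign", "x1", lambda e: e["a"] + 1),
        ("seq", ("if", lambda e: e["x1"] > 0,
                 ("assign", "x2", lambda e: e["x1"] * 2),
                 ("assign", "x2", lambda e: -e["x1"])),
         ("return", lambda e: e["x2"])))
```

For `prog`, `characteristic(prog, {"a": 3}, 8)` holds while `characteristic(prog, {"a": 3}, 7)` does not, mirroring the statement that the formula is valid exactly for the value the program returns.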
Appendix B Experiment Details
In this section we provide further details on the datasets and methods used in our experiments, together with some additional results.
B.1 Model Selection
To demonstrate the flexibility of our approach, we explored four different differentiable and non-differentiable model classes, i.e., decision tree, random forest, logistic regression and multilayer perceptron (MLP). As the main focus of our work is to generate counterfactuals for a broad range of already trained models, we opted for model parametrizations that result in good performance on the considered datasets (e.g., default parameters). For instance, for the MLP, we opted for two hidden layers with 10 neurons, since it presents better training/test accuracy on the Adult dataset than the other architectures we considered. We leave the exploration of other datasets (larger feature spaces), more complex models (deeper MLPs) and other SMT solvers as future work.
B.2 Datasets
Here we detail the different types of variables present in each dataset. We used the default features for the Adult and COMPAS datasets, and applied the same preprocessing used in [ustun2019actionable] for the Credit dataset. All samples with missing data were dropped. We remark that we have relied on broadly studied datasets in the literature on fairness and interpretability of ML for consequential decision making. For instance, the Credit dataset [34] has been previously studied by the Actionable Recourse work [29], and the Adult [1] and COMPAS [18] datasets have been previously used in the context of fairness in ML [Joseph et al., 2016; Zafar et al., 2017; Agarwal et al. 2018].
Adult:

Integer: Age, Education Number, Hours Per Week

Real: Capital Gain, Capital Loss

Categorical: Sex, Native Country, Work Class, Marital Status, Occupation, Relationship

Ordinal: Education Level
Credit:

Integer: Total Overdue Counts, Total Months Overdue, Months With Zero Balance Over Last 6 Months, Months With Low Spending Over Last 6 Months, Months With High Spending Over Last 6 Months

Real: Max Bill Amount Over Last 6 Months, Max Payment Amount Over Last 6 Months, Most Recent Bill Amount, Most Recent Payment Amount

Categorical: Is Male, Is Married, Has History Of Overdue Payments

Ordinal: Age Group, Education Level
COMPAS:

Integer: 

Real: Priors Count

Categorical: Race, Sex, Charge Degree

Ordinal: Age Group
B.3 Handling Mixed Data Types
While the proposed approach (MACE) naturally handles mixed data types, other approaches do not. Specifically, the Feature Tweaking method generates counterfactual explanations for Random Forest models trained on non-one-hot embeddings of the dataset, meaning that the resulting counterfactuals will not have multiple categories of the same variable activated at the same time. However, because this method is restricted to real-valued variables, the resulting counterfactual must undergo a post-processing step to ensure that integer, categorical, and ordinal variables are plausible in the counterfactual. The Actionable Recourse method, on the other hand, generates explanations for Logistic Regression models trained on one-hot embeddings of the dataset, hence requiring additional constraints to ensure that multiple categories of a categorical variable are not simultaneously activated in the counterfactual. While the authors suggest how this can be supported using their method, their open-source implementation converts categorical columns to binary where possible and drops other, more complicated categorical columns, postponing their support to future work. Furthermore, the authors state that the question of mutually exclusive features will be revisited in later releases (footnote 10: https://github.com/ustunb/actionablerecourse/blob/master/examples/ex_01_quickstart.ipynb). Moreover, ordinal variables are not supported by this method. To overcome these shortcomings, the counterfactuals generated by both approaches are post-processed to ensure correctness of variable types by rounding integer variables, and taking the maximally activated category as the counterfactual category.
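The post-processing step described above (rounding integer variables and keeping only the maximally activated category) can be sketched as follows. The function name `postprocess`, the dictionary layout, and the column names are hypothetical and do not come from either baseline's code.

```python
def postprocess(raw, integer_features, categorical_groups):
    """Project a real-valued counterfactual back onto valid variable types.

    raw                -- dict: column name -> real value from the baseline
    integer_features   -- names of columns that must be integers
    categorical_groups -- lists of one-hot column names, one list per variable
    """
    cf = dict(raw)
    for name in integer_features:
        cf[name] = int(round(cf[name]))               # round integer variables
    for group in categorical_groups:
        winner = max(group, key=lambda col: cf[col])  # maximally activated category
        for col in group:
            cf[col] = 1 if col == winner else 0       # exactly one category active
    return cf


# Hypothetical raw output of a real-valued counterfactual generator.
raw = {"age": 38.6, "capital_gain": 1200.5,
       "work_class_private": 0.7, "work_class_government": 0.9}
fixed = postprocess(raw, ["age"],
                    [["work_class_private", "work_class_government"]])
```

Note that a counterfactual altered this way may no longer flip the prediction, so it would need to be re-checked against the model after post-processing.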
Appendix C Additional Results
Method   age-red.   rel. dist. increase   age-red.   rel. dist. increase   age-red.   rel. dist. increase
MACE     3.6        0                     7.4        61.3                  34.2       13.9
MO       24.6       29.7                  34.6       94.6                  34.2       66.6
C.1 Comprehensive Distance Results
Following the presentation of coverage results in Table 2 and relative distance improvement (reduction) in Table 3 of the main body, in Figure 4 we present the complete distribution of counterfactual distances upon termination of Algorithm 1. Importantly, we see that in all setups (approaches × models × norms × datasets), MACE results are at least as good as those of any other approach (MO, PFT, AR).
C.2 Quality vs Complexity
In the main text and in the previous section, we considered distance comparisons upon termination of Algorithm 1; in this section we explore the effect of the accuracy parameter ε jointly on quality (distance) and complexity (runtime) during execution of Algorithm 1. Importantly, the number of calls made to the solver follows O(log(1/ε)), where ε is the desired accuracy term, i.e., orders of magnitude more accuracy only cost linearly more calls. The runtime of each call to the solver is governed by a number of parameters, including the implementation details of the solver (footnote 11: This is assumed beyond the scope of the paper; we built MACE atop the open-source PySMT library [pysmt2015] with the Z3 [DBLP:conf/tacas/MouraB08] backend to demonstrate its model-agnostic support of off-the-shelf models.) and the compute hardware (footnote 12: All tests were conducted using one X86_64 Xeon(R) CPU @ 2.60GHz, and 8GB memory.), among other factors. Clearly, a higher desired accuracy (i.e., a smaller ε) will result in closer counterfactuals at the cost of higher runtime, while leaving the coverage unchanged (remaining at 100%, by design). Figure 5 depicts the average counterfactual distance and average runtime against the number of calls to the solver, confirming the intuition above: not only does MACE always achieve a lower counterfactual distance (footnote 13: Reminder: lower distance is more desirable, as it specifies the least change required of the individual’s features.) upon termination, but in many cases an early termination of MACE generates closer counterfactuals while also being less computationally demanding.
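The logarithmic call count can be illustrated with a minimal sketch, assuming that Algorithm 1 is a binary search over a distance bound normalised to [0, 1]. The solver is replaced by a hypothetical oracle `exists_within(delta)` that reports whether some counterfactual lies within distance `delta`; in MACE this role would be played by an SMT satisfiability query.

```python
def nearest_distance(exists_within, eps):
    """Binary search for the smallest feasible distance, up to accuracy eps.

    Assumes distances are normalised to [0, 1] and that a counterfactual
    exists at distance <= 1. Returns (upper bound, number of oracle calls).
    """
    lo, hi = 0.0, 1.0
    calls = 0
    while hi - lo > eps:
        mid = (lo + hi) / 2
        calls += 1
        if exists_within(mid):
            hi = mid      # a counterfactual exists within mid: tighten upper bound
        else:
            lo = mid      # none within mid: raise lower bound
    return hi, calls


# Toy oracle: the nearest counterfactual sits at distance 0.3.
oracle = lambda delta: delta >= 0.3

dist_coarse, calls_coarse = nearest_distance(oracle, 1e-3)
dist_fine, calls_fine = nearest_distance(oracle, 1e-6)
```

In this toy setting, tightening ε by three orders of magnitude (from 1e-3 to 1e-6) raises the number of oracle calls from 10 to 20, which is the linear cost for exponentially more accuracy described above.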
In addition to studying the quality vs complexity tradeoff against the number of calls to the solver, in Table 6 we compare final runtimes (in seconds) upon termination of Algorithm 1 for various setups. The results show that MACE takes less than 5 seconds for logistic regression; between 5 and 60 seconds for decision trees and random forests; and between one minute and three hours for the multilayer perceptron (outliers were not excluded in computed mean runtimes). In contrast, competing approaches (MO, PFT, AR) require at most 30 seconds to generate a counterfactual explanation, when possible (note that the coverage for AR and PFT is often significantly below 100%, and only MACE is able to generate counterfactuals for the multilayer perceptron; MO requires access to the training data as it searches through the training set for a counterfactual). We believe that this difference is compensated (at least for the decision tree, the random forest, and the logistic regression classifiers) by the main properties of MACE compared to previous works, i.e.: i) model-agnostic ({non-}linear, {non-}differentiable, {non-}convex); ii) data-agnostic (heterogeneous features); iii) provable closeness guarantees; and iv) 100% coverage, even under plausibility and diversity constraints. Regarding the results on MLPs, we are well aware of prior work that develops efficient SMT-based methods for verifying large deep neural networks (see formal verification of deep neural networks [cav/HuangKWW17, cav/KatzBDJK17, pacmpl/SinghGPV19] and optimization modulo theories [NieuwenhuisO06, SebastianiT12]); indeed we plan to leverage state-of-the-art tools to improve the efficiency of our implementation, in particular for MLP-based models. With the current implementation of MACE, our main goal was to explore the use of off-the-shelf SMT solvers already available in Python to generate counterfactuals in a broad range of settings, which justifies our lesser emphasis on efficiency.
In practice, the choice of ε should reflect the distance granularity desired by the operator, the number and range of attributes in the data space, and the chosen distance norm. For example, using the ℓ0 norm, which tracks the number of attributes changed, the lowest achievable distance granularity is 1/d, where d is the data dimensionality. Therefore, choosing any ε < 1/d is sufficient and will result in the optimal counterfactual for this choice of distance metric. As another example, for a continuous norm, too much granularity may result in a lack of trust for the end user: consider the Adult dataset with an account balance feature; choosing a fine granularity may result in a counterfactual that suggests that a change of only a few dollars in the account balance can flip the prediction (e.g., result in the approval of a loan). It is important to point out that this phenomenon is not a fault of the counterfactual generating method (i.e., MACE), but of the robustness of the underlying classifier and its decision boundary. While such an explanation may not be favorable for an end user, it may assist a system administrator or model designer to assay the robustness and safety of their model prior to deployment.
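As a worked instance of the ℓ0 case, assume (for illustration only) that the distance is normalised by the number of features $d$ and that each solver call halves a search interval of initial width 1. Taking $d = 12$, the number of Adult features listed in Appendix B.2, the attainable distances $\delta$, the admissible accuracy $\epsilon$, and the number of halvings $k$ satisfy:

```latex
\[
\delta \in \left\{0, \tfrac{1}{12}, \tfrac{2}{12}, \dots, 1\right\},
\qquad
\epsilon < \tfrac{1}{12} \approx 0.083,
\qquad
2^{-k} \le \epsilon
\;\Longrightarrow\;
k \ge \log_2 \tfrac{1}{\epsilon} > \log_2 12 \approx 3.58 .
\]
```

Under these assumptions, four halvings already bring the interval width to $2^{-4} = 0.0625 < 1/12$, so a handful of solver calls is enough to pin down the optimal number of changed attributes.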
C.3 Additional Constrained Results
Following the study of counterfactuals that change or reduce age (Section 5), we regenerate counterfactual explanations for those samples for which age reduction was required, with an additional plausibility constraint ensuring that the age shall not decrease. The results are presented in Table 7. Once again, we observe that the additional plausibility constraint on age incurs significant increases in the distance of the nearest counterfactual, which are, as expected, more pronounced for the ℓ1 and the ℓ∞ norms. For the ℓ0 norm, we find that for the 18 factual samples (i.e., 3.6%) for which the unrestricted MACE required age reduction, the addition of the no-age-reduction constraint results in counterfactuals at the same distance, while suggesting a change in work class or education level instead of changing age.
C.4 Details on diverse counterfactuals example
In the main body, we described a scenario where a logistic regression model had predicted that a loan borrower, John, would default on his loan. Here is John’s complete feature list: John is a married male between 40-59 years of age with some university degree. Over the last 6 months, Max Bill Amount = 500.0, Max Payment Amount = 60.0, Months With Zero Balance = 0.0, Months With Low Spending = 0.0, Months With High Spending = 1.0. Furthermore, John has a history of overdue payments, his Most Recent Bill Amount = 370.0, and his Most Recent Payment Amount = 40.0.