Abstract

Predictive models are being increasingly used to support consequential decision making at the individual level in contexts such as pretrial bail and loan approval. As a result, there is increasing social and legal pressure to provide explanations that help the affected individuals not only to understand why a prediction was output, but also how to act to obtain a desired outcome. To this end, several works have proposed optimization-based methods to generate nearest counterfactual explanations. However, these methods are often restricted to a particular subset of models (e.g., decision trees or linear models) and differentiable distance functions. In contrast, we build on standard theory and tools from formal verification and propose a novel algorithm that solves a sequence of satisfiability problems, where both the distance function (objective) and predictive model (constraints) are represented as logic formulae. As shown by our experiments on real-world data, our algorithm is: i) model-agnostic ({non-}linear, {non-}differentiable, {non-}convex); ii) data-type-agnostic (heterogeneous features); iii) distance-agnostic ($\ell_0$, $\ell_1$, $\ell_\infty$, and combinations thereof); iv) able to generate plausible and diverse counterfactuals for any sample (i.e., 100% coverage); and v) at provably optimal distances.


Model-Agnostic Counterfactual Explanations for Consequential Decisions


Amir-Hossein Karimi, Gilles Barthe, Borja Balle, Isabel Valera




Affiliations (in author order): MPI-IS; MPI-SP/IMDEA Software Institute; -; MPI-IS

1 Introduction

[Figure 1: Architecture Overview for Model-Agnostic Counterfactual Explanations (MACE). Components shown: Input, Model (as program), Distance (as program), SMT Solver.]

Data-driven predictive models are ubiquitously being used to support or even substitute humans in decision making in a wide variety of real-world contexts including, e.g., selection processes for hiring, loan approval, or pretrial bail. However, as algorithmic methods are increasingly used to make consequential decisions at the individual level – i.e., decisions that may have significant consequences for the individuals they decide about – the debate about their lack of transparency and explainability becomes more heated. To make things worse, while the verdict is still out as to what constitutes a good explanation [7, 10, 16, 20, 19, 24, 25], there already exist clearly defined legal requirements for explanations in the context of consequential decision making. For example, the EU General Data Protection Regulation (“GDPR”) grants individuals the right-to-explanation [31, 32], by requiring institutions to provide explanations to individuals who are subject to their (semi-)automated decision making systems.

A growing number of works on interpretable machine learning have recently focused on the definitions of, and mechanisms for providing, good explanations for predictor-based decision making systems. In the context of consequential decision making, it is widely agreed that a good explanation should provide answers to the following two questions [7, 12, 33]: (i) “why does the model output a certain prediction for a given individual?”; and (ii) “what features describing the individual would need to change to achieve the desired output?”

Here, we focus on answering the second question, or equivalently, on generating counterfactual explanations. Of specific importance is the problem of finding the nearest counterfactual explanation – i.e., identifying the set of features resulting in the desired prediction while remaining at minimum distance from the original set of features describing the individual. Existing approaches tackling this problem suffer from various limitations: they either propose solutions that are tailored to particular models, e.g., decision trees [29]; rely on classical optimization tools, thus being restricted to convex predictive models and distances [26, 30]; or solve a relaxed version of the original optimization problem using gradient-based approaches, thus being restricted to differentiable models and distance functions [33] and lacking optimality guarantees. Additionally, it is important to consider that in the context of consequential decision-making, the features describing individuals are semantically meaningful and heterogeneous (i.e., mixed continuous & discrete); and can either be acted upon (e.g., bank account balance), or be immutable and should be safeguarded from change (e.g., sex, race). A good explanation should account for these semantics (i.e., be plausible) to be useful for the individual, a requirement that most existing approaches fail to address. [Footnote 1: We emphasize that while our formulation for generating counterfactuals seems similar to that of adversarial perturbations (image domain), the goals are different: while our goal is to provide actionable and plausible counterfactuals, the goal of adversarial examples is to be imperceptible to humans and hence plausible in the human-perception space, but not in the data space.]

Our contributions. In this paper, we propose a model-agnostic approach to generate nearest counterfactual explanations, namely MACE, under any given distance function (or convex combinations thereof), while, at the same time, easily supporting additional plausibility constraints. Moreover, our approach readily encodes natural notions of distance for heterogeneous feature spaces, which are common in consequential decision making systems (e.g., loan approval) and consist of mixed numerical (e.g., age and income) and nominal features (e.g., gender and education level). To this end, in MACE we map the nearest counterfactual problem into a sequence of satisfiability (SAT) problems, by expressing both the predictive model and the distance function (as well as the plausibility and diversity constraints) as logic formulae. Each of these satisfiability problems aims to verify if there exists a counterfactual explanation at a distance smaller than a given threshold, and can be solved using standard SMT (satisfiability modulo theories) solvers. Moreover, we rely on a binary search strategy on the distance threshold to find an approximation to the nearest (plausible) counterfactual with an arbitrary degree of accuracy, and a lower bound on distance such that no counterfactual provably exists at a smaller distance. Finally, once nearest counterfactuals are found, diversity constraints may be added to the satisfiability problems to find alternative counterfactuals. The overall architecture of MACE is illustrated in Figure 1.

Our experimental validation on real-world datasets shows that MACE not only achieves 100% coverage by design, but also generates explanations that are significantly closer than previous approaches [29, 30]. We also provide qualitative examples showcasing the flexibility of our approach to generate actionable counterfactuals by extending our plausibility constraints to restrict changes to a subset of (non-immutable) features. The Python implementation of our algorithms and the datasets used in our experiments are available at https://github.com/amirhk/mace.

2 First-order predicate logic

In this section, we briefly recall basic concepts of first-order predicate logic, which MACE builds upon. We distinguish between function symbols (for instance, addition $+$ and multiplication $\times$) and predicate symbols (for instance, equality $=$ or less than $<$). Function symbols are used to build expressions, and predicate symbols are used to build atomic formulae. Examples of valid expressions are $x$, $x + y$, and $2 \times x + 1$. Examples of valid atomic formulae are $x = 0$, or $x + y < 1$. A (quantifier-free) formula is a Boolean combination of atomic formulae. That is, a formula is built from atomic formulae using conjunction $\wedge$, disjunction $\vee$, and negation $\neg$. Formulae have an interpretation over their intended domain. For instance, a formula about real-valued expressions has a natural interpretation as a subset of $\mathbb{R}^n$, where $n$ denotes the number of variables that appear in the formula. The interpretation is obtained by mapping every variable into a value, e.g., a real number. For example, $(0.2, 0.3)$ belongs in the interpretation of $x + y < 1$ since the mapping $x \mapsto 0.2, y \mapsto 0.3$ makes the formula true because $0.2 + 0.3 < 1$. We say that a formula is satisfiable if its interpretation as a subset of $\mathbb{R}^n$ is non-empty.

The satisfiability problem consists in checking whether or not a formula is satisfiable. Satisfiability problems can be checked automatically using satisfiability modulo theories (SMT) solvers like Z3 [5] or CVC4 [3]. We refer to [17] for an exposition of the basic algorithms used by SMT solvers. For the purpose of the next sections, it suffices to assume a given satisfiability oracle $\mathrm{SAT}(\phi)$. For our experiments, we use off-the-shelf SMT solvers to realize the oracle. We use SMT solvers as black boxes, but it is interesting to note that our formulae fall in the linear fragment of the theory of reals (i.e., all formulae that only contain expressions of degree 1 when viewed as multi-variate polynomials over variables), which can be decided efficiently using the Fourier-Motzkin algorithm.
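To make the notion of a satisfiability oracle concrete, the following is a minimal sketch that replaces the SMT solver with exhaustive search over small finite domains. It is only an illustration: the function name `sat`, the example formula `phi`, and the grid of values are assumptions introduced here, and a real SMT solver reasons symbolically over the reals rather than enumerating values.

```python
from itertools import product


def sat(formula, domains):
    """Toy oracle: return a satisfying assignment (dict) or None.

    `formula` is a Python predicate over named variables and `domains`
    maps each variable name to a finite list of candidate values.
    """
    names = list(domains)
    for values in product(*(domains[n] for n in names)):
        assignment = dict(zip(names, values))
        if formula(**assignment):
            return assignment
    return None  # plays the role of "unsatisfiable" on this finite grid


def phi(x, y):
    # Quantifier-free formula: (x + y < 1) AND NOT (x = 0)
    return (x + y < 1) and not (x == 0)


grid = [i / 2 for i in range(-2, 3)]  # -1.0, -0.5, 0.0, 0.5, 1.0
model = sat(phi, {"x": grid, "y": grid})
no_model = sat(lambda x: x < 0 and x > 0, {"x": grid})
```

Here `model` is an assignment that makes `phi` true, while `no_model` is `None` because the second formula is contradictory.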

3 Counterfactual spaces for predictive models

This section defines a logical representation of counterfactual explanations for predictive models, which are functions $f: \mathcal{X} \to \mathcal{Y}$ mapping input feature vectors $x \in \mathcal{X}$ into decisions $y \in \mathcal{Y} = \{0, 1\}$. [Footnote 2: While here we assume binary predictor models, i.e., classifiers, our approach generalizes to regression problems where $\mathcal{Y} = \mathbb{R}$ and more generally any other output domain.] Given a predictive model $f$, we can define the set of counterfactual explanations for a (factual) input $\hat{x} \in \mathcal{X}$ as $CF_f(\hat{x}) = \{x \in \mathcal{X} \mid f(x) \neq f(\hat{x})\}$. In words, $CF_f(\hat{x})$ contains all the inputs $x$ for which the model $f$ returns a prediction different from $f(\hat{x})$. We also remark that $CF_f(\hat{x})$ is the set of preimages of $1 - f(\hat{x})$ under $f$.

For a broad class of predictive models, it is possible to construct counterfactual formulae $\phi_{CF_f(\hat{x})}$ capturing membership in $CF_f(\hat{x})$. We do so by computing the characteristic formula $\phi_f$ of the model. For a predictive model $f$, and pair of input and output values $x$ and $y$, the characteristic formula $\phi_f$ verifies that $\phi_f(x, y)$ is valid if and only if $f(x) = y$. Thus, given a factual input $\hat{x}$ with $\hat{y} = f(\hat{x})$, we define the counterfactual formula as

$$\phi_{CF_f(\hat{x})}(x) = \phi_f(x, 1 - \hat{y}) \qquad (1)$$

Intuitively, the formula on the right hand side of (1) says that “$x$ is a counterfactual for $\hat{x}$ if either $f(\hat{x}) = 0$ and $f(x) = 1$, or $f(\hat{x}) = 1$ and $f(x) = 0$”. It is thus clear from the definition that an input $x$ satisfies $\phi_{CF_f(\hat{x})}$ if and only if $x \in CF_f(\hat{x})$. Moreover, (1) shows that, to construct counterfactual formulae $\phi_{CF_f(\hat{x})}$, we only require the characteristic formulae of the corresponding predictive models, $\phi_f$, and the value of $\hat{y}$. To obtain such characteristic formulae we assume that predictive models are represented by programs in a core programming language with assignments, conditionals, sequential composition, syntactically bounded loops and return statements. This allows us to use techniques from the program verification literature. Specifically, we use the so-called predicate transformers [6, 13, 9, 8]. The description of the general procedure is provided in Appendix A. For ease of exposition, we illustrate the construction of characteristic formulae through two examples, a decision tree and a multilayer perceptron.
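The step from a characteristic formula to a counterfactual formula can be sketched in a few lines. The sketch below assumes a binary classifier and represents formulae as plain Python predicates; the toy model `f`, its hand-written characteristic formula `phi_f`, and the helper `counterfactual_formula` are hypothetical names introduced only for illustration.

```python
def f(x):
    # Hypothetical toy classifier: predicts 1 when x[0] + x[1] >= 1.
    return 1 if x[0] + x[1] >= 1 else 0


def phi_f(x, y):
    # Characteristic formula of f: true exactly when f(x) = y.
    s = x[0] + x[1]
    return (s >= 1 and y == 1) or (s < 1 and y == 0)


def counterfactual_formula(phi, y_hat):
    # A counterfactual is any input that the model maps to the flipped label.
    return lambda x: phi(x, 1 - y_hat)


x_hat = (0.2, 0.3)                      # factual input, f(x_hat) = 0
phi_cf = counterfactual_formula(phi_f, f(x_hat))

points = [(a / 4, b / 4) for a in range(5) for b in range(5)]
consistent = all(phi_cf(x) == (f(x) != f(x_hat)) for x in points)
```

On the sampled points, `phi_cf` holds exactly for the inputs whose prediction differs from that of `x_hat`, which is the membership test for the counterfactual set.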

As a first example, consider the decision tree from Figure 2(a) which takes as input $(x_1, x_2, x_3) \in \{0, 1\} \times \{0, 1\} \times \mathbb{R}$ and returns a binary output in $\{0, 1\}$. Figure 2(b) provides the programming language description of this decision tree. To construct a formula representing the function computed by this tree we first build a clause for each leaf in the tree by taking the conjunction of all the conditions encountered in the path from the root to the leaf. For example, the clause corresponding to the leftmost leaf on the tree in Figure 2(a) is $(x_1 = 1 \wedge x_3 > 0 \wedge y = 0)$. Once all these clauses are constructed, the characteristic formula corresponding to the full tree is obtained by taking the disjunction of all said clauses (exactly one root-to-leaf path is followed for any input), as shown in Figure 2(c).







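(a) Graphical representation
        if $x_1$ == 1:
          $y$ = 0 if $x_3$ > 0 else $1$
        else:
          $y$ = 0 if $x_2$ == 1 else $1$
        return $y$
(b) Program (in Python)
        $\phi_f(x, y) = (x_1 = 1 \wedge x_3 > 0 \wedge y = 0) \vee (x_1 = 1 \wedge x_3 \leq 0 \wedge y = 1) \vee (x_1 \neq 1 \wedge x_2 = 1 \wedge y = 0) \vee (x_1 \neq 1 \wedge x_2 \neq 1 \wedge y = 1)$
(c) Characteristic formula
Figure 2: Decision tree: model, program and characteristic formula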
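As a sanity check on the leaf-by-leaf construction, the sketch below writes the tree as a Python function and its characteristic formula as one clause per leaf, then verifies on a small grid that the formula holds exactly when the program returns the given output. It assumes the second assignment in the listing belongs to the `else` branch of the test on $x_1$; the function names are illustrative.

```python
from itertools import product


def tree(x1, x2, x3):
    # Decision tree as a program (the else branch is an assumption).
    if x1 == 1:
        y = 0 if x3 > 0 else 1
    else:
        y = 0 if x2 == 1 else 1
    return y


def phi_tree(x1, x2, x3, y):
    # One clause per leaf: path conditions AND the leaf's output.
    return ((x1 == 1 and x3 > 0 and y == 0)
            or (x1 == 1 and x3 <= 0 and y == 1)
            or (x1 != 1 and x2 == 1 and y == 0)
            or (x1 != 1 and x2 != 1 and y == 1))


inputs = list(product([0, 1], [0, 1], [-1.5, 0.0, 2.0]))
agree = all(phi_tree(*x, y) == (tree(*x) == y)
            for x in inputs for y in (0, 1))
```

Because every input follows exactly one root-to-leaf path, exactly one clause can hold for a given input, which is why the clauses are combined with `or`.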

As a second example we consider a feed-forward neural network with one hidden layer followed by a ReLU activation function, as depicted in Figure 3(a). This model implements a function $f: \mathbb{R}^3 \to \{0, 1\}$, where the binary decision is taken by thresholding the value of the last hidden node. The programming language representation of this model is given in Figure 3(b). In this case, the characteristic formula predicates over inputs $x = (x_1, x_2, x_3)$, output $y$ and program variables $z_i$ and $\tilde{z}_i$ for each hidden node representing the values on that node before and after the non-linear ReLU transformation, respectively. The characteristic formula is a conjunction, and each conjunct corresponds to one instruction of the program. For example, for the leftmost hidden node in the first layer of the network in Figure 3(a) the variable $z_1$ is associated with the clause $z_1 = x_1 - x_2$; and the variable $\tilde{z}_1$ corresponds to the value of $z_1$ after the ReLU, which can be written as the disjunction $(z_1 \geq 0 \wedge \tilde{z}_1 = z_1) \vee (z_1 < 0 \wedge \tilde{z}_1 = 0)$. For the output node – in this case, $z_3$ – we introduce a pair of clauses representing the thresholding operation, i.e., $(z_3 \geq 0 \wedge y = 1) \vee (z_3 < 0 \wedge y = 0)$. Taking the conjunction of the formulas for each node we obtain the characteristic formula in Figure 3(c).

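(a) Graphical representation
        $z_1$ = $x_1$ - $x_2$
        $z_2$ = 2 * $x_1$ - $x_3$
        $\tilde{z}_1$ = $z_1$ if $z_1$ >= 0 else 0
        $\tilde{z}_2$ = $z_2$ if $z_2$ >= 0 else 0
        $z_3$ = -$\tilde{z}_1$ + $\tilde{z}_2$
        $y$ = 1 if $z_3$ >= 0 else 0
        return $y$
(b) Program (in Python)
        $\phi_f(x, y) = (z_1 = x_1 - x_2) \wedge (z_2 = 2 x_1 - x_3) \wedge ((z_1 \geq 0 \wedge \tilde{z}_1 = z_1) \vee (z_1 < 0 \wedge \tilde{z}_1 = 0)) \wedge ((z_2 \geq 0 \wedge \tilde{z}_2 = z_2) \vee (z_2 < 0 \wedge \tilde{z}_2 = 0)) \wedge (z_3 = -\tilde{z}_1 + \tilde{z}_2) \wedge ((z_3 \geq 0 \wedge y = 1) \vee (z_3 < 0 \wedge y = 0))$
(c) Characteristic formula
Figure 3: Multilayer perceptron: model, program and characteristic formula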
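The one-conjunct-per-instruction construction can be checked in the same spirit. The sketch below mirrors the program listing, exposes the intermediate variables through a hypothetical helper `trace`, and confirms on integer inputs that the conjunction of clauses holds for the program's own output and fails for the flipped output. The names `mlp`, `trace`, and `phi_mlp` are illustrative.

```python
from itertools import product


def trace(x1, x2, x3):
    # Run the program and return all intermediate program variables.
    z1 = x1 - x2
    z2 = 2 * x1 - x3
    z1_t = z1 if z1 >= 0 else 0   # ReLU on the first hidden node
    z2_t = z2 if z2 >= 0 else 0   # ReLU on the second hidden node
    z3 = -z1_t + z2_t
    return z1, z2, z1_t, z2_t, z3


def mlp(x1, x2, x3):
    z3 = trace(x1, x2, x3)[-1]
    return 1 if z3 >= 0 else 0    # threshold on the output node


def phi_mlp(x1, x2, x3, z1, z2, z1_t, z2_t, z3, y):
    # Conjunction with one conjunct per program instruction.
    return (z1 == x1 - x2
            and z2 == 2 * x1 - x3
            and ((z1 >= 0 and z1_t == z1) or (z1 < 0 and z1_t == 0))
            and ((z2 >= 0 and z2_t == z2) or (z2 < 0 and z2_t == 0))
            and z3 == -z1_t + z2_t
            and ((z3 >= 0 and y == 1) or (z3 < 0 and y == 0)))


grid = list(product(range(-2, 3), repeat=3))
holds = all(phi_mlp(*x, *trace(*x), mlp(*x)) for x in grid)
rejects = all(not phi_mlp(*x, *trace(*x), 1 - mlp(*x)) for x in grid)
```

In an SMT encoding the intermediate variables are left free and the solver finds values for them; here they are supplied by `trace` only to keep the check self-contained.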

4 Finding the nearest counterfactual

Based on the counterfactual space $CF_f(\hat{x})$ defined in the previous section, we would like to produce counterfactual explanations for the output of a model $f$ on a given input $\hat{x}$ by trying to find a nearest counterfactual, which is defined as:

$$x^* \in \operatorname*{arg\,min}_{x \in CF_f(\hat{x})} d(x, \hat{x}) \qquad (2)$$

For the time being, we assume that a notion of distance between instances, $d: \mathcal{X} \times \mathcal{X} \to \mathbb{R}_{\geq 0}$, is given. For convenience, and without loss of generality, we also assume that $d$ takes values in the interval $[0, 1]$.

4.1 Main algorithm

Our goal now is to leverage the representation of $CF_f(\hat{x})$ in terms of a logic formula to solve (2). To this end, we map the optimization problem in (2) into a sequence of satisfiability problems, which can be verified or refuted by standard SMT solvers. We do so by first converting the expression $d(x, \hat{x}) \leq \delta$, where $\delta \in [0, 1]$, into a logic formula $\phi_{d,\hat{x}}(x, \delta)$, which is valid if and only if $d(x, \hat{x}) \leq \delta$. We assume here that the distance function is expressed by a program in the same language that we used to represent the models in Section 3. In particular, we can leverage the procedure detailed in Appendix A to automatically construct $\phi_{d,\hat{x}}$. Then, both the counterfactual formula and the distance formula are combined into the logic formula:

$$\phi_{\hat{x},\delta}(x) = \phi_{CF_f(\hat{x})}(x) \wedge \phi_{d,\hat{x}}(x, \delta)$$

which is satisfiable if and only if there exists a counterfactual $x \in CF_f(\hat{x})$ such that $d(x, \hat{x}) \leq \delta$. To check whether the above formula is satisfiable we use the satisfiability oracle $\mathrm{SAT}(\phi_{\hat{x},\delta}(x))$, which returns either an instance $x$ such that $\phi_{\hat{x},\delta}(x)$ is valid, or “unsatisfiable” if no such $x$ exists.

Note that, while the oracle $\mathrm{SAT}$ allows us to verify if there exist counterfactual explanations at a distance smaller than or equal to a given threshold $\delta$, solving optimization (2) requires finding a nearest counterfactual. To do so, we apply a binary search strategy on the distance threshold $\delta$ that allows us to find approximately nearest counterfactuals with a pre-specified degree of accuracy. This is implemented in Algorithm 1, which for an accuracy parameter $\epsilon > 0$ makes at most $O(\log(1/\epsilon))$ calls to $\mathrm{SAT}$ and returns a counterfactual $x_\epsilon \in CF_f(\hat{x})$ such that $d(x_\epsilon, \hat{x}) \leq d(x^*, \hat{x}) + \epsilon$, where $x^*$ is some solution of the optimization problem in (2). This mild dependence on the accuracy allows Algorithm 1 to trade off finding arbitrarily accurate solutions of (2) with the number of calls made to the satisfiability oracle. Note that Algorithm 1 may also account for potential plausibility or diversity constraints (refer to next section for further details).
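The logarithmic dependence on the accuracy follows from halving the search interval at every oracle call. A short derivation, assuming the search starts from an interval of unit length (distances normalized to $[0, 1]$) and writing $\epsilon$ for the accuracy parameter and $k$ for the number of calls:

```latex
\[
  \underbrace{2^{-k}}_{\text{interval length after } k \text{ calls}} \le \epsilon
  \iff k \ge \log_2 \frac{1}{\epsilon}
  \quad\Longrightarrow\quad
  k = \left\lceil \log_2 \frac{1}{\epsilon} \right\rceil \text{ calls suffice.}
\]
```

For instance, under this assumption $\epsilon = 10^{-3}$ requires 10 calls and $\epsilon = 10^{-5}$ requires 17.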

We remark here that our approach to finding nearest counterfactuals is agnostic to the details of the model and distance being used; the only requirement is that they must be expressible in a fairly general programming language. As a consequence, we can handle a wide variety of predictive models, including both differentiable – such as logistic regression and multilayer perceptron – and non-differentiable predictive models – e.g., decision trees and random forests – as well as a wide variety of distance functions (refer to next section for further details). Moreover, the lower bound $\delta_{\min}$ returned by Algorithm 1 provides a certificate that any solution $x^*$ to (2) must satisfy $d(x^*, \hat{x}) \geq \delta_{\min}$. This is because whenever $\mathrm{SAT}$ returns “unsatisfiable” it does so by internally constructing a proof that the formula is not satisfiable.

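Input: Factual $\hat{x}$, counterfactual formula $\phi_{CF_f(\hat{x})}$, distance formula $\phi_{d,\hat{x}}$, constraints formula $\phi_{g,\hat{x}}$, accuracy $\epsilon$
Output: Counterfactual $x_\epsilon$, distance $\delta_{\max}$, lower bound $\delta_{\min}$ on (2)
Let $\delta_{\min} = 0$ and $\delta_{\max} = 1$
while $\delta_{\max} - \delta_{\min} > \epsilon$ do
       Let $\delta = (\delta_{\min} + \delta_{\max}) / 2$
       Let $\phi = \phi_{CF_f(\hat{x})}(x) \wedge \phi_{d,\hat{x}}(x, \delta) \wedge \phi_{g,\hat{x}}(x)$
       Let $x = \mathrm{SAT}(\phi)$
       if $x$ is “unsatisfiable” then
             Let $\delta_{\min} = \delta$
       else
             Let $x_\epsilon = x$ and $\delta_{\max} = \delta$
return $x_\epsilon$, $\delta_{\max}$, $\delta_{\min}$
Algorithm 1 Binary Search for Nearest Counterfactuals with Satisfiability Oracle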
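The binary search can be sketched in plain Python. This is not the authors' implementation: the SMT-based oracle is replaced by exhaustive search over a finite list of candidate inputs, and the names `nearest_counterfactual`, `candidates`, and `plausible` are assumptions made for the example. The search interval is assumed to be $[0, 1]$.

```python
def nearest_counterfactual(x_hat, f, distance, candidates,
                           eps=1e-3, plausible=lambda x: True):
    """Binary search on the distance threshold with a stand-in oracle."""
    y_hat = f(x_hat)

    def sat(delta):
        # Stand-in for the SMT call: first candidate that flips the
        # prediction, lies within distance delta, and is plausible.
        for x in candidates:
            if f(x) != y_hat and distance(x, x_hat) <= delta and plausible(x):
                return x
        return None  # "unsatisfiable" on this candidate set

    lo, hi, best = 0.0, 1.0, None
    while hi - lo > eps:
        delta = (lo + hi) / 2
        x = sat(delta)
        if x is None:
            lo = delta            # no counterfactual within delta
        else:
            best, hi = x, delta   # counterfactual found, tighten upper bound
    return best, hi, lo


# One-dimensional toy problem: the prediction flips at x >= 0.6.
f = lambda x: 1 if x >= 0.6 else 0
distance = lambda a, b: abs(a - b)
candidates = [i / 100 for i in range(101)]
x_hat = 0.25
best, upper, lower = nearest_counterfactual(x_hat, f, distance, candidates)
```

In this toy problem `best` is the candidate 0.6, and the returned bounds bracket its distance to `x_hat` to within `eps`. If no candidate flips the prediction within distance 1, `best` stays `None`.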

4.2 Distance, Plausibility, and Diversity

Next we discuss additional criteria in the form of logic clauses that guide the satisfiability problem towards generating a counterfactual explanation with desired properties.

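Distance. We first discuss several forms for the distance function that can be used to define the notion of nearest counterfactual. To this end, we first remark that in consequential decision making the input feature space is often heterogeneous – for example, gender is categorical, education level is ordinal, and income is a numerical variable. We define an appropriate distance metric for every kind of variable in the input feature space of the model as:

$$\delta_k(x_k, \hat{x}_k) = \begin{cases} |x_k - \hat{x}_k| / R_k & \text{if } x_k \text{ is numerical or ordinal} \\ \mathbb{I}[x_k \neq \hat{x}_k] & \text{if } x_k \text{ is categorical} \end{cases}$$

where $R_k$ corresponds to the range of the feature $x_k$ and is used to normalize the distances for all input features, such that $\delta_k \in [0, 1]$ for all $k$, independently of the feature type. By defining the distance vector $\vec{\delta} = (\delta_1, \dots, \delta_K)$ ($K$ being the total number of input features), one can now write the distance between instances as:

$$d(x, \hat{x}) = \alpha \|\vec{\delta}\|_0 + \beta \|\vec{\delta}\|_1 + \gamma \|\vec{\delta}\|_\infty$$

where $\|\cdot\|_p$ is the $p$-norm of a vector, and $\alpha, \beta, \gamma \geq 0$ are chosen such that $d(x, \hat{x}) \in [0, 1]$. [Footnote 3: Constraints on the distance hyperparameters ensure that the overall distance $d(x, \hat{x}) \in [0, 1]$. To this end, since $\|\vec{\delta}\|_0 \leq K$, $\|\vec{\delta}\|_1 \leq K$, and $\|\vec{\delta}\|_\infty \leq 1$, the hyperparameters must satisfy $K(\alpha + \beta) + \gamma \leq 1$.] Intuitively, the $\ell_0$-norm is used to restrict the number of features that change between the initial instance $\hat{x}$ and the generated counterfactual $x$; the $\ell_1$-norm is used to restrict the average change distance between $\hat{x}$ and $x$; and the $\ell_\infty$-norm is used to restrict the maximum change across features. Any distance of this type can easily be expressed as a program.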
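A distance of this kind is straightforward to write as a program. The sketch below is one possible reading, under stated assumptions: numerical and ordinal features contribute a range-normalized absolute difference, categorical features contribute a 0/1 indicator, and the overall distance is a weighted combination of the 0-norm, 1-norm, and infinity-norm of the per-feature distances. The function names, the `kinds`/`ranges` arguments, and the example feature values are hypothetical, and the norms are left unnormalized here.

```python
def feature_distances(x, x_hat, kinds, ranges):
    """Per-feature distances, each in [0, 1] when values stay in range."""
    deltas = []
    for xk, xk_hat, kind, r in zip(x, x_hat, kinds, ranges):
        if kind == "categorical":
            deltas.append(float(xk != xk_hat))      # indicator of a change
        else:                                       # numerical or ordinal
            deltas.append(abs(xk - xk_hat) / r)     # range-normalized change
    return deltas


def distance(x, x_hat, kinds, ranges, alpha=0.0, beta=0.0, gamma=1.0):
    d = feature_distances(x, x_hat, kinds, ranges)
    l0 = sum(1 for v in d if v != 0)   # number of changed features
    l1 = sum(d)                        # total normalized change
    linf = max(d)                      # largest single-feature change
    return alpha * l0 + beta * l1 + gamma * linf


kinds = ["numerical", "categorical", "ordinal"]
ranges = [50, None, 4]                 # range is unused for categorical
x_hat = (30, "HS", 2)                  # e.g. age, education, ordinal level
x = (40, "BSc", 2)
deltas = feature_distances(x, x_hat, kinds, ranges)
```

With these values two features change, so the 0-norm term counts 2, while the infinity-norm term is driven by the categorical change.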

Approach Models Data types Distances Plausibility Optimal Distance
Proposed (MACE) tree, forest, lr, mlp heterogeneous
Minimum Observable (MO) - heterogeneous x
Feature Tweaking (FT) tree, forest heterogeneous x x
Actionable Recourse (AR) lr numeric, binary  x x
Table 1: Comparison of approaches for generating counterfactual explanations, based on the supported model types, data types, distance types, plausibility constraints (actionability, data type/range consistency), and optimal distance guarantees.

Plausibility. Up to this point, we have considered minimum distance as the only requirement for generating a counterfactual. However, this might result in unrealistic counterfactuals, such as decreasing the age or changing the gender of a loan applicant. To avoid unrealistic counterfactuals, one may introduce additional plausibility constraints in the optimization problem in Eq. (2). This is equivalent to adding a conjunction in the constraints formula in Algorithm 1 that accounts for any additional plausibility formulae, which ensure that: i) each feature in the counterfactual is data-type and data-range consistent with the training data; and ii) only actionable features [30] are changed in the resulting counterfactual.

First, since here we are working with heterogeneous feature spaces, we require all the features in the counterfactual to be consistent in both the data-types (categorical, ordinal, etc.) and the data-ranges with the training data. In particular, if a categorical (ordinal) feature is one-hot (thermometer) encoded to be used as input to the predictive model, e.g., a logistic regression classifier, we make sure that the generated counterfactual provides a valid one-hot (thermometer) vector for such a feature. Likewise, for any numerical feature we ensure that its value in the counterfactual falls within the observed range in the original data used to train the predictive model.

Moreover, to account for a non-actionable/immutable feature $x_k$, i.e., a feature whose value in the counterfactual explanation should match its initial value, we set the corresponding plausibility clause to be $x_k = \hat{x}_k$. Similarly, we account for variables that only allow for increasing values by setting the clause to $x_k \geq \hat{x}_k$.
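The plausibility clauses described above can be pictured as small predicates that are conjoined. The sketch below is illustrative only: the helper names (`immutable`, `non_decreasing`, `within_range`, `one_hot`, `plausible`), the feature layout, and the age bounds are assumptions, and in MACE such clauses would be logic formulae handed to the solver rather than Python functions.

```python
def immutable(k):
    # Feature k must keep its factual value.
    return lambda x, x_hat: x[k] == x_hat[k]


def non_decreasing(k):
    # Feature k may only stay the same or increase (e.g. age).
    return lambda x, x_hat: x[k] >= x_hat[k]


def within_range(k, low, high):
    # Feature k must stay inside the range observed in the training data.
    return lambda x, x_hat: low <= x[k] <= high


def one_hot(indices):
    # The listed positions must form a valid one-hot vector.
    return lambda x, x_hat: (all(x[i] in (0, 1) for i in indices)
                             and sum(x[i] for i in indices) == 1)


def plausible(x, x_hat, constraints):
    # Conjunction of all plausibility clauses.
    return all(c(x, x_hat) for c in constraints)


# Layout: (age, cat_a, cat_b, cat_c) with a one-hot encoded categorical.
x_hat = (35, 1, 0, 0)
constraints = [non_decreasing(0), within_range(0, 17, 90), one_hot([1, 2, 3])]
ok = plausible((40, 0, 1, 0), x_hat, constraints)
younger = plausible((30, 0, 1, 0), x_hat, constraints)
bad_encoding = plausible((40, 1, 1, 0), x_hat, constraints)
```

Here `ok` is true, while `younger` and `bad_encoding` are false because they violate the non-decreasing age clause and the one-hot clause, respectively.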

Diversity. Finally, one might be interested in generating a (small) set of diverse counterfactual explanations for the same instance $\hat{x}$. To this end, we iteratively call Algorithm 1 with a constraints formula that includes diversity clauses to ensure that the newly generated explanation is substantially different from all the previous ones. We can encode diversity by forcing the distance between every pair of counterfactual explanations to be greater than a given value. For example, we can take $\phi_{div}(x) = \bigwedge_i \big( \sum_k \mathbb{I}[x_k \neq x^{(i)}_k] \geq m \big)$ [Footnote 4: $x^{(i)}_k$ is the $k$-th dimension of the $i$-th counterfactual.] to restrict repetitive counterfactuals by enforcing subsequent counterfactuals to have 0-norm distance at least $m$ from all previous counterfactuals.
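A diversity clause of this kind can be sketched as a predicate over a candidate and the counterfactuals found so far. The helper names and the threshold parameter `min_changes` are assumptions for illustration; the example tuples reuse the (Latest Bill, Latest Payment, University Degree) values reported in Table 5.

```python
def l0_distance(a, b):
    # Number of features on which two instances differ.
    return sum(1 for ak, bk in zip(a, b) if ak != bk)


def diversity_constraint(previous, min_changes=1):
    # The new counterfactual must differ from every previous one
    # in at least `min_changes` features.
    return lambda x: all(l0_distance(x, p) >= min_changes for p in previous)


previous = [(368, 1448, "some"), (0, 1241, "some")]   # CF #1 and CF #2
diverse = diversity_constraint(previous, min_changes=2)

accepts_new = diverse((0, 390, "graduate"))           # CF #3
rejects_repeat = diverse((368, 1448, "some"))         # same as CF #1
```

Each accepted counterfactual would be appended to `previous` before the next run, so the clause grows by one conjunct per explanation.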

5 Experiments

                 Adult                                   Credit                                  COMPAS
                 $\ell_0$   $\ell_1$   $\ell_\infty$     $\ell_0$   $\ell_1$   $\ell_\infty$     $\ell_0$   $\ell_1$   $\ell_\infty$
tree    PFT      0          0          0                 68         68         68                74         74         74
forest  PFT      0          0          0                 99         99         99                100        100        100
lr      AR       -          18         0.4               -          100        100               -          100        100
Table 2: Coverage (%) computed on the factual samples. For comparison, MO and MACE always achieve 100% coverage, by definition and by design, respectively. Cells are marked - when tests are not supported. Higher % is better.
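                                                 Adult                                   Credit                                  COMPAS
                                                 $\ell_0$   $\ell_1$   $\ell_\infty$     $\ell_0$   $\ell_1$   $\ell_\infty$     $\ell_0$   $\ell_1$   $\ell_\infty$
tree    MACE ($\epsilon = 10^{-3}$) vs MO        47         80         70                67         66         47                1          5          5
        MACE ($\epsilon = 10^{-5}$) vs MO        47         81         72                67         96         94                1          5          5
        MACE ($\epsilon = 10^{-3}$) vs PFT       -          -          -                 53         87         85                14         56         54
        MACE ($\epsilon = 10^{-5}$) vs PFT       -          -          -                 53         97         96                15         55         54
forest  MACE ($\epsilon = 10^{-3}$) vs MO        51         81         69                68         61         38                1          6          6
        MACE ($\epsilon = 10^{-5}$) vs MO        51         82         71                68         97         96                1          6          6
        MACE ($\epsilon = 10^{-3}$) vs PFT       -          -          -                 53         84         81                4          28         27
        MACE ($\epsilon = 10^{-5}$) vs PFT       -          -          -                 53         96         96                4          28         27
lr      MACE ($\epsilon = 10^{-3}$) vs MO        62         92         86                80         82         80                3          8          6
        MACE ($\epsilon = 10^{-5}$) vs MO        62         93         88                80         82         81                3          6          6
        MACE ($\epsilon = 10^{-3}$) vs AR        -          3          89                -          39         67                -          10         38
        MACE ($\epsilon = 10^{-5}$) vs AR        -          5          91                -          42         71                -          10         38
mlp     MACE ($\epsilon = 10^{-3}$) vs MO        60         92         91                77         85         91                1          3          3
        MACE ($\epsilon = 10^{-5}$) vs MO        60         93         93                77         96         96                1          3          3
Table 3: Percentage of improvement in distances achieved by MACE relative to each baseline, computed over the factual samples. Cells are marked - when tests are not supported or no comparison is available. The higher the %, the better the improvement.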
          $\ell_0$                           $\ell_1$                           $\ell_\infty$
          age-change   rel. dist. increase   age-change   rel. dist. increase   age-change   rel. dist. increase
MACE      13.2         9.0                   20.4         100.3                 84.4         32.8
MO        78.8         50.9                  92.0         245.7                 95.6         193.3
Table 4: Percentage of factual samples for which the nearest counterfactual sample requires a change in age for a random forest trained on the Adult dataset, and the corresponding relative increase (%) in distance to the nearest counterfactual when restricting the approaches not to change age. Lower % is better.

In this section, we empirically demonstrate the main properties of MACE compared to existing approaches.

Datasets. We evaluate MACE at generating counterfactual explanations on three real-world datasets in the context of loan approval (Adult [1] and Credit [36] datasets) and pretrial bail (COMPAS dataset [18]). All three datasets present heterogeneous input spaces.

Baselines. We compare the performance of MACE at generating the nearest counterfactual explanations with: the Minimum Observable (MO) approach [35], which searches in the dataset for the closest sample that flips the prediction; the Feature Tweaking (FT) approach [29], which searches for the nearest counterfactual lying close to the decision boundary of a Random Forest; and the Actionable Recourse (AR) approach [30], which solves a mixed integer linear program to obtain counterfactual explanations for Logistic Regression models. Table 1 summarizes the main properties of all the considered approaches to generate counterfactuals.

Metrics. To assess and compare the performance of the different approaches, we recall the criteria of good explanations for consequential decisions: i) the returned counterfactual should be as near as possible to the factual sample corresponding to the individual’s features; ii) the returned counterfactual must be plausible (refer to Section 4.2). Hence, we quantitatively compare the performance of MACE with the above approaches in terms of i) the normalized distance; and ii) coverage, indicating the percentage of factual samples for which the approach generates plausible (in type and range) counterfactuals.

Experimental set-up. We consider as predictive models decision trees, random forest, logistic regression, and multilayer perceptron, which we train on the three datasets using the Python library scikit-learn [22], with default parameters. [Footnote 5: For the multilayer perceptron, we used two hidden layers with 10 neurons each to avoid overfitting. See Appendix B.1 for model selection details.] Furthermore, to demonstrate the off-the-shelf flexibility in the various setups described, we build MACE atop the open-source PySMT library [11] with the Z3 [5] backend. In Appendix C.2, we provide a thorough empirical evaluation of the computational cost of the off-the-shelf PySMT solver, including run-time comparisons between MACE and other baselines, as well as a discussion on the choice of trading off arbitrarily accurate solutions of (2) against the number of calls made to the satisfiability oracle.

For each combination of approach, model, dataset, and distance, we generate the nearest counterfactual explanations for a held-out set of instances classified as negative by the corresponding model. Here we consider the $\ell_0$, $\ell_1$, and $\ell_\infty$ norms as a measure of distance to identify the nearest counterfactuals. Unfortunately, we found that FT never returned a plausible counterfactual. As a consequence, we modified the original implementation of FT to ensure that the generated counterfactuals are plausible. The resulting Plausible Feature Tweaking (PFT) projects the set of candidate counterfactuals into a plausible domain before selecting the nearest counterfactual amongst them. This was not possible for AR because the approach only returns a single counterfactual, with no recourse if it is not plausible. [Footnote 6: Importantly, Actionable Recourse does support actionability and data-range plausibility; however, it lacks support for data-type plausibility. Appendix B.3 describes the failure points of AR, as reported by the authors.]

Coverage and distance results. Table 2 shows the coverage of all the approaches based only on data-range and data-type plausibility. Note that, since by definition both MACE and MO have 100% coverage, we have not depicted these values in the table. In contrast, PFT fails to return counterfactuals for roughly a quarter to a third of the Credit and COMPAS samples when the model is a decision tree (coverage of 68% and 74%, respectively), while both PFT and AR achieve minimal coverage on the Adult dataset. [Footnote 7: The Adult dataset comprises a realistic mix of integer, real-valued, categorical, and ordinal variables common to consequential scenarios; further details in Appendix B.2.] Focusing on those factual samples for which PFT and AR return plausible counterfactuals, we are able to compute the relative distance reductions achieved when using MACE as compared to other approaches, as shown in Table 3 (additionally, Figure 4 in Appendix B shows the distribution of the distance of the generated plausible counterfactual for all models, datasets, distances, and approaches). Here, we observe that MACE results in significantly closer counterfactual explanations than competing approaches on Adult, Credit, and COMPAS (see Table 3 for the per-dataset improvements). As a consequence, the counterfactuals generated by MACE would require significantly less effort on behalf of the affected individual in order to achieve the desired prediction.

Plausibility constraints.

While performing a qualitative analysis of generated counterfactuals we observed that many of them require changes in features that are often protected by law, such as age, race, and gender [2]. As an example, for a trained random forest, the counterfactuals generated by both the MACE and MO approaches required individuals to change their age. Worse yet, for a substantial portion of the counterfactuals, a reduction in age was required, which is not even possible. To further study this effect, we regenerate counterfactual explanations for those samples for which age-change was required, with an additional plausibility constraint ensuring that the age shall not change (results with constraints to ensure non-decreasing age are shown in Appendix C.3). Table 4 presents the results. First, we observe that the additional plausibility constraint for the age incurs significant increases in the distance of the nearest counterfactual – being, as expected, more pronounced for the $\ell_1$ and the $\ell_\infty$ norms, since the $\ell_0$ norm only accounts for the number of features that change in the counterfactual but not for how much they change. For the $\ell_0$ norm, as expected, we find that for the 66 factual samples (i.e., 13.2%) for which the unrestricted MACE required age-change, the addition of the no-age-change constraint results in counterfactuals at very similar distance. In fact, of the newly generated counterfactuals, a number require only a change in Occupation, and others require only a change in Capital Gains, therefore remaining at the same distance as the original counterfactual. In contrast, for the $\ell_1$ and the $\ell_\infty$ norms we find that the restricted counterfactual incurs a significant increase in the distance (cost) with respect to the unrestricted counterfactual. These results suggest that the predictions of the random forest trained on the Adult data are strongly correlated with age, which is often legally and socially considered unfair. This suggests that counterfactuals found with MACE may assist in qualitatively ascertaining if other desiderata, such as fairness, are met [7, 34].

           Latest Bill   Latest Payment   University Degree   Will default next month?
Factual    $370          $40              some                yes
CF #1      $368          $1448            some                no
CF #2      $0            $1241            some                no
CF #3      $0            $390             graduate            no
Table 5: A diverse set of generated counterfactuals is presented for an individual from the Credit dataset.

Diversity constraints.

Finally, we present a situation where MACE can be used to generate counterfactuals under both plausibility and diversity constraints. Consider a loan borrower from the Credit dataset identified with the following features [Footnote 8: Complete feature list in Appendix C.4.]: John is a married male between 40-59 years of age with “some” university degree. Financially, over the last 6 months, John has been struggling to make payments on his bank loan. Given his circumstances, a logistic regression model trained on the historical dataset has predicted that John will default on his loan next month. To prevent this default, the bank uses MACE to generate the diverse suggestions in Table 5, via successive runs of Algorithm 1. Each new run augments the constraints formula (already including plausibility constraints on his age, sex, and marital status) with an additional clause enforcing diversity as discussed in Section 4.2. The returned counterfactuals (of which only 3 are shown) present John with diverse courses of action: either reduce spending and make a lump-sum payment on the debt (CF #2) or continue spending the same as before, but make an even larger payment to account for continued expenditures (CF #1). Alternatively, providing documents confirming a graduate degree would put John in a low-risk (no default) bracket (CF #3). We invite the reader to imagine parallels to the above situation for the Adult and COMPAS datasets.

6 Conclusions

In this work, we have presented a novel approach for generating counterfactual explanations in the context of consequential decisions. Building on theory and tools from formal verification, we demonstrated that a large class of predictive models can be compiled to formulae which can be verified by standard SMT solvers. By conjoining the model formula with formulae corresponding to distance, plausibility, and diversity constraints, we demonstrated on three real-world datasets and four popular predictive models that the proposed method not only achieves perfect coverage, but also generates counterfactuals at more favorable distances than existing optimization-based approaches. Furthermore, we showed that the proposed method can not only provide explanations for individuals subject to automated decision making systems, but also inform system administrators regarding the potentially unfair reliance of the model on protected attributes.

There are a number of interesting directions for future work. First, MACE can naturally be extended to support counterfactual explanations for multi-class classification models, as well as regression scenarios. Second, extending the multi-faceted notion of plausibility defined in Section 4.2 (actionability and data type/range consistency, which focus on individual features), it would be interesting to account for statistical correlations and unmeasured confounding factors among the features when generating counterfactual explanations (i.e., realizability). Third, we would also like to explore how different notions of diversity may help generate meaningful and useful counterfactuals. Finally, in our experiments we noticed that the running time of MACE directly depends on the efficiency of the SMT solver. As future work, we aim to make the proposed method more scalable on large models by investigating recent ideas that have been developed in the context of formal verification of deep neural networks [14, 15, 28] and optimization modulo theories [21, 27].


  • [1] Adult data (1996) Note: https://archive.ics.uci.edu/ml/datasets/adult Cited by: §5.
  • [2] S. Barocas and A. D. Selbst (2016-01) Big data’s disparate impact. SSRN Electronic Journal. Cited by: §5.
  • [3] C. Barrett, C. L. Conway, M. Deters, L. Hadarean, D. Jovanovic, T. King, A. Reynolds and C. Tinelli (2011-07) CVC4. In Proceedings of the 23rd International Conference on Computer Aided Verification (CAV ’11), G. Gopalakrishnan and S. Qadeer (Eds.), Vol. 6806, pp. 171–177. External Links: Link Cited by: §2.
  • [4] R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman and F. K. Zadeck (1991) Efficiently computing static single assignment form and the control dependence graph. ACM Transactions on Programming Languages and Systems (TOPLAS) 13 (4), pp. 451–490. Cited by: Appendix A.
  • [5] L. M. de Moura and N. Bjørner (2008) Z3: an efficient SMT solver. In Tools and Algorithms for the Construction and Analysis of Systems, 14th International Conference, TACAS 2008, C. R. Ramakrishnan and J. Rehof (Eds.), Vol. 4963, pp. 337–340. External Links: Link, Document Cited by: §2, §5, footnote 10.
  • [6] E. W. Dijkstra (1968-09-01) A constructive approach to the problem of program correctness. BIT Numerical Mathematics 8 (3), pp. 174–186. External Links: ISSN 1572-9125, Document, Link Cited by: §3.
  • [7] F. Doshi-Velez and B. Kim (2017) Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608. Cited by: §1, §1, §5.
  • [8] C. Flanagan and J. B. Saxe (2001) Avoiding exponential explosion: generating compact verification conditions. In ACM SIGPLAN Notices, Vol. 36, pp. 193–205. Cited by: §3.
  • [9] R. W. Floyd (1993) Assigning meanings to programs. In Program Verification: Fundamental Issues in Computer Science, T. R. Colburn, J. H. Fetzer and T. L. Rankin (Eds.), pp. 65–81. External Links: ISBN 978-94-011-1793-7, Document, Link Cited by: §3.
  • [10] A. A. Freitas (2014) Comprehensible classification models: a position paper. ACM SIGKDD explorations newsletter 15 (1), pp. 1–10. Cited by: §1.
  • [11] M. Gario and A. Micheli (2015) PySMT: a solver-agnostic library for fast prototyping of smt-based algorithms. In SMT Workshop 2015, Cited by: §5, footnote 10.
  • [12] D. Gunning (2019) DARPA’s explainable artificial intelligence (XAI) program. In Proceedings of the 24th International Conference on Intelligent User Interfaces, pp. ii–ii. Cited by: §1.
  • [13] C. A. R. Hoare (1969) An axiomatic basis for computer programming. Communications of the ACM 12 (10), pp. 576–580. Cited by: §3.
  • [14] X. Huang, M. Kwiatkowska, S. Wang and M. Wu (2017) Safety verification of deep neural networks. In Computer Aided Verification - 29th International Conference, CAV, R. Majumdar and V. Kuncak (Eds.), Vol. 10426, pp. 3–29. External Links: Link, Document Cited by: §C.2, §6.
  • [15] G. Katz, C. W. Barrett, D. L. Dill, K. Julian and M. J. Kochenderfer (2017) Reluplex: an efficient SMT solver for verifying deep neural networks. In Computer Aided Verification - 29th International Conference, CAV, R. Majumdar and V. Kuncak (Eds.), Vol. 10426, pp. 97–117. External Links: Link, Document Cited by: §C.2, §6.
  • [16] Y. Kodratoff (1994) The comprehensibility manifesto. KDD Nugget Newsletter 94 (9). Cited by: §1.
  • [17] D. Kroening and O. Strichman (2008) Decision procedures: an algorithmic point of view. 1 edition, Springer Publishing Company, Incorporated. External Links: ISBN 3540741046, 9783540741046 Cited by: §2.
  • [18] J. Larson, S. Mattu, L. Kirchner and J. Angwin (2016) Note: https://github.com/propublica/compas-analysis Cited by: §5.
  • [19] Z. C. Lipton (2018-06) The mythos of model interpretability. Queue 16 (3), pp. 30:31–30:57. External Links: ISSN 1542-7730, Link, Document Cited by: §1.
  • [20] W. J. Murdoch, C. Singh, K. Kumbier, R. Abbasi-Asl and B. Yu (2019) Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences 116 (44), pp. 22071–22080. Cited by: §1.
  • [21] R. Nieuwenhuis and A. Oliveras (2006) On SAT modulo theories and optimization problems. In Theory and Applications of Satisfiability Testing - SAT, A. Biere and C. P. Gomes (Eds.), Vol. 4121, pp. 156–169. External Links: Link, Document Cited by: §C.2, §6.
  • [22] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss and V. Dubourg (2011) Scikit-learn: machine learning in python. Journal of machine learning research 12 (Oct), pp. 2825–2830. Cited by: §5.
  • [23] B. K. Rosen, M. N. Wegman and F. K. Zadeck (1988) Global value numbers and redundant computations. In Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pp. 12–27. Cited by: Appendix A.
  • [24] C. Rudin (2018) Please stop explaining black box models for high stakes decisions. arXiv preprint arXiv:1811.10154. Cited by: §1.
  • [25] S. Rüping (2006) Learning interpretable models. PhD dissertation, Technical University of Dortmund. Cited by: §1.
  • [26] C. Russell (2019) Efficient search for diverse coherent explanations. In Proceedings of the Conference on Fairness, Accountability, and Transparency, FAT* ’19, pp. 20–28. External Links: ISBN 978-1-4503-6125-5, Link, Document Cited by: §1.
  • [27] R. Sebastiani and S. Tomasi (2012) Optimization in SMT with cost functions. In Automated Reasoning - 6th International Joint Conference, IJCAR, B. Gramlich, D. Miller and U. Sattler (Eds.), Vol. 7364, pp. 484–498. External Links: Link, Document Cited by: §C.2, §6.
  • [28] G. Singh, T. Gehr, M. Püschel and M. T. Vechev (2019) An abstract domain for certifying neural networks. PACMPL 3 (POPL), pp. 41:1–41:30. External Links: Link Cited by: §C.2, §6.
  • [29] G. Tolomei, F. Silvestri, A. Haines and M. Lalmas (2017) Interpretable predictions of tree-based ensembles via actionable feature tweaking. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 465–474. Cited by: §1, §1, §5.
  • [30] B. Ustun, A. Spangher and Y. Liu (2019) Actionable recourse in linear classification. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pp. 10–19. Cited by: §B.2, §1, §1, §4.2, §5.
  • [31] P. Voigt and A. Von dem Bussche The EU general data protection regulation (GDPR). Cited by: §1.
  • [32] S. Wachter, B. Mittelstadt and L. Floridi (2017) Why a right to explanation of automated decision-making does not exist in the general data protection regulation. International Data Privacy Law 7 (2), pp. 76–99. Cited by: §1.
  • [33] S. Wachter, B. Mittelstadt and C. Russell (2017) Counterfactual explanations without opening the black box: automated decisions and the GDPR. Harvard Journal of Law & Technology 31 (2). Cited by: §1, §1.
  • [34] A. Weller (2017) Challenges for transparency. In Workshop on Human Interpretability in Machine Learning (ICML), Cited by: §5.
  • [35] J. Wexler, M. Pushkarna, T. Bolukbasi, M. Wattenberg, F. Viégas and J. Wilson (2019) The what-if tool: interactive probing of machine learning models. IEEE transactions on visualization and computer graphics 26 (1), pp. 56–65. Cited by: §5.
  • [36] I. Yeh and C. Lien (2009) The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications 36 (2), pp. 2473–2480. Cited by: §5.

Appendix A Background on programming language and program verification


We assume given a set of function symbols with their arity. For simplicity, we consider the case where operators are untyped and have arity 0 (constants), 1 (unary functions), and 2 (binary functions). We let , , and range over constants, unary functions and binary functions respectively. Expressions are built from function symbols and variables. The set of expressions is defined inductively by the following grammar:

We next assume given a set of atomic predicates. For simplicity, we also consider that predicates have arity 1 or 2, and let and range over unary and binary predicates respectively. We define guards using the following grammar:

We next define commands. These include assignments, conditionals, bounded loops and return expressions. The set of commands is defined inductively by the following grammar:

We assume that programs satisfy a well-formedness condition. The condition requires that expressions have no successor instruction, i.e., we do not allow commands of the form or . This is without loss of generality, since commands can always be transformed into functionally equivalent programs which satisfy the well-formedness condition.

Single assignment form

Our first step to construct characteristic formulae is to transform programs into an intermediate form that is closer to logic. Without loss of generality, we consider loop-free commands, since loops can be fully unrolled. The intermediate form is a variant of the well-known SSA form [23, 4] from compiler optimization. Concretely, we transform programs into a weak form of single assignment. This form requires that every non-input variable is defined before being used, and assigned at most once during execution for any fixed input. The main difference with SSA form is that we do not use so-called φ-nodes, as we only require that variables are assigned at most once for any fixed input. More technically, our transformation can be seen as a composition of the SSA transform with a naive de-SSA transform where φ-nodes are transformed into assignments in the branches of the conditionals.
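To make the weak single-assignment form concrete, the following hand-written sketch contrasts a small loop-free program with a transformed version. The program and the variable names (`y1` to `y4`) are hypothetical and chosen only for illustration. In the transformed version, the variable `y4` plays the role that a φ-node would play in SSA form: it is assigned in both branches, but only once along any single execution.

```python
def original(x):
    # y is reassigned, so this program is not in single-assignment form.
    y = x + 1
    if y > 3:
        y = y * 2
    else:
        y = y - 1
    return y

def single_assignment(x):
    # Every variable is defined before use and assigned at most once
    # per execution for any fixed input x.
    y1 = x + 1
    if y1 > 3:
        y2 = y1 * 2
        y4 = y2  # replaces the phi-node y4 = phi(y2, y3) in this branch
    else:
        y3 = y1 - 1
        y4 = y3  # replaces the same phi-node in the other branch
    return y4

# The two versions agree on every tested input.
agree = all(original(x) == single_assignment(x) for x in range(-5, 10))
```

Each assignment in the transformed program can then be read as an equality between a fresh variable and an expression, which is what brings this form closer to logic.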

Path formulae and characteristic formulae

Our second step is to define the set of path formulae. Informally, a path formula represents a possible execution of the program. Fix a distinguished variable for return values. Then the path formulae of a command is defined inductively by the clauses:

The characteristic formula of a command is then defined as:

One can prove that for all inputs , the formula is valid iff the execution of on inputs returns . Note that, strictly speaking, the formula contains as free variables the distinguished variable , the inputs of the program, and all the program variables, say . However, the latter are fully defined by the characteristic formula, so validity of is equivalent to validity of .
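As an illustration of path formulae and the characteristic formula, here is a minimal sketch for a restricted fragment with only conditionals and return commands, as in a decision tree. The tuple encoding of commands, the guard form `x[i] <= t`, and all function names are assumptions made for this example and are not the paper's notation. Each path formula is a conjunction of guard literals together with an equality between the return variable and the returned value, and the characteristic formula is taken to be the disjunction of all path formulae.

```python
# Commands: ("ret", value) or ("if", (index, threshold), then_cmd, else_cmd).
# A guard (index, threshold) is read as x[index] <= threshold.

def run(cmd, x):
    # Execute the program on inputs x and return its result.
    while cmd[0] == "if":
        _, (i, t), then_cmd, else_cmd = cmd
        cmd = then_cmd if x[i] <= t else else_cmd
    return cmd[1]

def paths(cmd):
    # Yield (literals, returned_value) for each execution path.
    # A literal (i, t, polarity) states that (x[i] <= t) equals polarity.
    if cmd[0] == "ret":
        yield [], cmd[1]
        return
    _, (i, t), then_cmd, else_cmd = cmd
    for lits, ret in paths(then_cmd):
        yield [(i, t, True)] + lits, ret
    for lits, ret in paths(else_cmd):
        yield [(i, t, False)] + lits, ret

def characteristic(cmd, x, r):
    # Evaluate the disjunction of path formulae at inputs x and return value r.
    return any(
        all((x[i] <= t) == pol for i, t, pol in lits) and r == ret
        for lits, ret in paths(cmd)
    )

# Hypothetical depth-2 decision tree over two integer features.
tree = ("if", (0, 2), ("ret", 0), ("if", (1, 5), ("ret", 1), ("ret", 0)))
```

On this fragment, `characteristic(tree, x, r)` holds exactly when `run(tree, x) == r`, which mirrors the correctness statement above for this special case.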

Appendix B Experiment Details

In this section we provide further details on the datasets and methods used in our experiments, together with some additional results.

B.1 Model Selection

To demonstrate the flexibility of our approach, we explored four different differentiable and non-differentiable model classes, i.e., decision tree, random forest, logistic regression and multilayer perceptron (MLP). As the main focus of our work is to generate counterfactuals for a broad range of already trained models, we opted for model parametrizations that result in good performance on the considered datasets (e.g., default parameters). For instance, for the MLP, we opted for two hidden layers with 10 neurons, since it presents better performance on the Adult dataset ( training/test accuracy) than other architectures with and which result in and training/test accuracy, respectively. We leave the exploration of other datasets (larger feature spaces), more complex models (deeper MLPs) and other SMT solvers as future work.

B.2 Datasets

Here we detail the different types of variables present in each dataset. We used the default features for the Adult and COMPAS datasets, and applied the same preprocessing used in [30] for the Credit dataset. All samples with missing data were dropped. We remark that we have relied on datasets that are broadly studied in the literature on fairness and interpretability of ML for consequential decision making. For instance, the Credit dataset [36] () has been previously studied by the Actionable Recourse work [30], and the Adult [1] () and COMPAS [18] () datasets have been previously used in the context of fairness in ML [Joseph et al., 2016; Zafar et al., 2017; Agarwal et al., 2018].

Adult ():

  • Integer: Age, Education Number, Hours Per Week

  • Real: Capital Gain, Capital Loss

  • Categorical: Sex, Native Country, Work Class, Marital Status, Occupation, Relationship

  • Ordinal: Education Level

Credit ():

  • Integer: Total Overdue Counts, Total Months Overdue, Months With Zero Balance Over Last 6 Months, Months With Low Spending Over Last 6 Months, Months With High Spending Over Last 6 Months

  • Real: Max Bill Amount Over Last 6 Months, Max Payment Amount Over Last 6 Months, Most Recent Bill Amount, Most Recent Payment Amount

  • Categorical: Is Male, Is Married, Has History Of Overdue Payments

  • Ordinal: Age Group, Education Level


COMPAS:

  • Integer: -

  • Real: Priors Count

  • Categorical: Race, Sex, Charge Degree

  • Ordinal: Age Group

B.3 Handling Mixed Data Types

While the proposed approach (MACE) naturally handles mixed data types, other approaches do not. Specifically, the Feature Tweaking method generates counterfactual explanations for Random Forest models trained on non-hot embeddings of the dataset, meaning that the resulting counterfactuals will not have multiple categories of the same variable activated at the same time. However, because this method is restricted to real-valued variables, the resulting counterfactual must undergo a post-processing step to ensure that integer-, categorical-, and ordinal-based variables are plausible in the counterfactual. The Actionable Recourse method, on the other hand, generates explanations for Logistic Regression models trained on one-hot embeddings of the dataset, hence requiring additional constraints to ensure that multiple categories of a categorical variable are not simultaneously activated in the counterfactual. While the authors suggest how this can be supported using their method, their open-source implementation converts categorical columns to binary where possible and drops other, more complicated categorical columns, postponing their support to future work. Furthermore, the authors state that the question of mutually exclusive features will be revisited in later releases (https://github.com/ustunb/actionable-recourse/blob/master/examples/ex_01_quickstart.ipynb). Moreover, ordinal variables are not supported by this method. To overcome these shortcomings, the counterfactuals generated by both approaches are post-processed to ensure correctness of variable types by rounding integer-based variables and taking the maximally activated category as the counterfactual category.
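The post-processing step described above (rounding integer-based variables and keeping only the maximally activated category) might look roughly like the following sketch. The function name, the dictionary representation of a counterfactual, and the column names are hypothetical and are not taken from either baseline's implementation.

```python
def postprocess(cf, integer_cols, categorical_groups):
    # cf maps column names to real values returned by a baseline method.
    out = dict(cf)
    # Round integer-based variables to the nearest integer.
    for col in integer_cols:
        out[col] = round(out[col])
    # For each one-hot group, keep only the maximally activated category.
    for group in categorical_groups:
        winner = max(group, key=lambda c: cf[c])
        for col in group:
            out[col] = 1 if col == winner else 0
    return out

# Hypothetical raw counterfactual with one one-hot encoded variable (job_*).
raw = {"age": 41.6, "hours": 37.2, "gain": 10.5,
       "job_a": 0.2, "job_b": 0.7, "job_c": 0.4}
clean = postprocess(raw, ["age", "hours"], [["job_a", "job_b", "job_c"]])
```

Real-valued columns such as `gain` are left untouched, and the input dictionary is not modified.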

Appendix C Additional Results


Adult | Credit | COMPAS

tree
MACE (): 5.65 ± 2.18, 3.01 ± 0.74, 3.47 ± 0.93 | 3.48 ± 1.25, 3.44 ± 1.70, 2.39 ± 0.64 | 2.41 ± 1.06, 1.22 ± 0.36, 1.62 ± 0.78
MACE (): 17.59 ± 4.87, 9.58 ± 3.05, 10.43 ± 2.98 | 15.84 ± 4.78, 7.55 ± 3.44, 4.44 ± 2.20 | 7.07 ± 2.09, 5.72 ± 1.28, 4.99 ± 1.89
MACE (): 35.32 ± 14.07, 20.35 ± 6.34, 20.44 ± 9.55 | 25.47 ± 8.71, 18.46 ± 6.24, 10.58 ± 6.36 | 13.49 ± 6.44, 9.22 ± 4.21, 10.76 ± 4.60
MO: 1.04 ± 0.26, 0.85 ± 0.27, 0.87 ± 0.22 | 0.53 ± 0.15, 0.64 ± 0.26, 0.54 ± 0.23 | 0.15 ± 0.07, 0.12 ± 0.06, 0.16 ± 0.07
PFT: 1.45 ± 0.42, 1.50 ± 0.36, 1.91 ± 0.79, 0.12 ± 0.05, 0.13 ± 0.06, 0.12 ± 0.05

forest
MACE (): 27.98 ± 9.48, 17.68 ± 4.82, 19.05 ± 6.11 | 28.12 ± 9.31, 21.88 ± 10.04, 21.47 ± 11.07 | 8.07 ± 3.36, 3.18 ± 1.15, 3.52 ± 1.93
MACE (): 69.19 ± 15.76, 55.79 ± 15.78, 52.31 ± 15.39 | 57.29 ± 26.69, 40.75 ± 17.85, 26.21 ± 11.71 | 15.05 ± 5.15, 10.75 ± 3.03, 8.53 ± 3.55
MACE (): 89.81 ± 28.99, 84.89 ± 35.14, 78.49 ± 23.85 | 107.83 ± 52.32, 90.04 ± 38.02, 72.38 ± 37.77 | 33.26 ± 9.79, 19.95 ± 10.03, 17.22 ± 7.90
MO: 1.14 ± 0.35, 0.98 ± 0.25, 0.94 ± 0.36 | 0.80 ± 0.27, 0.80 ± 0.35, 0.80 ± 0.28 | 0.16 ± 0.06, 0.17 ± 0.08, 0.15 ± 0.07
PFT: 13.41 ± 7.09, 10.46 ± 4.67, 11.79 ± 6.51, 1.93 ± 0.81, 2.11 ± 1.07, 1.83 ± 0.87

lr
MACE (): 0.85 ± 0.29, 0.66 ± 0.26, 0.74 ± 0.29 | 0.33 ± 0.15, 1.17 ± 1.79, 0.49 ± 0.30 | 0.21 ± 0.10, 0.19 ± 0.10, 0.22 ± 0.11
MACE (): 2.22 ± 0.86, 3.55 ± 1.50, 5.15 ± 3.51 | 0.87 ± 0.20, 10.57 ± 8.14, 6.11 ± 3.51 | 0.52 ± 0.18, 0.31 ± 0.12, 0.54 ± 0.20
MACE (): 2.73 ± 0.73, 6.60 ± 3.01, 13.32 ± 6.70 | 1.19 ± 0.56, 25.10 ± 21.67, 16.21 ± 8.84 | 0.84 ± 0.22, 0.72 ± 0.28, 0.77 ± 0.21
MO: 7.52 ± 1.91, 6.62 ± 1.73, 5.73 ± 1.14 | 1.86 ± 0.82, 1.41 ± 0.53, 1.69 ± 0.79 | 0.30 ± 0.22, 0.25 ± 0.12, 0.25 ± 0.11
AR: 2.05 ± 0.45, 1.86 ± 0.03, 0.72 ± 0.15, 0.66 ± 0.07, 0.07 ± 0.01, 0.06 ± 0.01

mlp
MACE (): 2586 ± 4523, 8070 ± 5995, 5091 ± 6616 | 1743 ± 4171, 3432 ± 5615, 10309 ± 10088 | 59 ± 53, 158 ± 135