Measurable Counterfactual Local Explanations for Any Classifier
Abstract
We propose a novel method for explaining the predictions of any classifier. In our approach, local explanations are expected to explain both the outcome of a prediction and how that prediction would change if ‘things had been different’. Furthermore, we argue that satisfactory explanations cannot be dissociated from a notion and measure of fidelity, as advocated in the early days of neural networks’ knowledge extraction. We introduce a definition of fidelity to the underlying classifier for local explanation models which is based on distances to a target decision boundary. A system called CLEAR (Counterfactual Local Explanations via Regression) is introduced and evaluated. CLEAR generates w-counterfactual explanations that state the minimum changes necessary to flip a prediction’s classification. CLEAR then builds local regression models, using the w-counterfactuals to measure and improve the fidelity of its regressions. By contrast, the popular LIME method (Ribeiro et al., 2016), which also uses regression to generate local explanations, neither measures its own fidelity nor generates counterfactuals. CLEAR’s regressions are found to have significantly higher fidelity than LIME’s, averaging over 45% higher in this paper’s four case studies.
1 Introduction
Machine learning systems are increasingly being used for automated decision making. It is important that these systems’ decisions can be trusted. This is particularly the case in mission-critical situations such as medical diagnosis, airport security or high-value financial trading. Yet the inner workings of many machine learning systems seem unavoidably opaque. The number and complexity of their calculations are often simply beyond the capacities of humans to understand. One possible solution is to treat machine learning systems as ‘black boxes’ and to then explain their input-output behaviour. Such approaches can be divided into two broad types: those providing global explanations of the entire system and those providing local explanations of single predictions. Local explanations are needed when a machine learning system’s decision boundary is too complex to allow for global explanations. This paper focuses on local explanations.
A novel method called Counterfactual Local Explanations viA Regression (CLEAR) is proposed. This is based on a concept of counterfactual explanation from the philosophy of science’s analysis of causality (Woodward, 2003; Pearl, 2009). Perhaps the most influential account is Woodward’s (Woodward, 2003). Woodward states that a satisfactory explanation consists in showing patterns of counterfactual dependence. By this he means that it should answer a set of ‘what-if-things-had-been-different?’ questions, which specify how the explanandum (i.e. the phenomenon to be explained) would change if, contrary to the fact, input conditions had been different. It is in this way that a user can understand the relevance of different features, and understand the different ways in which they could change the value of the explanandum. Central to Woodward’s notion is the requirement for an explanatory generalization:
“Suppose that M is an explanandum consisting in the statement that some variable Y takes the particular value y. Then an explanans E for M will consist of (a) a generalization G relating changes in the value(s) of a variable X (where X may itself be a vector or n-tuple of variables) with changes in Y, and (b) a statement (of initial or boundary conditions) that the variable X takes the particular value x.”
In Woodward’s analysis, X causes Y. For our purposes, Y can be taken as the machine learning system’s predictions, where X are the system’s input features. The required generalization can be a regression equation that captures the machine learning system’s local input-output behaviour.
CLEAR provides counterfactual explanations by building on the strengths of two state-of-the-art explanatory methods, while at the same time addressing their weaknesses. The first is by Wachter et al. (Wachter et al., 2017; Mittelstadt et al., 2019), who argue that single predictions are explained by what we shall term ‘w-counterfactuals’ that state the minimum changes needed for an observation to ‘flip’ its classification. The second method is by Ribeiro et al. (2016), who argue for Local Interpretable Model-Agnostic Explanations (LIME). These explanations are created by building a regression model that seeks to approximate the local input-output behaviour of the machine learning system.
In isolation, w-counterfactuals do not provide explanatory generalizations relating X to Y and therefore are not satisfactory explanations, as we exemplify in the next section. LIME, on the other hand, does not measure the fidelity of its regressions and cannot produce counterfactual explanations.
The contribution of this paper is threefold. We introduce a novel explanation method capable of:

providing counterfactuals that are explained by regression coefficients including interaction terms;

evaluating the fidelity of its local explanations to the underlying learning system;

using the values of w-counterfactuals to significantly improve the fidelity of its regressions.
When applied to multilayer perceptrons (MLPs) trained on four datasets, CLEAR improves on the fidelity of LIME by an average of over 45%.
Section 2 provides the background to CLEAR, including an analysis of w-counterfactuals and LIME. Section 3 introduces CLEAR and explains how it uses w-counterfactuals to both measure and improve the fidelity of its regressions. Section 4 contains experimental results on four datasets showing that CLEAR’s regressions have significantly higher fidelity than LIME’s. Section 5 concludes the paper and discusses directions for future work.
2 Background
This paper adopts the following notation: let m be a machine learning system mapping $X \rightarrow Y$; m is said to generate prediction y for observation x.
2.1 w-Counterfactual Explanations
Wachter et al.’s w-counterfactuals explain a single prediction by identifying ‘close possible worlds’ in which an individual would receive the prediction they desired. For example, if a banking machine learning system declined Mr Jones’ loan application, a w-counterfactual explanation might be that ‘Mr Jones would have received his loan, if his annual salary had been $35,000 instead of the $32,000 he currently earns’. The $3,000 increase would be just sufficient to flip Mr Jones to the desired side of the banking system’s decision boundary.
Wachter et al. note that a counterfactual explanation may involve changes to multiple features. Hence, an additional w-counterfactual explanation for Mr Jones might be that he would also get the loan if his annual salary was $33,000 and he had been employed for more than 5 years. Wachter et al. state that counterfactual explanations have the following form:
“Score p was returned because variables V had values $(v_1, v_2, \ldots)$ associated with them. If V instead had values $(v_1', v_2', \ldots)$, and all other variables remained constant, score p’ would have been returned”
Wachter et al. specify an objective function and optimiser that samples m and searches for w-counterfactuals.
The key problem with w-counterfactuals: w-counterfactual explanations fail to satisfy Woodward’s requirement that a satisfactory explanation of prediction y should state a generalization relating X and Y.
For example, suppose that a machine learning system has assigned Mr Jones a probability of 0.75 for defaulting on a loan. Although stating the changes needed to his salary and years of employment has explanatory value, this falls short of being a satisfactory explanation. A satisfactory explanation also needs to explain:

why Mr Jones was assigned a score of 0.75. This would include identifying the contribution that each feature made to the score.

how the features interact with each other. For example, perhaps the number of years employed is only relevant for individuals with salaries below $34,000.
These requirements could be satisfied by stating an explanatory equation that included interaction terms and indicator variables. At a minimum, the equation’s scope should cover a neighbourhood around x that includes the data points identified by its w-counterfactuals.
2.2 Local Interpretable Model-Agnostic Explanations
Ribeiro et al. (2016) propose LIME, which seeks to explain why m predicts y given x by generating a simple linear regression model that approximates m’s input-output behaviour with respect to a small neighbourhood of m’s feature space centered on x. LIME assumes that for such small neighbourhoods m’s input-output behaviour is approximately linear. Ribeiro et al. recognize that there is often a trade-off to be made between local fidelity and interpretability. For example, increasing the number of independent variables in a regression might increase local fidelity but decrease interpretability. LIME is becoming an increasingly popular method, and there are now LIME implementations for multiple platforms including Python, R and SAS.
The LIME algorithm: Consider a model m (e.g. a random forest or MLP) whose prediction is to be explained. The LIME algorithm:
(1) generates a dataset of synthetic observations;
(2) labels the synthetic data by passing it to the model m, which calculates probabilities for each observation belonging to each class. These probabilities are the ground truths that LIME is trying to explain;
(3) weights the synthetic observations (in standardised form) using the kernel:
$$K(d) = \sqrt{\exp\left(-\frac{d^2}{(\text{kernel width})^2}\right)}$$
where d is the Euclidean distance from x to the synthetic observation, and the default value for kernel width is a function of the number of features in the training dataset;
(4) produces a locally weighted regression, using all the synthetic observations. The regression coefficients are meant to explain m’s forecast y.
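The four steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not LIME's actual implementation: `toy_model` is a hypothetical stand-in for m, the synthetic data is drawn from a standard normal distribution for simplicity, and the kernel width of 0.75 times the square root of the number of features is an assumed default.

```python
import numpy as np

def lime_kernel_weights(x, Z, kernel_width):
    """Weight each row of Z by sqrt(exp(-d^2 / kernel_width^2)), d = Euclidean distance to x."""
    d = np.linalg.norm(Z - x, axis=1)
    return np.sqrt(np.exp(-(d ** 2) / kernel_width ** 2))

def weighted_linear_fit(Z, y, weights):
    """Weighted least squares with an intercept; returns (intercept, coefficients)."""
    A = np.hstack([np.ones((Z.shape[0], 1)), Z])
    sw = np.sqrt(weights)[:, None]  # scaling rows by sqrt(weight) gives weighted least squares
    beta, *_ = np.linalg.lstsq(A * sw, y * sw[:, 0], rcond=None)
    return beta[0], beta[1:]

def toy_model(Z):
    """Hypothetical black box m: probability of class 1 (an assumption for illustration)."""
    return 1.0 / (1.0 + np.exp(-(2.0 * Z[:, 0] - 1.0 * Z[:, 1])))

rng = np.random.default_rng(0)
x = np.array([0.2, -0.1])                    # observation to be explained
Z = rng.normal(size=(5000, 2))               # (1) synthetic observations
y = toy_model(Z)                             # (2) labels from m
w = lime_kernel_weights(x, Z, kernel_width=0.75 * np.sqrt(Z.shape[1]))  # (3) kernel weights
intercept, coefs = weighted_linear_fit(Z, y, w)                         # (4) weighted regression
regression_score = intercept + coefs @ x     # the regression's own estimate of m(x)
print(coefs, regression_score, toy_model(x[None, :])[0])
```

Comparing `regression_score` with `toy_model(x)` is the kind of check that LIME itself does not report.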
Ribeiro provides an online ‘tutorial’ on LIME, which includes an example of a random forest model of the IRIS dataset (Figure 1).
The left graphic shows that the random forest model predicted that x belongs to the class ‘setosa’ with a probability of 1. The middle graphic displays LIME’s regression coefficients. The right graphic displays the values of observation x that were used in the prediction. In this case, the user specified that only the three most relevant variables were to be used (these are highlighted in turquoise).
Key problems with LIME: LIME does not measure the fidelity of its regression coefficients. This hides that it may often be producing false explanations. Although LIME displays the values of its regression coefficients, it does not display the predicted values y calculated by its regression model. Let us refer to these values as regression scores (they are not bounded by the interval [0,1]). As part of this paper’s analysis, LIME’s regression scores were calculated for the IRIS test set revealing large errors. For example, in 20% of explanations, the regression scores for setosa differed by more than 0.45 from the probabilities calculated by the random forest model m. Take the example in Figure 1, where m predicted probabilities {1,0,0}; LIME’s regression scores were {0.54,0.47,0.02} i.e. in this case, LIME’s scores are completely inconsistent with m’s prediction and the corresponding LIME regression coefficients should not be used in its explanation.
It might be thought that an adequate solution would be to provide a goodness-of-fit measure such as adjusted R-squared. However, as will be explained in Section 3, such measures can be highly misleading when evaluating the fidelity of the regression coefficients for estimating w-counterfactuals.
Another problem is that LIME does not provide counterfactual explanations. It might be argued that LIME’s regression equations provide the basis for a user to perform their own counterfactual calculations. However, there are multiple reasons why this is incorrect. First, as will be shown in Section 3, additional functionality is necessary for generating faithful counterfactuals, including the ability to solve quadratic equations. Second, LIME does not ensure that the regression model correctly classifies x. In cases where the regression misclassifies x, any subsequent w-counterfactual will be false. Third, it does not have the means of measuring the fidelity of any counterfactual calculations derived from its regression equation. Fourth, LIME does not offer an adequate dataset for calculating counterfactuals. The data that LIME uses in a local regression need to be representative of the neighbourhood that its explanation is meant to apply to. For counterfactual explanations, this target neighbourhood needs to extend from x to the nearest points of m’s decision boundary. Furthermore, the type of kernel being used is unsuitable; its weightings are too centered around x, when other points (e.g. at the decision boundary) are also important.
2.3 Other Related Work
Early work seeking to provide explanations of neural networks focused on the extraction of symbolic knowledge from trained networks (Andrews et al., 1995), either decision trees in the case of feedforward networks (Craven and Shavlik, 1995) or graphs in the case of recurrent networks (Jacobsson, 2005; Wang et al., 2018). More recently, attention has shifted from global to local explanation models due to the very large-scale nature of current deep networks, and has focused on explaining specific network architectures (such as the bottleneck in autoencoders (Higgins et al., 2017)) or domain-specific networks such as those used to solve computer vision problems (Bau et al., 2017), although some recent approaches continue to advocate the use of rule-based knowledge extraction (Angelino et al., 2017; Tran and d’Avila Garcez, 2018). The reader is referred to the survey by Guidotti et al. (2018) for a recent overview.
More specifically, Guidotti et al. (2018) have proposed LORE – Local Rule-based Explanations, which provides local explanations for binary classification tasks using decision trees. It is model-agnostic, generates local models from synthetic data and has many other similarities to LIME, but it also generates counterfactual explanations. Guidotti et al. criticise LIME for producing neighbourhood datasets whose observations are too distant from each other and have too low a density around x. By contrast, LORE uses a genetic algorithm to create neighbourhood datasets with a high density around x and the decision boundary.
Guidotti et al. claim that their system outperforms LIME and they provide fidelity statistics comparing LORE and LIME, where fidelity is defined in terms of how well local models perform in making the same classifications as the underlying machine learning system. However, their fidelity statistics for LIME could be misconstrued; it does not follow from being able to mimic a system’s classifications that a local model will also faithfully mimic its counterfactuals (see Section 4).
Ribeiro et al. (2018), the authors of the LIME paper, have subsequently proposed ‘Anchors: High-Precision Model-Agnostic Explanations’. In motivating their new method they note that LIME does not measure its own fidelity and that ‘even the local behaviour of a model may be extremely non-linear, leading to poor linear approximations’. An Anchor is a rule that is sufficient to ensure that a local prediction will remain with the same classification, irrespective of the values of other variables. By an Anchor being sufficient, Ribeiro et al. mean that the local prediction ‘will almost always’ remain unchanged. They specify a pure-exploration multi-armed bandit algorithm that efficiently identifies Anchors. The Anchor method does not have the capacity to generate counterfactuals.
LIME has spawned several variants. For example, LIME-SUP (Hu et al., 2018) and H2O’s K-LIME (Hall et al., 2017) both seek to explain a machine learning system’s functionality over its entire input space by partitioning the input space into a set of neighbourhoods, and then creating local models. K-LIME uses clustering and then regression, whereas LIME-SUP uses only decision tree algorithms. LIME has also been adapted to enable novel applications; for example, SLIME (Mishra et al., 2017) provides explanations of sound and music content analysis. However, none of these variants addresses the problems identified in this paper.
3 The CLEAR Method
CLEAR is based on the view that a satisfactory explanation of a single prediction needs to both explain the value of that prediction and answer ‘what-if-things-had-been-different’ questions. In doing this it needs to state the relative importance of the input features and show how they interact. A satisfactory explanation must also be measurable and state how well it can explain a model. It must know when it does not know (Ghahramani, 2015).
CLEAR is based on the concept of w-perturbation, as follows:
Definition 5.1 Let $\min_f(x)$ denote a vector resulting from applying a minimum change to the value of one feature f in x such that $m(\min_f(x)) = y'$ and $m(x) = y$, $class(y) \neq class(y')$. Let $v_f(x)$ denote the value of feature f in x. A w-perturbation is defined as the change in value of feature f for a target class y’, that is $v_f(\min_f(x)) - v_f(x)$.
For example, for the w-counterfactual that Mr Jones would have received his loan if his salary had been $35,000, a w-perturbation for salary is $3,000. If x has p features and m solves a k-class problem then there are at most $p \times (k-1)$ w-perturbations of x, since changes in a feature value may not always imply a change of classification.
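A w-perturbation for a single feature can be found by querying m at regular intervals on either side of x until the classification flips. The sketch below assumes a binary task with a decision boundary at a probability of 0.5; `toy_loan_model`, its coefficients and the step size are hypothetical and chosen only to echo the Mr Jones example.

```python
import math

def find_w_perturbation(m, x, feature, step, max_steps, boundary=0.5):
    """Step one feature away from x in both directions until m crosses `boundary`.

    Returns the signed change in the feature value, or None if the
    classification never flips within `max_steps` steps.
    """
    start_side = m(x) >= boundary
    for k in range(1, max_steps + 1):
        for direction in (+1, -1):
            candidate = list(x)
            candidate[feature] = x[feature] + direction * k * step
            if (m(candidate) >= boundary) != start_side:
                return candidate[feature] - x[feature]
    return None

def toy_loan_model(obs):
    """Hypothetical m: probability of loan approval from salary ($1000s) and years employed."""
    salary_k, years = obs
    return 1.0 / (1.0 + math.exp(-((salary_k - 34.5) + 0.3 * (years - 2.0))))

mr_jones = [32.0, 2.0]
salary_change = find_w_perturbation(toy_loan_model, mr_jones, feature=0, step=1.0, max_steps=20)
years_change = find_w_perturbation(toy_loan_model, mr_jones, feature=1, step=1.0, max_steps=20)
print(salary_change, years_change)
```

The precision of the result is limited by `step`; a finer grid, or a bisection between the last two queried values, would tighten it at the cost of more queries to m.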
CLEAR compares each w-perturbation with an estimate of that value, called the estimated w-perturbation, calculated using its local regression, to produce a fidelity error, as follows:
fidelity error = | estimated w-perturbation − w-perturbation |
CLEAR generates an explanation of prediction y for observation x by the following steps:

(1) Determine x’s w-perturbations for a user-selected set of features. This is achieved by querying m with feature values starting with x and progressively moving away from x at regular intervals up to a range of possible feature values.
(2) Generate labelled synthetic observations (default is 50,000 observations). Data for numeric features is generated by sampling from a uniform distribution. Data for categorical features is generated by sampling in proportion to the frequencies found in the training set. The synthetic observations are labelled by being passed through m.
(3) Create a balanced neighbourhood data set (default is 200 observations). Synthetic observations that are near to x (Euclidean distance) are selected with the objective of achieving a dense cloud of points around m’s decision boundaries. For this, the neighbourhood data is selected such that it is equally distributed across classes, i.e. approximately balanced.
(4) Perform a stepwise regression on the neighbourhood data set, under the constraint that the regression curve should go through x. The regression can include second degree terms, interaction terms and indicator variables. CLEAR provides options for both multiple and logistic regression.
(5) Estimate the w-perturbations by substituting x’s w-counterfactual values from $\min_f(x)$, other than for feature f, into the regression equation and calculating the value of f. See example below.
(6) Measure the fidelity of the regression coefficients. Fidelity errors are calculated by comparing the actual w-perturbations determined in step 1 with the estimates calculated in step 5.
(7) Iterate to best explanation. Because CLEAR produces fidelity statistics, its parameters can be iteratively changed in order to achieve a better trade-off between interpretability and fidelity. Relevant parameters include the number of features/independent variables to consider and the use and number of quadratic or interaction terms.

CLEAR also provides the option of adding x’s w-counterfactuals, $\min_f(x)$, to x’s neighbourhood data set. The w-counterfactuals are weighted and act as soft constraints on CLEAR’s subsequent regression. Algorithms 1 and 2 outline the entire process.
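The selection of a balanced neighbourhood data set could look roughly as follows. This is a simplified sketch rather than the CLEAR prototype's code: it takes an equal share of the nearest synthetic observations from each class, and the toy labelling rule standing in for m is an assumption.

```python
import math
import random

def balanced_neighbourhood(x, synthetic, labels, size):
    """Return indices of the synthetic points nearest to x, with an equal share per class."""
    classes = sorted(set(labels))
    per_class = size // len(classes)
    chosen = []
    for c in classes:
        members = [i for i, lab in enumerate(labels) if lab == c]
        members.sort(key=lambda i: math.dist(x, synthetic[i]))  # nearest first
        chosen.extend(members[:per_class])
    return chosen

random.seed(0)
# Numeric features sampled from a uniform distribution.
synthetic = [(random.random(), random.random()) for _ in range(1000)]
# Hypothetical stand-in for labelling by m: class 1 above the line z0 + z1 = 1.2.
labels = [1 if z0 + z1 > 1.2 else 0 for z0, z1 in synthetic]

x = (0.3, 0.3)
neighbourhood = balanced_neighbourhood(x, synthetic, labels, size=200)
print(len(neighbourhood), sum(labels[i] for i in neighbourhood))
```

Because x lies well inside class 0, the class 1 half of the neighbourhood is drawn from the points just across the decision boundary, which is the region the regression most needs to represent.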
Example of using regression to estimate a w-perturbation: An MLP with a softmax activation function in the output layer was trained on a subset of the UCI Pima Indians Diabetes dataset. The MLP calculated a probability of 0.69 for x belonging to class 1 (having diabetes). CLEAR generated a logistic regression equation whose terms included Glucose and BloodPressure.
Let the decision boundary be a probability of 0.5 of belonging to class 1. Thus, x is on the boundary when the linear predictor of the logistic regression equals 0. The estimated w-perturbation for Glucose is obtained by substituting this boundary condition and the value of BloodPressure in x into the regression equation, which leaves an equation in Glucose alone.
Solving this equation, CLEAR selects the root equal to 0.025 as being closest to the original value of Glucose in x. The original value for Glucose was 0.537 and hence the estimated w-perturbation is 0.512. The actual w-perturbation (step 1) for Glucose to achieve a probability of 0.5 of being in class 1 was 0.557; hence, the fidelity error was 0.045.
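The root-selection step can be illustrated with a short sketch. The coefficients below are hypothetical and are not those of the regression in the example; the sketch assumes a logistic regression whose linear predictor contains Glucose, Glucose squared, BloodPressure and a Glucose × BloodPressure interaction term, so that fixing BloodPressure leaves a quadratic in Glucose.

```python
import math

def estimated_w_perturbation(a, b, c, original_value):
    """Solve a*v^2 + b*v + c = 0 and return (root closest to original_value) - original_value.

    Returns None when the equation has no real solution.
    """
    if a == 0:
        if b == 0:
            return None
        roots = [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return None
        s = math.sqrt(disc)
        roots = [(-b + s) / (2 * a), (-b - s) / (2 * a)]
    closest = min(roots, key=lambda r: abs(r - original_value))
    return closest - original_value

# Hypothetical linear predictor: b0 + b1*G + b2*G^2 + b3*BP + b4*G*BP
b0, b1, b2, b3, b4 = -0.35, 1.0, 0.5, 0.4, 0.2
glucose, blood_pressure = 0.6, 0.5      # hypothetical standardised values in x

# On the decision boundary the linear predictor is 0; with BP fixed this is a quadratic in G.
a = b2
b = b1 + b4 * blood_pressure
c = b0 + b3 * blood_pressure
estimate = estimated_w_perturbation(a, b, c, glucose)
boundary_glucose = glucose + estimate
print(estimate, boundary_glucose)
```

Choosing the root nearest to the original value mirrors the minimum-change requirement in the definition of a w-perturbation.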
In summary, a CLEAR explanation has two parts: the first provides x’s w-counterfactuals and the second states the corresponding regression and its fidelity errors. Figure 2 shows excerpts from a CLEAR report.
A CLEAR prototype has been developed in Python for binary classification tasks.
Definition 5.3 (% fidelity): A w-perturbation is said to be feasible if the resulting feature value is within the range of values found in m’s training set. The percentage fidelity, given a batch of observations and an error threshold T, is the number of w-perturbations with fidelity error smaller than T divided by the number of feasible w-perturbations.
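The fidelity error and % fidelity can be computed as in the following sketch. The record layout, feature names and ranges are hypothetical, and the sketch assumes that only feasible w-perturbations are counted in the numerator as well as the denominator.

```python
def fidelity_error(estimated, actual):
    """Absolute difference between the estimated and the actual w-perturbation."""
    return abs(estimated - actual)

def percent_fidelity(records, threshold, feature_ranges):
    """records: iterable of (feature, original_value, actual_pert, estimated_pert).

    A w-perturbation is feasible if original_value + actual_pert lies within the
    training-set range of that feature.
    """
    feasible = 0
    within = 0
    for feature, original, actual, estimated in records:
        low, high = feature_ranges[feature]
        if not low <= original + actual <= high:
            continue  # infeasible w-perturbations are excluded
        feasible += 1
        if fidelity_error(estimated, actual) < threshold:
            within += 1
    return 100.0 * within / feasible if feasible else float("nan")

# Hypothetical batch: (feature, value in x, actual w-perturbation, estimated w-perturbation)
records = [
    ("glucose", 0.5, -0.40, -0.45),   # error 0.05: within threshold
    ("bmi", 0.2, 0.30, 0.70),         # error 0.40: outside threshold
    ("age", 0.9, 0.50, 0.48),         # resulting value 1.4 is out of range: infeasible
]
ranges = {"glucose": (-1.0, 1.0), "bmi": (-1.0, 1.0), "age": (-1.0, 1.0)}
result = percent_fidelity(records, threshold=0.25, feature_ranges=ranges)
print(result)
```

Applied to the Glucose example above, `fidelity_error(0.512, 0.557)` gives approximately 0.045.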
Notice that for CLEAR an explanation (expl) is a tuple consisting of the actual w-perturbations, the estimated w-perturbations, a regression equation and the fidelity errors. In Algorithm 2, two parameter values are fixed to create a balanced neighbourhood, assuming 0.5 as the decision boundary, as in the above example.
4 Experimental Results
Experiments were carried out with four UCI datasets of binary classification problems: Pima Indians Diabetes (with 8 numeric features), Default of Credit Card Clients (20 numeric features, 3 categorical), and subsets of Adult (with 2 numeric features, 5 categorical features) and Breast Cancer Wisconsin (9 numeric features). For the Adult dataset, some of the categorical feature values were merged and features with little predictive power were removed. With the Breast Cancer dataset only the mean value features were kept. For reproducibility, the code for preprocessing the data is included with the CLEAR prototype on GitHub.
For each dataset, an MLP with a softmax output layer was trained using TensorFlow. Each dataset was partitioned into: an MLP training dataset (out of which 100 observations were selected for determining the total number of synthetic observations and neighbourhood data to be generated by CLEAR) and an MLP test dataset (out of which 100 observations were selected for calculating the % fidelity of CLEAR and LIME). The code for the MLP training is also included on GitHub. Experiments were carried out with different test sets, with each experiment being repeated 20 times for different generated synthetic data. The experiments were carried out on a Windows PC with an i7-7700HQ 2.8 GHz processor. A single run of 100 observations took 40 to 80 minutes, depending on the dataset. The CLEAR prototype has not yet been developed for multi-class datasets.
In order to enable comparisons with LIME, CLEAR includes an option to run the LIME algorithms for creating synthetic data and generating regression equations. CLEAR then takes the regression equations and calculates the corresponding w-counterfactuals and fidelity errors.
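Table 1: % fidelity of CLEAR and LIME (error threshold T = 0.25).
Method                            | Pima      | Adult     | Credit    | Breast
CLEAR not using w-counterfactuals | 57% ± 0.8 | 80% ± 0.9 | 39% ± 1.3 | 54% ± 1.1
CLEAR using w-counterfactuals     | 77% ± 0.8 | 80% ± 0.8 | 55% ± 1.7 | 81% ± 1.3
LIME algorithms                   | 20% ± 0.4 | 26% ± 0.6 | 12% ± 0.5 | 20% ± 0.5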
CLEAR’s regressions are significantly better than LIME’s. The best results are obtained by including w-counterfactuals in the neighbourhood datasets (step 8 of the CLEAR method). Overall, the best configuration comprised: using balanced neighbourhood data, forcing the regression curve to go through x (i.e. ‘centering’), including both quadratic and interaction terms, and using logistic regression for the Pima and Breast Cancer datasets and multiple regression for the Adult and Credit datasets. Unless otherwise stated, % fidelity is for the error threshold T = 0.25.
Table 1 compares the % fidelity of CLEAR and LIME (i.e. using LIME’s algorithms for generating synthetic data and performing the regression). LIME was run with its default parameter values except for the following beneficial changes: the number of synthetic data points was increased to 15,000 (further increases did not improve fidelity), the data was not discretized, a maximum of 15 features was allowed, and several kernel widths in the range from 1.5 to 4 were evaluated. By contrast, CLEAR was run with its best configuration and with 14 features. As an example of LIME’s performance: with the Credit dataset, the adjusted $R^2$ averaged 0.7 and the classification of the test set observations was over 94% correct. However, the absolute error between y and LIME’s estimate of y was 8% (e.g. in one case LIME estimated y at 0.48), and this by itself would lead to large errors when calculating how much a single feature needs to change for y to reach the decision boundary. LIME’s fidelity of only 2% illustrates that CLEAR’s measure of fidelity is far more demanding than just classification accuracy. Of course, LIME’s poor fidelity was due, in part, to its kernel failing to isolate the appropriate neighbourhood datasets necessary for calculating w-counterfactuals accurately. A further problem is that LIME converts categorical features into Boolean variables, often losing valuable information. This can be seen in the very poor fidelity statistics for the Adult and Credit datasets, which contain categorical variables, unlike the Pima and Breast Cancer data.
Table 2 shows how CLEAR’s fidelity (not using w-counterfactuals) varied with the maximum number of independent variables allowed in a regression. At first, fidelity sharply improves but then plateaus.
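Table 2: CLEAR’s % fidelity (not using w-counterfactuals) by maximum number of independent variables.
No. of variables | Pima | Adult | Credit | Breast
8                | 42%  | 35%   | 27%    | 43%
11               | 53%  | 76%   | 38%    | 46%
14               | 57%  | 80%   | 39%    | 54%
17               | 59%  | 78%   | 40%    | 55%
20               | 62%  | 78%   | 39%    | 56%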
Despite the regression fitting the neighbourhood data well, a significant number of the estimated w-counterfactuals have large fidelity errors. For example, in one of the experiments with the Adult dataset where CLEAR’s multiple regression did not center the data, the average adjusted $R^2$ was 0.97 and classification accuracy was 98%, but the ‘% fidelity error < 0.25’ was 59%. This points to a more general problem: sometimes the neighbourhood datasets do not represent the regions of the feature space that are central for the explanations. With CLEAR, this discrepancy can at least be measured.
CLEAR was tested in a variety of configurations. These included the best configuration, and configurations where a single option was altered from the default, e.g. by using an imbalanced neighbourhood of points nearest to x. Figure 3 displays the results when CLEAR used a maximum of 14 independent variables.
CLEAR’s fidelity was sharply improved by adding x’s w-counterfactuals to its neighbourhood datasets. In the previous experiments, CLEAR created a neighbourhood dataset of at least 200 synthetic observations, each being given a weighting of 1. This was now altered so that each w-counterfactual identified in step 1 was added and given a weighting of 10. For example, for the Pima dataset, an average of 3 w-counterfactuals were added to each neighbourhood dataset. The consequent improvement in fidelity indicates, as expected, that adding these weighted data points results in a dataset that better represents the relevant neighbourhood, with CLEAR being able to provide a regression equation that is more faithful to w-counterfactuals.
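The effect of adding a w-counterfactual with a weighting of 10 can be illustrated on a one-feature toy problem. The model, the sample and the quadratic regression below are assumptions made for illustration; only the weightings of 1 and 10 and the neighbourhood size of 200 come from the text. Note that the sketch measures the error in the predicted probability at the w-counterfactual, which is a simpler proxy than the fidelity error defined in Section 3.

```python
import numpy as np

def toy_model(g):
    """Hypothetical black box m: probability of class 1 for a single feature g."""
    return 1.0 / (1.0 + np.exp(-6.0 * (g - 0.5)))

def fit_quadratic(g, y, sample_weights):
    """Weighted quadratic regression of y on g.

    np.polyfit applies its weights to the unsquared residuals, so the square
    root of the sample weights is passed.
    """
    return np.polyfit(g, y, deg=2, w=np.sqrt(sample_weights))

rng = np.random.default_rng(1)
g_neigh = rng.uniform(-1.0, 1.0, size=200)   # neighbourhood data, weighting 1 each
y_neigh = toy_model(g_neigh)

g_cf, y_cf = 0.5, 0.5                        # w-counterfactual: toy_model(0.5) = 0.5

plain = fit_quadratic(g_neigh, y_neigh, np.ones(200))

g_all = np.append(g_neigh, g_cf)
y_all = np.append(y_neigh, y_cf)
weights = np.append(np.ones(200), 10.0)      # the w-counterfactual gets weighting 10
augmented = fit_quadratic(g_all, y_all, weights)

err_plain = abs(np.polyval(plain, g_cf) - y_cf)
err_augmented = abs(np.polyval(augmented, g_cf) - y_cf)
print(err_plain, err_augmented)
```

Adding a positively weighted observation to a least squares fit cannot increase the absolute residual at that observation, so the weighted point acts as a soft constraint pulling the regression towards the decision boundary.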
5 Conclusion and Future Work
CLEAR explains a prediction y for data point x by stating x’s w-counterfactuals and providing a regression equation. The regression shows the patterns of counterfactual dependencies in a neighbourhood that includes both x and the w-counterfactual data points. CLEAR represents a significant improvement both on LIME and on just using w-counterfactuals. Key to CLEAR’s performance is the ability to generate relevant neighbourhood data bounded by its w-counterfactuals. Adding these to x’s neighbourhood led to sharp fidelity improvements. Other data points could also be added, for example w-counterfactuals involving changes to multiple features. A user might also include some perturbations that are important to their project. CLEAR could guide this process by reporting regression and fidelity statistics. It could also replace step 8 of the algorithm with increasingly complex and more sophisticated learning algorithms. Constructing neighbourhood datasets in this way would seem a better approach than randomly generating data points and then selecting those closest to x. The prototype will also be extended to multi-class tasks. A criticism of CLEAR might be that its regression equations are not sufficiently interpretable. This is because they include a large number of terms, including 2nd degree and interaction variables. A first response is that the inclusion of these terms is necessary (though not sufficient) for CLEAR’s regression to be faithful. A second response is that CLEAR’s equations can easily be simplified by substituting in the values of any features in x that are not of interest to the user. This is to be evaluated in practice in the context of comprehensibility studies.
Footnotes
 https://marcotcr.github.io/lime/tutorials/Tutorial%20-%20continuous%20and%20categorical%20features.html
 https://github.com/ClearExplanationsAI
References
 Robert Andrews, Joachim Diederich, and Alan B. Tickle. Survey and critique of techniques for extracting rules from trained artificial neural networks. Knowledge-Based Systems, 8(6):373–389, December 1995.
 Elaine Angelino, Nicholas Larus-Stone, Daniel Alabi, Margo Seltzer, and Cynthia Rudin. Learning certifiably optimal rule lists for categorical data. Journal of Machine Learning Research, 18:234:1–234:78, 2017.
 David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, and Antonio Torralba. Network dissection: Quantifying interpretability of deep visual representations. In Proc. CVPR 2017, Honolulu, USA, 2017.
 Mark W. Craven and Jude W. Shavlik. Extracting tree-structured representations of trained networks. In Proc. NIPS 1995, pages 24–30. MIT Press, 1995.
 Zoubin Ghahramani. Probabilistic machine learning and artificial intelligence. Nature, 521(7553):452–459, 2015.
 Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Dino Pedreschi, Franco Turini, and Fosca Giannotti. Local rule-based explanations of black box decision systems. CoRR, abs/1805.10820, 2018.
 Riccardo Guidotti, Anna Monreale, Franco Turini, Dino Pedreschi, and Fosca Giannotti. A survey of methods for explaining black box models. CoRR, abs/1802.01933, 2018.
 Patrick Hall, Navdeep Gill, Megan Kurka, and Wen Phan. Machine learning interpretability with H2O driverless AI, 2017. URL: http://docs.h2o.ai/driverless-ai/latest-stable/docs/booklets/MLIBooklet.pdf.
 Irina Higgins, Loïc Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. beta-VAE: Learning basic visual concepts with a constrained variational framework. In Proc. ICLR 2017, Toulon, France, 2017.
 Linwei Hu, Jie Chen, Vijayan N. Nair, and Agus Sudjianto. Locally interpretable models and effects based on supervised partitioning (LIME-SUP). CoRR, abs/1806.00663, 2018.
 Henrik Jacobsson. Rule extraction from recurrent neural networks: A taxonomy and review. Neural Computation, 17(6):1223–1263, June 2005.
 Saumitra Mishra, Bob L. Sturm, and Simon Dixon. Local interpretable model-agnostic explanations for music content analysis. In Proc. ISMIR 2017, Suzhou, China, pages 537–543, 2017.
 Brent Mittelstadt, Chris Russell, and Sandra Wachter. Explaining explanations in AI. In Proc. Conference on Fairness, Accountability, and Transparency, FAT’19, pages 279–288, New York, USA, 2019.
 Judea Pearl. Causality: Models, Reasoning and Inference. Cambridge University Press, New York, NY, USA, 2nd edition, 2009.
 Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. Why should I trust you? Explaining the predictions of any classifier. In Proc. ACM SIGKDD 2016, KDD ’16, pages 1135–1144, New York, NY, USA, 2016. ACM.
 Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. Anchors: High-precision model-agnostic explanations. In Proc. AAAI 2018, pages 1527–1535, New Orleans, USA, 2018.
 Son N. Tran and Artur S. d’Avila Garcez. Deep logic networks: Inserting and extracting knowledge from deep belief networks. IEEE Transactions on Neural Networks and Learning Systems, 29(2):246–258, 2018.
 Sandra Wachter, Brent D. Mittelstadt, and Chris Russell. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. CoRR, abs/1711.00399, 2017.
 Qinglong Wang, Kaixuan Zhang, Alexander G. Ororbia, II, Xinyu Xing, Xue Liu, and C. Lee Giles. An empirical evaluation of rule extraction from recurrent neural networks. Neural Computation, 30(9):2568–2591, September 2018.
 J. Woodward. Making things happen: a theory of causal explanation. Oxford University Press, 2003.