An unexpected unity among methods for interpreting model predictions
Understanding why a model made a certain prediction is crucial in many data science fields. Interpretable predictions engender appropriate trust and provide insight into how the model may be improved. However, with large modern datasets the best accuracy is often achieved by complex models that even experts struggle to interpret, which creates a tension between accuracy and interpretability. Recently, several methods have been proposed for interpreting predictions from complex models by estimating the importance of input features. Here, we present how a model-agnostic additive representation of the importance of input features unifies current methods. This representation is optimal, in the sense that it is the only set of additive values that satisfies important properties. We show how we can leverage these properties to create novel visual explanations of model predictions. The thread of unity that this representation weaves through the literature indicates that there are common principles to be learned about the interpretation of model predictions that apply in many scenarios.
Introduction
A correct interpretation of a prediction model’s output is extremely important. This often leads to the use of simple models (e.g., linear models) even though they are often less accurate than complex models. The growing availability of big data from complex systems has led to an increased use of complex models, and so to an increased need to improve their interpretability. Historically, models have been considered interpretable if the behavior of the model as a whole can be summarized succinctly. Linear models, for example, have a single vector of coefficients, which describes the relationships between features and a prediction across all samples. Although these relationships are not succinctly summarized in complex models, if we focus on a prediction made on a particular sample, we can describe the relationships more easily. Recent model-agnostic methods leverage this property by summarizing the behavior of a complex model only with respect to a single prediction ribeiro2016should; vstrumbelj2014explaining.
Here, we extend a prediction explanation method based on game theory, specifically on the Shapley value, which describes a way to distribute the total gains to players, assuming they all collaborate vstrumbelj2014explaining. We show how this method by Štrumbelj et al. can be extended to unify and justify a wide variety of recent approaches to interpreting model predictions (Figure 1). We term these feature importance values expectation Shapley (ES) values because, when the model output is viewed as a conditional expectation (of $y$ given $x$), these values are equivalent to the Shapley values, i.e., the distribution of credit from coalitional game theory. Intriguingly, ES values connect with and motivate several other current prediction explanation methods:
LIME is a method for interpreting individual model predictions based on locally approximating the model around a given prediction ribeiro2016should. ES values fit into the formalism proposed by LIME and justify a specific local sample weighting kernel. The examples in Ribeiro et al. (2016) ribeiro2016should can be viewed as approximations of ES values with a different weighting kernel defining locality.
DeepLIFT was recently proposed as a recursive prediction explanation method for deep learning shrikumar2016not. DeepLIFT values are ES values for a linearized version of the deep network. This connection motivates the use of DeepLIFT as an extremely efficient sampling-free approximation to ES values. ES values can also be used to uniquely justify specific linearization choices DeepLIFT must make.
Layerwise relevance propagation is another method for interpreting the predictions of compositional models, such as deep learning bach2015pixel. As noted by Shrikumar et al., layerwise relevance propagation is equivalent to DeepLIFT with the reference activations of all neurons fixed to zero shrikumar2016not. This implies that layerwise relevance propagation is also an approximation of ES values, where the primary difference from DeepLIFT is the choice of a reference input to approximate the effect of missing values. By noting that both DeepLIFT and layerwise relevance propagation are ES value approximations, we can see that DeepLIFT’s proposed improvement over layerwise relevance propagation is a change that makes DeepLIFT a better approximation of ES values.
Shapley regression values are an approach to computing feature importance in the presence of multicollinearity lipovetsky2001analysis. They were initially designed to mitigate problems with the interpretability of linear models (even though linear models are typically considered easy to interpret), but they can be applied to other models as well. Shapley regression values require retraining the model on all feature subsets, and can be considered a brute-force method of computing ES values. By viewing the model output as an expected value, ES values allow fast approximations in situations where training models on all feature subsets would be intractable.
Expectation Shapley values and LIME
Understanding why a model made a prediction requires understanding how a set of interpretable model inputs contributed to the prediction. The original inputs may be hard for a user to interpret, so a transformation to a new set of interpretable inputs $x'$ is often needed. ES values set $x'$ to a binary vector of length $M$ representing if an input value (or group of values) is known or missing. This mapping, $x' = h_x(x)$, takes an arbitrary input space and converts it to an interpretable binary vector of feature presence. For example, if the model inputs are word embedding vectors, then $x'$ could be a binary vector of our knowledge of word presence vs. absence. If the model input is a vector of real-valued measurements, $x'$ could be a binary vector representing if a group of measurements was observed or missing.
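One way to picture the mapping between the original inputs and the binary vector of feature presence is sketched below. It is a minimal illustration, not the authors' implementation: the function name `to_input_space`, the `groups` argument, and the choice to fill missing groups from a fixed `reference` vector are all assumptions made for this example.

```python
import numpy as np

def to_input_space(z_prime, x, reference, groups):
    """Map a binary presence vector back to a full model input.

    Assumption: a missing group is approximated by copying the
    corresponding entries of a fixed reference vector.
    """
    x = np.asarray(x, dtype=float)
    full = np.array(reference, dtype=float)  # start from "everything missing"
    for present, idx in zip(z_prime, groups):
        if present:
            full[idx] = x[idx]  # reveal the observed values of this group
    return full

# Hypothetical sample: three measurements, the first two form one group.
x = np.array([1.0, 2.0, 3.0])
reference = np.zeros(3)
groups = [[0, 1], [2]]
masked = to_input_space([1, 0], x, reference, groups)
```

Here `masked` keeps the first group from `x` and takes the second group from `reference`, so a binary vector of length 2 stands in for an input of length 3.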
Prediction interpretation methods seek to explain how the interpretable inputs contributed to the prediction. While the parameters of the original model $f$ define this relationship, they do so in a potentially complex manner and do not utilize the interpretable inputs $x'$. To provide interpretability, these methods learn a simple approximation $g$ to the original model for an individual prediction. Inspecting $g$ provides an understanding of the original model’s behavior near the prediction. This approach to local model approximation was formalized recently in Ribeiro et al. as finding an interpretable local model $\xi$ that minimizes the following objective function ribeiro2016should:
$$\xi = \arg\min_{g \in \mathcal{G}} L(f, g, \pi_{x'}) + \Omega(g) \qquad (1)$$
Faithfulness of the simple model $g(x')$ to the original model $f(x)$ is enforced through the loss $L$ over a set of samples in the interpretable data space weighted by $\pi_{x'}$. $\Omega$ penalizes the complexity of $g$.
Given the above formulation for $\xi$, we show the potentially surprising result that if $g$ is assumed to follow the simple additive form:
$$g(z') = \phi_0 + \sum_{i=1}^{M} \phi_i z'_i \qquad (2)$$
where $\phi_i$ (a shortened version of $\phi_i(f, x)$ when $f$ and $x$ are clear) are parameters to be optimized, then the loss function $L$, the sample weighting kernel $\pi_{x'}$, and the regularization term $\Omega$ are all uniquely determined (up to transformations that do not change $\xi$) given three basic assumptions from game theory. These assumptions are:

Efficiency.
$$\sum_{i=0}^{M} \phi_i = f(x) \qquad (3)$$
This assumption forces the model to correctly capture the original predicted value.

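Symmetry. Let $\mathbb{1}_S \in \{0, 1\}^M$ be an indicator vector equal to $1$ for indexes $i \in S$, and $0$ elsewhere, and let $f_x(S) = f(h_x^{-1}(\mathbb{1}_S))$, where $h_x^{-1}$ maps a binary vector back to the original input space. If for all subsets $S$ that do not contain $i$ or $j$
$$f_x(S \cup \{i\}) = f_x(S \cup \{j\}) \qquad (4)$$
then $\phi_i(f, x) = \phi_j(f, x)$. This states that if two features contribute equally to the model then their effects must be the same.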

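Monotonicity. For any two models $f$ and $f'$, if for all subsets $S$ that do not contain $i$
$$f_x(S \cup \{i\}) - f_x(S) \geq f'_x(S \cup \{i\}) - f'_x(S) \qquad (5)$$
then $\phi_i(f, x) \geq \phi_i(f', x)$. This states that if observing a feature increases $f$ more than $f'$ in all situations, then that feature’s effect should be larger for $f$ than for $f'$.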
Breaking any of these axioms would lead to potentially confusing behavior. In 1985, Peyton Young demonstrated that there is only one set of values that satisfies the above assumptions and they are the Shapley values young1985monotonic; roth1988shapley. ES values are Shapley values of expected value functions, therefore they are the only solution to Equation 1 that conforms to Equation 2 and satisfies the three axioms above. This optimality of ES values holds over a large class of possible models, including the examples used in the LIME paper that originally proposed this formalism ribeiro2016should.
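The uniqueness result above can be checked numerically on a small example. The sketch below computes exact Shapley values by enumerating every subset, using the standard subset-weighted form of the Shapley value, and then inspects efficiency and symmetry. The set function `toy_value` is a hypothetical stand-in for the expected model output given a set of known features, and none of the names come from the paper.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values of a set function by enumerating all subsets."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(n_features):
            # Weight of a coalition S of this size: |S|! (M - |S| - 1)! / M!
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal effect of adding feature i to coalition S
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi

def toy_value(s):
    # Hypothetical value function: features 0 and 1 only matter together,
    # feature 2 contributes additively, and 1.0 is the base value.
    return 1.0 + (10.0 if {0, 1} <= s else 0.0) + (2.0 if 2 in s else 0.0)

phi = shapley_values(toy_value, 3)
phi_0 = toy_value(frozenset())                   # value with no features known
total = phi_0 + sum(phi)                         # efficiency: should equal f(x)
full_output = toy_value(frozenset({0, 1, 2}))
```

Features 0 and 1 play interchangeable roles in `toy_value`, so symmetry requires them to receive the same value, and efficiency requires `total` to match `full_output`. Enumeration costs on the order of $2^M$ evaluations, so this is only practical for small $M$.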
We found the specific forms of $\pi_{x'}$, $L$, and $\Omega$ that lead to Shapley values as the solution, and they are:
$$\Omega(g) = 0, \qquad \pi_{x'}(z') = \frac{M - 1}{\binom{M}{|z'|}\,|z'|\,(M - |z'|)}, \qquad L(f, g, \pi_{x'}) = \sum_{z' \in Z} \left[f(h_x^{-1}(z')) - g(z')\right]^2 \pi_{x'}(z') \qquad (6)$$
Here $|z'|$ is the number of nonzero entries in $z'$. It is important to note that $\pi_{x'}(z') = \infty$ when $|z'| \in \{0, M\}$, which enforces $\phi_0 = f_x(\emptyset)$ and $f(x) = \sum_{i=0}^{M} \phi_i$. In practice these infinite weights can be avoided during optimization by analytically eliminating two variables using these constraints. Figure 2A compares our Shapley kernel with previous kernels chosen heuristically. The intuitive connection between linear regression and classical Shapley value estimates is that classical Shapley value estimates are computed as the mean of many function outputs. Since the mean is also the best least squares point estimate for a set of data points, it is natural to search for a weighting kernel that causes linear least squares regression to recapitulate the Shapley values.
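The regression view can be illustrated with a small weighted least squares fit over all $2^M$ binary vectors. This is a sketch under stated assumptions, not the authors' implementation: the infinite end-point weights are replaced by a large finite constant (`1e6`) instead of eliminating two variables analytically, and `toy_value` is a hypothetical value function chosen so that the expected answer is easy to work out by hand.

```python
from itertools import product
from math import comb
import numpy as np

def shapley_kernel(m, size):
    """Shapley kernel weight for a binary vector with `size` of `m` ones."""
    if size == 0 or size == m:
        return 1e6  # assumption: large finite stand-in for the infinite weight
    return (m - 1) / (comb(m, size) * size * (m - size))

def kernel_regression_shapley(value_fn, m):
    """Fit phi_0 + sum_i phi_i z'_i by weighted least squares over all z'."""
    Z = np.array(list(product([0, 1], repeat=m)), dtype=float)
    y = np.array([value_fn(frozenset(np.flatnonzero(z).tolist())) for z in Z])
    w = np.array([shapley_kernel(m, int(z.sum())) for z in Z])
    X = np.hstack([np.ones((len(Z), 1)), Z])  # leading column carries phi_0
    root_w = np.sqrt(w)                       # scale rows to apply the weights
    coef, *_ = np.linalg.lstsq(X * root_w[:, None], y * root_w, rcond=None)
    return coef[0], coef[1:]

def toy_value(s):
    # Hypothetical value function: features 0 and 1 interact, feature 2 is additive.
    return 1.0 + (10.0 if {0, 1} <= s else 0.0) + (2.0 if 2 in s else 0.0)

phi_0, phi = kernel_regression_shapley(toy_value, 3)
```

For this toy function the interaction between features 0 and 1 is split evenly between them, so the fit should land close to a base value of 1 and feature values of 5, 5, and 2. The match is approximate because the end-point weights are finite.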
Expectation Shapley values and DeepLIFT
DeepLIFT computes the impact of inputs on the outputs of compositional models such as deep neural networks. The impact of an input $x_i$ on a model output $y$ is denoted by $C_{\Delta x_i \Delta y}$ and
$$\sum_{i=1}^{n} C_{\Delta x_i \Delta y} = \Delta y \qquad (7)$$
where $\Delta y = f(x) - f(r)$ and $r$ is a "reference input" designed to represent typical input values. ES value implementations approximate the impact of missing data by taking expectations, so when interpreting $r$ as an estimate of $E[x]$, DeepLIFT is an additive model of the same form as ES values. To enable efficient recursive computation of $C_{\Delta x_i \Delta y}$, DeepLIFT assumes a linear composition rule that is equivalent to linearizing the nonlinear components of the neural network. Its backpropagation rules, which define how each component is linearized, are intuitive but arbitrary. If we interpret DeepLIFT as an approximation of ES values, then we can justify a unique set of linearizations for network components based on analytic solutions of the ES values for that component type. One example where this leads to a different, potentially improved, assignment of responsibility is the function shown in Figure 2B.
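To make the idea of an analytic solution for a component type concrete, the sketch below works through a component with exactly two inputs, where the Shapley values relative to a reference input have a closed form (each input's marginal effect averaged over the two possible orderings). The multiplication component, the function names, and the specific numbers are illustrative assumptions, and this is not DeepLIFT's own backpropagation rule.

```python
def two_input_es_values(f, x, r):
    """Shapley values of a two-input component relative to a reference r.

    Assumption: an input that is "missing" is replaced by its reference value.
    """
    f_none = f(r[0], r[1])     # neither input revealed
    f_first = f(x[0], r[1])    # only the first input revealed
    f_second = f(r[0], x[1])   # only the second input revealed
    f_both = f(x[0], x[1])     # both inputs revealed
    # Average each input's marginal effect over the two orderings.
    phi_1 = 0.5 * (f_first - f_none) + 0.5 * (f_both - f_second)
    phi_2 = 0.5 * (f_second - f_none) + 0.5 * (f_both - f_first)
    return phi_1, phi_2, f_both - f_none

def multiply(a, b):
    # Hypothetical nonlinear component used only for illustration.
    return a * b

phi_1, phi_2, delta_y = two_input_es_values(multiply, x=(3.0, 2.0), r=(1.0, 1.0))
```

The two values sum to `delta_y`, the difference between the output at the input and at the reference, which is the same additive property that Equation 7 states for DeepLIFT contributions.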
Visualization of Expectation Shapley values
Model interpretability is closely tied to human perception. We designed a simple visualization based on analogy with physical force (Figure 3A). Each interpretable input is assigned a bar segment. The width of the segment is equal to the magnitude of the ES value $\phi_i$. Red bar segments correspond to inputs where $\phi_i > 0$, and blue segments to inputs where $\phi_i < 0$. The model output starts at the base value $\phi_0$ in the center and then is pushed right by the red bars or left by the blue bars in proportion to their length. The final location of the model output is then equal to $f(x) = \sum_{i=0}^{M} \phi_i$.
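The bookkeeping behind such a plot can be sketched as follows. This is one possible layout, assumed for illustration and not taken from the authors' plotting code: red segments are stacked so that they end at the model output, blue segments are stacked so that they start there, and larger effects sit closest to the output. The name `force_layout` is hypothetical.

```python
def force_layout(base_value, phi):
    """Compute bar segments (feature index, color, start, end) on the output axis."""
    output = base_value + sum(phi)
    positive = sorted((i for i, v in enumerate(phi) if v > 0), key=lambda i: -phi[i])
    negative = sorted((i for i, v in enumerate(phi) if v < 0), key=lambda i: phi[i])
    segments = []
    right_edge = output
    for i in positive:  # red bars push the output to the right
        segments.append((i, "red", right_edge - phi[i], right_edge))
        right_edge -= phi[i]
    left_edge = output
    for i in negative:  # blue bars push the output to the left
        segments.append((i, "blue", left_edge, left_edge - phi[i]))
        left_edge -= phi[i]
    return output, segments

output, segments = force_layout(base_value=1.0, phi=[5.0, -2.0, 1.0])
red_width = sum(end - start for _, color, start, end in segments if color == "red")
blue_width = sum(end - start for _, color, start, end in segments if color == "blue")
```

With this layout the total red width minus the total blue width equals the distance from the base value to the model output, which is the "net force" that the visualization is meant to convey.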
While explaining a single prediction is very useful, we often want to understand how a model is performing across a dataset. To enable this we designed a visualization based on rotating the single prediction visualization (Figure 3A) by 90°, then stacking many horizontally. By ordering the predictions by explanation similarity we can see interesting patterns (Figure 3B). One such insight for the popular UCI adult census dataset is that marriage status is the most powerful predictor of income, suggesting that many joint incomes were reported, not simply individual incomes as might be at first assumed. For implementation code see https://github.com/slundberg/esvalues.
Sample Efficiency and the Importance of the Shapley Kernel
Connecting Shapley values from game theory with locally weighted linear models brings advantages to both concepts. Shapley values can be estimated more efficiently, and locally weighted linear models gain theoretical justification for their weighting kernel. Here we briefly illustrate both the improvement in efficiency for Shapley values, and the importance of kernel choice for locally weighted linear models (Figure 4).
Shapley values are classically defined by the impact of a feature when it is added to features that came before it in an ordering. The Shapley value for that feature is the average impact over all possible orderings:
$$\phi_i(f, x) = \frac{1}{M!} \sum_{R \in S_M} \left[f_x(P_i^R \cup \{i\}) - f_x(P_i^R)\right] \qquad (8)$$
where $S_M$ is the set of all permutations of length $M$, and $P_i^R$ is the set of all features whose index comes before $i$ in permutation $R$. This leads to a natural estimation approach, which involves taking the average over a small sample of all orderings vstrumbelj2014explaining. While this standard approach is effective in small (or nearly linear) models, penalized regression (using Equation 6) produces much more accurate Shapley value estimates for nonlinear models such as a dense decision tree over 10 features (Figure 4A), and a sparse decision tree using only 3 of 100 features (Figure 4B).
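The classical sampling estimator described above can be sketched in a few lines: draw random orderings, add features one at a time, and average each feature's marginal effect. The function name `sampled_shapley`, the fixed seed, and the toy value function are assumptions made for this example, not part of the cited method's code.

```python
import random

def sampled_shapley(value_fn, m, n_permutations, seed=0):
    """Estimate Shapley values by averaging marginal effects over random orderings."""
    rng = random.Random(seed)
    phi = [0.0] * m
    order = list(range(m))
    for _ in range(n_permutations):
        rng.shuffle(order)                     # one random ordering of the features
        present = set()
        previous = value_fn(frozenset(present))
        for i in order:
            present.add(i)                     # reveal feature i after its predecessors
            current = value_fn(frozenset(present))
            phi[i] += current - previous       # marginal effect in this ordering
            previous = current
    return [v / n_permutations for v in phi]

def toy_value(s):
    # Hypothetical value function: features 0 and 1 interact, feature 2 is additive.
    return 1.0 + (10.0 if {0, 1} <= s else 0.0) + (2.0 if 2 in s else 0.0)

estimate = sampled_shapley(toy_value, m=3, n_permutations=2000)
```

Each ordering's marginal effects telescope to the difference between the full and empty value, so the estimates always sum to that difference. The individual values for interacting features still carry sampling noise, which is the inefficiency that the regression approach aims to reduce.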
While the axioms presented above provide a compelling reason to use the Shapley kernel (Equation 6), it is natural to wonder if any reasonable local weighting kernel would produce results similar to the Shapley kernel. It turns out this is not the case, and the Shapley kernel significantly affects how we attribute nonlinear effects to various features when compared to the standard exponential kernel used by LIME. For the sparse decision tree used above there is a noticeable change in the magnitude of feature impacts (Figure 4B), and for the dense decision tree we even see the direction of estimated effects reversed (Figure 4A).
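The effect of kernel choice can be seen even on a three-feature toy problem. The sketch below fits the same additive model twice, once with the Shapley kernel and once with an exponential kernel. The exponential form used here (weight decaying with the squared number of missing features, with `width=1.0`) is an assumed stand-in and may differ from LIME's actual kernel and distance function. The end-point weights of the Shapley kernel are again replaced by a large finite constant.

```python
from itertools import product
from math import comb, exp
import numpy as np

def shapley_kernel(m, size):
    if size in (0, m):
        return 1e6  # assumption: finite stand-in for the infinite weight
    return (m - 1) / (comb(m, size) * size * (m - size))

def exponential_kernel(m, size, width=1.0):
    # Assumed form: weight decays with the number of missing features.
    return exp(-((m - size) ** 2) / width ** 2)

def fit_local_linear(value_fn, m, kernel):
    """Weighted least squares over all binary vectors; returns feature effects."""
    Z = np.array(list(product([0, 1], repeat=m)), dtype=float)
    y = np.array([value_fn(frozenset(np.flatnonzero(z).tolist())) for z in Z])
    w = np.array([kernel(m, int(z.sum())) for z in Z])
    X = np.hstack([np.ones((len(Z), 1)), Z])  # intercept column plus features
    root_w = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * root_w[:, None], y * root_w, rcond=None)
    return coef[1:]

def toy_value(s):
    # Hypothetical value function with an interaction between features 0 and 1.
    return 1.0 + (10.0 if {0, 1} <= s else 0.0) + (2.0 if 2 in s else 0.0)

phi_shapley = fit_local_linear(toy_value, 3, shapley_kernel)
phi_exponential = fit_local_linear(toy_value, 3, exponential_kernel)
```

Under these assumptions the exponential kernel concentrates its weight near the fully observed input, so it credits each interacting feature with most of the interaction effect, while the Shapley kernel splits that effect evenly between the two.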