Teaching Meaningful Explanations


Noel C. F. Codella,* Michael Hind,* Karthikeyan Natesan Ramamurthy,*
Murray Campbell, Amit Dhurandhar, Kush R. Varshney, Dennis Wei,
Aleksandra Mojsilović
* These authors contributed equally.
IBM Research
Yorktown Heights, NY 10598
{nccodell,hindm,knatesa,mcam,adhuran,krvarshn,dwei,aleksand}@us.ibm.com

Abstract

The adoption of machine learning in high-stakes applications such as healthcare and law has lagged in part because predictions are not accompanied by explanations comprehensible to the domain user, who often holds ultimate responsibility for decisions and outcomes. In this paper, we propose an approach to generate such explanations in which training data is augmented to include, in addition to features and labels, explanations elicited from domain users. A joint model is then learned to produce both labels and explanations from the input features. This simple idea ensures that explanations are tailored to the complexity expectations and domain knowledge of the consumer. Evaluation spans multiple modeling techniques on a simple game dataset, an image dataset, and a chemical odor dataset, showing that our approach is generalizable across domains and algorithms. Results demonstrate that meaningful explanations can be reliably taught to machine learning algorithms, and in some cases, improve modeling accuracy.





1 Introduction

New regulations call for automated decision making systems to provide “meaningful information” on the logic used to reach conclusions [1, 2, 3, 4]. Selbst and Powles [3] interpret the concept of “meaningful information” as information that should be understandable to the audience (potentially individuals who lack specific expertise), is actionable, and is flexible enough to support various technical approaches.

To facilitate the development and study of explanations, we define an explanation as information provided in addition to an output that can be used to verify the output. In the ideal case, an explanation should enable a human user to independently determine whether the output is correct. The requirements of meaningful information have two implications for explanations:

  1. Complexity Match: The complexity of the explanation needs to match the complexity capability of the consumer [5, 6]. For example, an explanation in equation form may be appropriate for a statistician, but not for a nontechnical person [7].

  2. Domain Match: An explanation needs to be tailored to the domain, incorporating the relevant terms of the domain. For example, an explanation for a medical diagnosis needs to use terms relevant to the physician (or patient) who will be consuming the prediction.

Work in social and behavioral sciences [8, 7, 9] has found that people prefer explanations that are simpler, more general, and coherent even over more likely ones. Moreover, the European Union GDPR guidelines [4] say: "The controller should find simple ways to tell the data subject about the rationale behind, or the criteria relied on in reaching the decision without necessarily always attempting a complex explanation of the algorithms used or disclosure of the full algorithm."

In this paper, we take this guidance to heart by asking consumers themselves to provide explanations along with feature/label pairs that are meaningful to the label decisions. We then use this augmented training set to learn models that predict explanations along with labels for new unseen samples.

There are many possible instantiations for this proposed paradigm of teaching explanations. One is to simply expand the label space to be the Cartesian product of the original labels and the elicited explanations. Another builds upon the tradition of similarity metrics, case-based reasoning and content-based retrieval. Existing approaches that only have access to features and labels are unable to find meaningful similarities. However, with the advantage of having training features, labels, and explanations, we propose to learn feature embeddings guided by labels and explanations. This allows us to infer explanations for new data using nearest neighbor approaches. We present a new objective function to learn an embedding to optimize nearest neighbor search for both prediction accuracy as well as holistic human relevancy to enforce that returned neighbors present meaningful information.

We demonstrate the proposed paradigm and two of its instantiations on a tic-tac-toe dataset that we create, a publicly-available image aesthetics dataset [10], and a publicly-available olfactory pleasantness dataset [11, 12]. Teaching explanations, of course, requires a training set that contains explanations. Since such datasets are not readily available, we use the attributes given with the aesthetics and pleasantness datasets in a unique way: as collections of meaningful explanations.

The main contributions of this work are:

  • A new approach for machine learning algorithms to provide meaningful explanations that match the complexity and domain of consumers by eliciting training explanations directly from them. We name this paradigm TED for ‘Teaching Explanations for Decisions.’

  • A method to teach explanations by augmenting the label space with explanations.

  • A new prototype classification method that jointly learns similarities in feature space and explanation space.

  • Evaluation on disparate datasets demonstrating the efficacy of the paradigm.

2 Related Work

Prior work in providing explanations can be partitioned into several areas:

  1. Making existing or enhanced models interpretable, i.e. to provide a precise description of how the model determined its decision (e.g. [13, 14, 15, 16, 17, 18, 19])

  2. Creating a second, simpler-to-understand model, such as a small number of logical expressions, that mostly matches the decisions of the deployed model (e.g. [20, 21]).

  3. Leveraging “rationales”, “explanations”, “attributes”, or other “privileged information” in the training data to help improve the accuracy of the algorithms [22, 23, 24, 25, 26, 27, 28, 29]

  4. Work in the natural language processing and computer vision domains that generate rationales/explanations derived from input text [30, 31, 32]

  5. Content-based retrieval methods that provide explanations as evidence employed for a prediction, i.e. nearest neighbor classification and regression [33, 34, 35, 36, 37]

The first two groups attempt to provide precise descriptions on how a machine learning decision was made, which is particularly relevant for AI system builders. This insight can be used to help improve the AI system and may serve as the seeds for an explanation to a non-AI expert. However, work still remains to determine if these seeds are sufficient to satisfy the needs of a non-AI expert. In particular, when the underlying features are not human comprehensible, these approaches are inadequate for providing human consumable explanations.

The third group is similar to the approach presented in this work, in that they leverage additional information (explanations) in the training data. However, in the cited work the explanations are used to create a more accurate model. In this work the explanations are used to teach the machine how to generate explanations for new predictions.

The fourth group seeks to generate textual explanations with predictions. In the case of text classification, this involves selecting the minimal necessary content from a text body that is sufficient to trigger the classification. In the case of computer vision [38], this involves utilizing textual captions to automatically generate new textual captions of images that are both descriptive and discriminative. While serving to enrich an understanding of the predictions, these systems do not necessarily facilitate an improved ability for a human user to understand system failures.

The fifth group creates explanations in the form of decision evidence: using some feature embedding to perform k-nearest neighbor search, using those k neighbors to make a prediction, and showing the user the nearest neighbors and any relevant information about them. While this approach is fairly straightforward and holds a great deal of promise, it has historically suffered from the semantic gap: distance metrics in the realm of the feature embeddings do not necessarily yield neighbors that are relevant for prediction. More recently, deep feature embeddings, optimized for generating predictions, have made significant advances in reducing the semantic gap. However, there still remains a “meaning gap”: while systems have become good at returning neighbors with the same label as a query, they do not necessarily return neighbors that agree with holistic human measures of similarity. As a result, users are not necessarily inclined to trust system predictions.

Doshi-Velez et al. [39] discuss the societal, moral, and legal expectations of AI explanations, provide guidelines for the content of an explanation, and recommend that explanations of AI systems be held to a similar standard as humans. Our approach is compatible with their view. Biran and Cotton [40] provide an excellent overview and taxonomy of explanations and justifications in machine learning.

Miller et al. [9, 7] argue that explainable AI solutions need to meet the needs of the users, an area that has been well studied in philosophy, psychology, and cognitive science. They provide a brief survey of the work in these fields most relevant to the area of explainable AI. They, along with Doshi-Velez and Kim [41], call for more rigor in this area.

3 Methods

The primary motivation of the TED paradigm is to provide meaningful explanations to consumers by leveraging the consumers’ knowledge of what will be meaningful to them. Section 3.1 formally describes the problem space that defines the TED approach. One simple learning approach to this problem is to expand the label space to be the Cartesian product of the original labels and the provided explanations. Although quite simple, this approach has a number of pragmatic advantages in that it is easy to incorporate, it can be used for any learning algorithm, it does not require any changes to the learning algorithm, and does not require owners to make available their algorithm. It also has the possibility of some indirect benefits because requiring explanations will improve auditability (all decisions will have explanations) and potentially reduce bias in the training set because inconsistencies in explanations may be discovered.
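The Cartesian-product instantiation described above can be illustrated with a small label-encoding sketch. The label and explanation values here (a hypothetical loan-approval setting) are illustrative, not from the paper's datasets:

```python
from itertools import product

def encode_labels(y_values, e_values):
    """Map each (label, explanation) pair to a single combined class index,
    so any off-the-shelf classifier can be trained on the augmented labels."""
    pairs = list(product(y_values, e_values))
    to_idx = {pair: i for i, pair in enumerate(pairs)}
    from_idx = {i: pair for pair, i in to_idx.items()}
    return to_idx, from_idx

# Hypothetical labels and explanations for a loan-approval task.
to_idx, from_idx = encode_labels(["approve", "deny"],
                                 ["income", "debt", "history"])
combined = to_idx[("deny", "debt")]   # train any classifier on this index
y_pred, e_pred = from_idx[combined]   # decode a prediction into label + explanation
```

At prediction time the combined class index is decoded back into a (label, explanation) pair, so the learning algorithm itself needs no modification.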

Other instantiations of the TED approach may leverage the explanations to improve model prediction and possibly explanation accuracy. Section 3.2 takes this approach to learn feature embeddings and explanation embeddings in a joint and aligned way to permit neighbor-based explanation prediction. It presents a new objective function to learn an embedding to optimize k-nearest neighbor (k-NN) search for both prediction accuracy as well as holistic human relevancy to enforce that returned neighbors present meaningful information.

3.1 Problem Description

Let X × Y denote the input-output space, with p(x, y) denoting the joint distribution over this space, where x ∈ X and y ∈ Y. Then typically, in supervised learning one wants to estimate p(y|x).

In our setting, we have a triple X × Y × E that denotes the input space, output space, and explanation space, respectively. We then assume that we have a joint distribution p(x, y, e) over this space, where x ∈ X, y ∈ Y, and e ∈ E. In this setting we want to estimate p(y, e|x). Thus, we not only want to predict the labels y, but also the corresponding explanations e for the specific x and y, based on historical explanations given by human experts.

The explanation space E in most of these applications is quite different from X, but has similarities with Y in that it requires human judgment.

We provide methods to solve the above problem. Although these methods could be used even when X is human understandable, we envision the most impact for applications where it is not the case, such as the olfaction dataset described in Section 4.

3.2 Proposed Approaches

We propose several approaches to teach labels and explanations from the training data, and predict them for novel test data. We will describe the baseline regression and embedding approaches. The particular parameters and specific instantiations are provided in Section 4.

3.2.1 Regression Baseline for Predicting Y

To set the baseline, we trained a regression network on the datasets to predict Y from X using the mean-squared error (MSE) loss. This baseline cannot be used to infer E for a novel X.
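As a minimal stand-in for the regression baseline, the MSE objective can be illustrated with a closed-form least-squares fit. The paper's baseline is a neural network; this linear sketch with synthetic data only illustrates the objective being minimized:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))        # synthetic features
w_true = np.arange(1.0, 6.0)         # ground-truth linear weights
y = X @ w_true                       # noiseless targets for illustration

# Closed-form least-squares solution minimizing mean-squared error.
w = np.linalg.lstsq(X, y, rcond=None)[0]
mse = np.mean((X @ w - y) ** 2)
```

A neural regressor replaces the linear map with a learned nonlinear one, but the training signal (MSE between prediction and y) is the same.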

3.2.2 Embeddings to Predict Y and E

We propose to use the activations from the last fully connected hidden layer of the regression network as embeddings for X. Given a novel X, we obtain its k-nearest neighbors from the training set, and use the corresponding Y and E values to obtain the predictions as weighted averages. The weights are determined using a Gaussian kernel on the distances of the novel X to its neighbors in the training set. This approach is used with all our k-NN based prediction approaches.
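The Gaussian-weighted k-NN prediction described above can be sketched as follows; the bandwidth sigma and the toy data are illustrative assumptions:

```python
import numpy as np

def knn_predict(query, train_embs, train_y, train_e, k=5, sigma=1.0):
    """Predict y and e for a query embedding as Gaussian-kernel-weighted
    averages over its k nearest training neighbors."""
    d = np.linalg.norm(train_embs - query, axis=1)
    nn = np.argsort(d)[:k]                       # indices of the k nearest neighbors
    w = np.exp(-d[nn] ** 2 / (2 * sigma ** 2))   # Gaussian kernel on distances
    w = w / w.sum()                              # normalize the weights
    return w @ train_y[nn], w @ train_e[nn]

# Toy 2-D embeddings with scalar labels y and 2-attribute explanations e.
embs = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
y = np.array([0.0, 1.0, 2.0])
e = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]])
y_hat, e_hat = knn_predict(np.array([1.0, 1.0]), embs, y, e, k=1)
```

With k = 1 the prediction reduces to copying the nearest neighbor's label and explanation; larger k smooths over several neighbors.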

3.2.3 Pairwise Loss for Improved Embeddings

We propose to improve upon the embeddings of X from the regression network by explicitly ensuring that neighbors are projected close to each other in the embedding space. Since our goal is to obtain improved predictions of Y and E using the k-NN approach described above, our neighborhood is also defined based on Y and E.

For a pair of data points (x_1, x_2) with predictions (y_1, y_2) and explanations (e_1, e_2), we define the following pairwise loss functions for creating the embedding f(·):

  L_Y(x_1, x_2) = 1 − cos(f(x_1), f(x_2)),     if |y_1 − y_2| ≤ c_low,
  L_Y(x_1, x_2) = max(0, cos(f(x_1), f(x_2))), if |y_1 − y_2| ≥ c_high,      (1)

  L_E(x_1, x_2) = 1 − cos(f(x_1), f(x_2)),     if ||e_1 − e_2|| ≤ c_low,
  L_E(x_1, x_2) = max(0, cos(f(x_1), f(x_2))), if ||e_1 − e_2|| ≥ c_high.    (2)

Eqn. (1) defines the embedding loss based on similarity in the Y space. If y_1 and y_2 are close, the cosine distance between f(x_1) and f(x_2) will be minimized. If y_1 and y_2 are far, the cosine similarity will be minimized, thus maximizing the cosine distance. We set c_low < c_high, so pairs for which |y_1 − y_2| lies between c_low and c_high are not considered for the embedding. A similar loss function based on similarity in the E space is defined in (2). We combine the losses using Y and E similarities as

  L_{Y,E}(x_1, x_2) = λ L_Y(x_1, x_2) + (1 − λ) L_E(x_1, x_2),               (3)

where λ denotes the scalar weight for the weighted combination. We fix λ in our experiments.
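The pairwise losses described above (close target pairs pull embeddings together, far pairs push them apart, pairs in between are ignored) can be sketched as below. The threshold values (0.1, 0.3) and the weight lam=0.5 are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def cos_sim(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def pairwise_loss(f1, f2, t1, t2, dist, c_low, c_high):
    """Cosine embedding loss driven by target similarity: close targets
    minimize cosine distance, far targets minimize cosine similarity,
    and pairs with target distance in (c_low, c_high) contribute nothing."""
    s = cos_sim(f1, f2)
    if dist(t1, t2) <= c_low:
        return 1.0 - s            # pull similar pairs together
    if dist(t1, t2) >= c_high:
        return max(0.0, s)        # push dissimilar pairs apart
    return 0.0                    # pair not used for the embedding

def combined_loss(f1, f2, y1, y2, e1, e2, lam=0.5, c=(0.1, 0.3)):
    """Weighted combination of the Y-based and E-based losses."""
    ld = lambda a, b: abs(a - b)                                   # |y1 - y2|
    ed = lambda a, b: float(np.abs(np.array(a) - np.array(b)).sum())  # ||e1 - e2||_1
    return (lam * pairwise_loss(f1, f2, y1, y2, ld, *c)
            + (1 - lam) * pairwise_loss(f1, f2, e1, e2, ed, *c))

f = np.array([1.0, 0.0])
close_pair = pairwise_loss(f, f, 0.0, 0.05, lambda a, b: abs(a - b), 0.1, 0.3)
far_pair = pairwise_loss(f, f, 0.0, 0.50, lambda a, b: abs(a - b), 0.1, 0.3)
```

In the first call the embeddings are aligned and the targets are close, so the loss is zero; in the second the targets are far, so the aligned embeddings incur the maximal penalty.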

4 Evaluation

To evaluate the ideas presented in this work, we focus on two fundamental questions:

  1. How is the prediction accuracy impacted by incorporating explanations into the training?

  2. Does the TED approach provide useful explanations?

Since the TED approach can be incorporated into many kinds of learning algorithms, tested against many datasets, and used in many different situations, a definitive answer to these questions is beyond the scope of this paper. Instead, we address these two questions on three datasets, evaluating accuracy in the standard way. Since a consensus metric for explanation efficacy has yet to emerge [39], we suggest some measures in the experiments below. We expect several metrics of explanation efficacy to emerge, including those involving the target explanation consumers [6].

4.1 Datasets

The TED approach requires a training set that contains explanations. Since such datasets are not readily available, we evaluate the approach on a toy dataset (tic-tac-toe) and leverage two publicly available datasets in a unique way: AADB [10] and Olfactory [11, 12].

The tic-tac-toe dataset contains the 4,520 legal non-terminal positions in this classic game. Each position is labeled with a preferred next move (Y) and an explanation of the preferred move (E). Both Y and E were generated by a simple set of rules given in Section 4.3.

The AADB (Aesthetics and Attributes Database) [10] contains images that have been human-rated for aesthetic quality (Y), where higher values imply more aesthetically pleasing. It also comes with 11 attributes (E) that are closely related to image aesthetic judgments by professional photographers. The attribute values are averaged over 5 humans. For the discrete metrics discussed in Section 4.2, we use bin thresholds on the Y and E ratings that roughly partition the values into thirds based on the training data. The training, test, and validation partitions are provided by the authors and consist of 8,458, 1,000, and 500 images, respectively.

The Olfactory dataset [11, 12] is a challenge dataset describing various scents (chemical bondings and labels). Each of the 476 rows represents a molecule with chemoinformatic features (X) (angles between bonds, types of atoms, etc.). Similarly to AADB, each row also contains 21 human perceptions of the molecule, such as intensity, pleasantness, sour, musky, and burnt. These are average values among 49 diverse individuals. We take Y to be the pleasantness perception and E to be the remaining 19 perceptions except for intensity, since these 19 are known to be more fundamental semantic descriptors while pleasantness and intensity are holistic perceptions [12]. We choose bin thresholds that partition the Y scores in the training data into thirds, and use element-wise bin thresholds that partition the E values into thirds. We use the standard training, test, and validation sets provided by the challenge organizers.

4.2 Metrics

An open question that we do not attempt to resolve here is the precise form that explanations should take. It is important that they match the mental model of the explanation consumer. For example, one may expect explanations to be categorical (as in tic-tac-toe or loan approval reason codes) or discrete ordinal, as in human ratings. However, they may also be continuous in crowdsourced environments, where the final rating is a (weighted) average over the ratings of the humans in question. This is seen in the two real datasets that we consider, where one had 5 individuals while the other had 49 individuals.

Since we are using existing continuous-valued attributes as explanations, we choose to treat them both as is and discretized into {−1, 0, 1} bins, representing negative, neutral, and positive values. The latter mimics human ratings (e.g. not pleasing, neutral, or pleasing). Specifically, we train on the original continuous values and report the mean absolute error (MAE) between the true y and a continuous-valued prediction ŷ. We also similarly discretize ŷ into ŷ_d. We then report both the absolute error in the discretized values (so that ŷ_d, y_d ∈ {−1, 0, 1} and |ŷ_d − y_d| ∈ {0, 1, 2}) as well as the 0–1 error (ŷ_d = y_d or ŷ_d ≠ y_d), where the latter corresponds to conventional classification accuracy.

The explanations are treated similarly by computing distances (the sum of absolute differences over attributes) before and after discretizing to {−1, 0, 1}. We do not, however, compute the 0–1 error for E.
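The discretized metrics can be sketched as follows; the thresholds 1/3 and 2/3 and the toy scores are illustrative stand-ins for the dataset-specific values:

```python
import numpy as np

def discretize(v, t_low, t_high):
    """Map continuous scores to {-1, 0, 1} (negative / neutral / positive)."""
    return np.where(v < t_low, -1, np.where(v >= t_high, 1, 0))

y_true = np.array([0.1, 0.5, 0.9])
y_pred = np.array([0.2, 0.8, 0.7])

yd_t = discretize(y_true, 1/3, 2/3)   # illustrative thresholds splitting [0,1] in thirds
yd_p = discretize(y_pred, 1/3, 2/3)

mae_cont = np.abs(y_true - y_pred).mean()   # MAE on continuous values
mae_disc = np.abs(yd_t - yd_p).mean()       # per-item errors lie in {0, 1, 2}
acc = (yd_t == yd_p).mean()                 # complement of the 0-1 error
```

For explanations the same discretization is applied element-wise per attribute, with distances summed over attributes.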

4.3 Tic-Tac-Toe

As an illustration of the proposed approach, we describe a simple domain, tic-tac-toe, where it is possible to automatically provide labels (the preferred move in a given board position) and explanations (the reason why the preferred move is best). A tic-tac-toe board is represented by two binary feature planes of nine squares each, indicating the presence of X and O, respectively. An additional binary feature indicates the side to move, resulting in a total of 19 binary input features. Each legal board position is labeled with a preferred move, along with the reason the move is preferred. The labeling is based on a simple set of rules that are executed in order (note that the rules do not guarantee optimal play):

  1. If a winning move is available, completing three in a row for the side to move, choose that move with reason Win

  2. If a blocking move is available, preventing the opponent from completing three in a row on their next turn, choose that move with reason Block

  3. If a threatening move is available, creating two in a row with an empty third square in the row, choose that move with reason Threat

  4. Otherwise, choose an empty square, preferring center over corners over middles, with reason Empty
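The ordered rules above can be sketched as a small rule-based labeler. Tie-breaking within a rule (e.g. which of several winning squares, or which empty square of a threatened line, is returned) is an implementation choice not specified in the text:

```python
# The eight winning lines of a 3x3 board, indexed 0..8 row-major.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]
PREFERENCE = [4, 0, 2, 6, 8, 1, 3, 5, 7]   # center > corners > middles

def completing_move(board, player):
    """Square that completes three in a row for player, if any."""
    for line in LINES:
        vals = [board[i] for i in line]
        if vals.count(player) == 2 and vals.count(" ") == 1:
            return line[vals.index(" ")]
    return None

def threatening_move(board, player):
    """Square creating two in a row with the third square empty
    (first such square found; an assumed tie-break)."""
    for line in LINES:
        vals = [board[i] for i in line]
        if vals.count(player) == 1 and vals.count(" ") == 2:
            return line[vals.index(" ")]
    return None

def label_position(board, to_move):
    """Apply the ordered rules: Win, Block, Threat, Empty."""
    opponent = "O" if to_move == "X" else "X"
    move = completing_move(board, to_move)
    if move is not None:
        return move, "Win"
    move = completing_move(board, opponent)
    if move is not None:
        return move, "Block"
    move = threatening_move(board, to_move)
    if move is not None:
        return move, "Threat"
    for sq in PREFERENCE:
        if board[sq] == " ":
            return sq, "Empty"
```

Running the labeler over all legal non-terminal positions yields the (Y, E) pairs used to build the dataset.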

Input      Y Accuracy   Y and E Accuracy
Y          0.9653       NA
Y and E    0.9742       0.9631

Table 1: Accuracy of predicting Y, and Y and E, in tic-tac-toe.

Two versions of the dataset were created, one with only the preferred move (represented as a plane), the second with the preferred move and explanation (represented as a stack of planes). A simple neural network classifier was built on each of these datasets, with one hidden layer of 200 units using ReLU and a softmax over the 9 (or 36) outputs. On a test set containing 10% of the legal positions, this classifier obtained an accuracy of 96.53% on the move-only prediction task, and 96.31% on the move/explanation prediction task (Table 1). When trained on the move/explanation task, performance on predicting just the preferred move actually increases to 97.42%. This illustrates that the overall approach works well in a simple domain with a limited number of explanations. Furthermore, given the simplicity of the domain, it is possible to provide explanations that are both useful and accurate.

4.4 AADB

All experiments with the AADB dataset used a modified PyTorch implementation of AlexNet for fine-tuning [42]. We simplify the fully connected layers for the regression variant of AlexNet to 1024-ReLU-Dropout-64-1, so the embedding layer provides 64-dimensional outputs. This layer used a learning rate of 0.01, and all other layers used 0.001. A batch size of 64 was used for training the network with the pairwise loss. The thresholds c_low and c_high in the pairwise losses were fixed for this dataset.

We use the four approaches proposed in Section 3.2 to obtain results: a simple regression baseline; k-NN using embeddings from the regression network; k-NN using embeddings optimized for the pairwise loss using Y alone; and k-NN using embeddings optimized for the weighted pairwise loss with Y and E. The pairwise-loss embeddings were optimized using pairs chosen from the training data.

Algorithm               K     Y: Class. Acc.   Y: Discretized   Y: Continuous   E: Discretized   E: Continuous
Baseline                NA    0.4140           0.6250           0.1363          NA               NA
Embedding + k-NN        1     0.3990           0.7650           0.1849          0.6237           0.2724
                        2     0.4020           0.7110           0.1620          0.5453           0.2402
                        5     0.3970           0.6610           0.1480          0.5015           0.2193
                        10    0.3890           0.6440           0.1395          0.4890           0.2099
                        15    0.3910           0.6400           0.1375          0.4849           0.2069
                        20    0.3760           0.6480           0.1372          0.4831           0.2056
Pairwise Y + k-NN       1     0.4970           0.5500           0.1275          0.6174           0.2626
                        2     0.4990           0.5460           0.1271          0.5410           0.2356
                        5     0.5040           0.5370           0.1254          0.4948           0.2154
                        10    0.5100           0.5310           0.1252          0.4820           0.2084
                        15    0.5060           0.5320           0.1248          0.4766           0.2053
                        20    0.5110           0.5290           0.1248          0.4740           0.2040
Pairwise Y & E + k-NN   1     0.5120           0.5590           0.1408          0.6060           0.2617
                        2     0.5060           0.5490           0.1333          0.5363           0.2364
                        5     0.5110           0.5280           0.1272          0.4907           0.2169
                        10    0.5260           0.5180           0.1246          0.4784           0.2091
                        15    0.5220           0.5220           0.1240          0.4760           0.2065
                        20    0.5210           0.5220           0.1235          0.4731           0.2050

Table 2: Accuracy of predicting Y and E for AADB using different methods (Section 3.2). Baseline is a regression network that predicts Y from X. Embedding + k-NN uses the embedding from the last hidden layer of the regression network. Pairwise Y + k-NN uses the cosine embedding loss in (1) to optimize the embeddings of X. Pairwise Y & E + k-NN uses the sum of cosine embedding losses in (3) to optimize the embeddings of X.

Table 2 provides accuracy numbers for Y and E using the proposed approaches with various values of K. The results show improved accuracy over the baseline for Pairwise Y + k-NN and Pairwise Y & E + k-NN, and a corresponding improvement in MAE for Y. Clearly, optimizing embeddings based on Y similarities, and on Y & E similarities, is better for predicting Y compared to not doing so. The higher improvement in performance using Y & E similarities can be explained by the fact that Y can be predicted easily from E in this dataset: a simple regression model predicting Y from E achieves high accuracy with low MAE in both the discretized and continuous senses.

The accuracy of predicting E varies among the three k-NN techniques, with slight improvements obtained by using the pairwise Y loss and then the pairwise Y & E loss.

4.5 Olfactory

Since random forest was the winning entry on this dataset [12], we used random forest regression to pre-select a subset of the features for subsequent modeling. From these features, we used a fully connected layer of 64 units (the embedding layer), which was fully connected to an output layer, for the regression baseline. No non-linearities were employed, but the data was first transformed and then the features were standardized to zero mean and unit variance. A batch size of 338 was used for training the network with the pairwise loss. The thresholds c_low and c_high in the pairwise losses were fixed for this dataset.

Algorithm               K     Y: Class. Acc.   Y: Discretized   Y: Continuous   E: Discretized   E: Continuous
Baseline LASSO          NA    0.4928           0.5072           8.6483          NA               NA
Baseline RF             NA    0.5217           0.4783           8.9447          NA               NA
Embedding + k-NN        1     0.5362           0.5362           11.7542         0.5690           4.2050
                        2     0.5362           0.4928           9.9780          0.4950           3.6555
                        5     0.6087           0.4058           9.2840          0.4516           3.3488
                        10    0.5652           0.4783           10.1398         0.4622           3.4128
                        15    0.5362           0.4928           10.4433         0.4798           3.4012
                        20    0.4783           0.5652           10.9867         0.4813           3.4746
Pairwise Y + k-NN       1     0.6087           0.4783           10.9306         0.5515           4.3547
                        2     0.5362           0.5072           10.9274         0.5095           3.9330
                        5     0.5507           0.4638           10.4720         0.4935           3.6824
                        10    0.5072           0.5072           10.7297         0.4912           3.5969
                        15    0.5217           0.4928           10.6659         0.4889           3.6277
                        20    0.4638           0.5507           10.5957         0.4889           3.6576
Pairwise Y & E + k-NN   1     0.6522           0.3913           10.4714         0.5431           4.0833
                        2     0.5362           0.4783           10.0081         0.4882           3.6610
                        5     0.5652           0.4638           10.0519         0.4622           3.4735
                        10    0.5072           0.5217           10.3872         0.4653           3.4786
                        15    0.5072           0.5217           10.7218         0.4737           3.4955
                        20    0.4493           0.5797           10.8590         0.4790           3.5027

Table 3: Accuracy of predicting Y and E for Olfactory using different methods. We train baseline LASSO and RF models to predict Y from X. We use the last hidden layer of the regression network to obtain embeddings for the Embedding + k-NN method. Pairwise Y + k-NN uses the cosine embedding loss in (1) to optimize the embeddings of X. Pairwise Y & E + k-NN uses the sum of cosine embedding losses in (3) to optimize the embeddings of X.

Table 3 provides accuracy numbers in the same format as Table 2. For this dataset, we have two baselines for predicting Y from X: LASSO [43] and Random Forest (RF) [44].

The results show, once again, improved accuracy over the baseline for Pairwise Y + k-NN and Pairwise Y & E + k-NN, and a corresponding improvement in MAE for Y. Again, this performance improvement can be explained by the fact that Y can be predicted accurately from E: both baselines achieve high accuracy with low MAE when predicting Y given E, in both the discretized and continuous senses. Once again, the accuracy of predicting E varies among the three k-NN techniques with no clear advantages.

5 Discussion

One potential concern with the TED approach is the additional labor required for adding explanations. However, researchers [23, 24, 25, 26] have quantified that the time to add labels and explanations is often the same as just adding labels for an expert SME. They also cite other benefits of adding explanations, such as improved quality and consistency of the resulting training data set.

Furthermore, in some instances, the NN instantiation of TED may require no extra labor. For example, in cases where embeddings are used as search criteria for evidence-based predictions of queries, end users will, on average, naturally interact with search results that are similar to the query in explanation space. This query-result interaction activity inherently provides similar and dissimilar pairs in the explanation space that can be used to refine an embedding initially optimized for the predictions alone. This reliance on relative distances in explanation space is also what distinguishes this method from multi-task learning objectives, since absolute labels in explanation space need not be defined.

6 Conclusion

The societal demand for “meaningful information” of automated decisions has sparked significant research in AI explainability. This paper suggests a new paradigm for providing explanations from machine learning algorithms. This new approach is particularly well-suited for explaining a machine learning prediction when all of its input features are inherently incomprehensible to humans, even to deep subject matter experts. The approach augments training data collection beyond features and labels to also include elicited explanations. Through this simple idea, we are not only able to provide useful explanations that would not have otherwise been possible, but we are able to tailor the explanations to the intended user population by eliciting training explanations from members of that group.

There are many possible instantiations for this proposed paradigm of teaching explanations. We have described a novel instantiation that learns feature embeddings using labels and explanation similarities in a joint and aligned way to permit neighbor-based explanation prediction. We present a new objective function to learn an embedding to optimize nearest neighbor search for both prediction accuracy as well as holistic human relevancy to enforce that returned neighbors present meaningful information. We have demonstrated the proposed paradigm and two of its instantiations on a tic-tac-toe dataset that we created, a publicly-available image aesthetics dataset [10], and a publicly-available olfactory pleasantness dataset [11, 12].

We hope this work will inspire other researchers to follow this paradigm.


  • [1] B. Goodman and S. Flaxman, “EU regulations on algorithmic decision-making and a ‘right to explanation’,” in Proc. ICML Workshop Human Interp. Mach. Learn., New York, NY, Jun. 2016, pp. 26–30.
  • [2] S. Wachter, B. Mittelstadt, and L. Floridi, “Why a right to explanation of automated decision-making does not exist in the general data protection regulation,” Int. Data Privacy Law, vol. 7, no. 2, pp. 76–99, May 2017.
  • [3] A. D. Selbst and J. Powles, “Meaningful information and the right to explanation,” Int. Data Privacy Law, vol. 7, no. 4, pp. 233–242, Nov. 2017.
  • [4] “Guidelines on automated individual decision-making and profiling for the purposes of regulation 2016/679,” Article 29 Data Protection Working Party, European Union, Tech. Rep., Oct. 2017. [Online]. Available: http://ec.europa.eu/justice/data-protection/index_en.htm
  • [5] T. Kulesza, S. Stumpf, M. Burnett, S. Yang, I. Kwan, and W.-K. Wong, “Too much, too little, or just right? Ways explanations impact end users’ mental models,” in Proc. IEEE Symp. Vis. Lang. Human-Centric Comput., San Jose, CA, Sep. 2013, pp. 3–10.
  • [6] A. Dhurandhar, V. Iyengar, R. Luss, and K. Shanmugam, “A formal framework to characterize interpretability of procedures,” in Proc. ICML Workshop Human Interp. Mach. Learn., Sydney, Australia, Aug. 2017, pp. 1–7.
  • [7] T. Miller, P. Howe, and L. Sonenberg, “Explainable AI: Beware of inmates running the asylum or: How I learnt to stop worrying and love the social and behavioural sciences,” in Proc. IJCAI Workshop Explainable Artif. Intell., Melbourne, Australia, Aug. 2017.
  • [8] T. Lombrozo, “Simplicity and probability in causal explanation,” Cognitive Psychol., vol. 55, no. 3, pp. 232–257, Nov. 2007.
  • [9] T. Miller, “Explanation in artificial intelligence: Insights from the social sciences,” arXiv preprint arXiv:1706.07269, Jun. 2017.
  • [10] S. Kong, X. Shen, Z. Lin, R. Mech, and C. Fowlkes, “Photo aesthetics ranking network with attributes and content adaptation,” in Proc. Eur. Conf. Comput. Vis., Amsterdam, Netherlands, Oct. 2016, pp. 662–679.
  • [11] “Dream olfaction prediction challenge,” 2015. [Online]. Available: https://www.synapse.org/#!Synapse:syn2811262/wiki/78368
  • [12] A. Keller, R. C. Gerkin, Y. Guan, A. Dhurandhar, G. Turu, B. Szalai, J. D. Mainland, Y. Ihara, C. W. Yu, R. Wolfinger, C. Vens, L. Schietgat, K. De Grave, R. Norel, G. Stolovitzky, G. A. Cecchi, L. B. Vosshall, and P. Meyer, “Predicting human olfactory perception from chemical features of odor molecules,” Science, vol. 355, no. 6327, pp. 820–826, 2017.
  • [13] M. T. Ribeiro, S. Singh, and C. Guestrin, ““Why should I trust you?”: Explaining the predictions of any classifier,” in Proc. ACM SIGKDD Int. Conf. Knowl. Disc. Data Min., San Francisco, CA, Aug. 2016, pp. 1135–1144.
  • [14] D. Alvarez-Melis and T. S. Jaakkola, “A causal framework for explaining the predictions of black-box sequence-to-sequence models,” in Proc. Conf. Emp. Methods Nat. Lang. Process., Copenhagen, Denmark, Sep. 2017, pp. 412–421.
  • [15] P. W. Koh and P. Liang, “Understanding black-box predictions via influence functions,” in Proc. Int. Conf. Mach. Learn., Sydney, Australia, Aug. 2017, pp. 1885–1894.
  • [16] R. Caruana, Y. Lou, J. Gehrke, P. Koch, M. Sturm, and N. Elhadad, “Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission,” in Proc. ACM SIGKDD Int. Conf. Knowl. Disc. Data Min., Sydney, Australia, Aug. 2015, pp. 1721–1730.
  • [17] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, “Learning deep features for discriminative localization,” in Computer Vision and Pattern Recognition (CVPR), 2016.
  • [18] D. Bau, B. Zhou, A. Khosla, A. Oliva, and A. Torralba, “Network dissection: Quantifying interpretability of deep visual representations,” in Computer Vision and Pattern Recognition (CVPR), 2017.
  • [19] J. Yosinski, J. Clune, A. Nguyen, T. Fuchs, and H. Lipson, “Understanding neural networks through deep visualization,” in Deep Learning Workshop of International Conference on Machine Learning (ICML), 2015.
  • [20] S. Lundberg and S.-I. Lee, “A unified approach to interpreting model predictions,” in Advances in Neural Information Processing Systems, 2017.
  • [21] O. Bastani, C. Kim, and H. Bastani, “Interpreting blackbox models via model extraction,” arXiv preprint arXiv:1705.08504, 2018.
  • [22] Q. Sun and G. DeJong, “Explanation-augmented SVM: An approach to incorporating domain knowledge into SVM learning,” in Proc. 22nd International Conference on Machine Learning, 2005.
  • [23] O. F. Zaidan and J. Eisner, “Using ‘annotator rationales’ to improve machine learning for text categorization,” in Proc. NAACL-HLT, 2007, pp. 260–267.
  • [24] ——, “Modeling annotators: A generative approach to learning from annotator rationales,” in Proceedings of EMNLP 2008, October 2008, pp. 31–40.
  • [25] Y. Zhang, I. J. Marshall, and B. C. Wallace, “Rationale-augmented convolutional neural networks for text classification,” in Conference on Empirical Methods in Natural Language Processing (EMNLP), 2016.
  • [26] T. McDonnell, M. Lease, M. Kutlu, and T. Elsayed, “Why is that relevant? collecting annotator rationales for relevance judgments,” in Proc. AAAI Conf. Human Comput. Crowdsourc., 2016.
  • [27] J. Donahue and K. Grauman, “Annotator rationales for visual recognition,” in ICCV, 2011.
  • [28] K. Duan, D. Parikh, D. Crandall, and K. Grauman, “Discovering localized attributes for fine-grained recognition,” in CVPR, 2012.
  • [29] P. Peng, Y. Tian, T. Xiang, Y. Wang, and T. Huang, “Joint learning of semantic and latent attributes,” in ECCV 2016, Lecture Notes in Computer Science, vol. 9908, 2016.
  • [30] T. Lei, R. Barzilay, and T. Jaakkola, “Rationalizing neural predictions,” in EMNLP, 2016.
  • [31] A. Yessenalina, Y. Choi, and C. Cardie, “Automatically generating annotator rationales to improve sentiment classification,” in Proceedings of the ACL 2010 Conference Short Papers, 2010, pp. 336–341.
  • [32] L. A. Hendricks, Z. Akata, M. Rohrbach, J. Donahue, B. Schiele, and T. Darrell, “Generating visual explanations,” in CVPR, 2016.
  • [33] J. Wan, D. Wang, S. Hoi, P. Wu, J. Zhu, Y. Zhang, and J. Li, “Deep learning for content-based image retrieval: A comprehensive study,” in Proceedings of the ACM International Conference on Multimedia, 2014.
  • [34] O. Jimenez-del-Toro, A. Hanbury, G. Langs, A. Foncubierta-Rodriguez, and H. Muller, “Overview of the VISCERAL retrieval benchmark 2015,” in Multimodal Retrieval in the Medical Domain (MRMD) Workshop, 37th European Conference on Information Retrieval (ECIR), 2015.
  • [35] Z. Li, X. Zhang, H. Muller, and S. Zhang, “Large-scale retrieval for medical image analytics: A comprehensive review,” Medical Image Analysis, vol. 43, pp. 66–84, 2018.
  • [36] Y. A. Chung and W. H. Weng, “Learning deep representations of medical images using siamese cnns with application to content-based image retrieval,” in NIPS Workshop on Machine Learning for Health (ML4H), 2017.
  • [37] J. Sun, F. Wang, J. Hu, and S. Edabollahi, “Supervised patient similarity measure of heterogeneous patient records,” SIGKDD Explorations, 2012.
  • [38] L. A. Hendricks, Z. Akata, M. Rohrbach, J. Donahue, B. Schiele, and T. Darrell, “Generating visual explanations,” in European Conference on Computer Vision, 2016.
  • [39] F. Doshi-Velez, M. Kortz, R. Budish, C. Bavitz, S. Gershman, D. O’Brien, S. Schieber, J. Waldo, D. Weinberger, and A. Wood, “Accountability of AI under the law: The role of explanation,” CoRR, vol. abs/1711.01134, 2017. [Online]. Available: http://arxiv.org/abs/1711.01134
  • [40] O. Biran and C. Cotton, “Explanation and justification in machine learning: A survey,” in IJCAI-17 Workshop on Explainable AI (XAI), 2017.
  • [41] F. Doshi-Velez and B. Kim, “Towards a rigorous science of interpretable machine learning,” arXiv preprint arXiv:1702.08608, 2017.
  • [42] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Adv. Neur. Inf. Process. Syst. 25, 2012, pp. 1097–1105.
  • [43] R. Tibshirani, “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society. Series B (Methodological), pp. 267–288, 1996.
  • [44] L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.