# A Theory of Diagnostic Interpretation in Supervised Classification

###### Abstract

Interpretable deep learning is a fundamental building block towards safer AI, especially now that the deployment of deep learning-based computer-aided medical diagnostic systems is so imminent ^{1}.
However, without a computational formulation of black-box interpretation, general interpretability research relies heavily on subjective bias.
The clear decision structure of medical diagnostics lets us approximate the decision process of a radiologist as a model, removed from subjective bias.
We define the process of interpretation as a finite communication between a known model and a black-box model to optimally map the black box’s decision process in the known model.
Consequently, we define interpretability as maximal information gain over the initial uncertainty about the black-box’s decision within finite communication.
We relax this definition based on the observation that diagnostic interpretation is typically achieved by a process of minimal querying.
We derive an algorithm to calculate diagnostic interpretability.
The usual question of accuracy-interpretability tradeoff, i.e. whether a black-box model’s prediction accuracy is dependent on its ability to be interpreted by a known source model, does not arise in this theory.
With multiple example simulation experiments of various complexity levels, we demonstrate the working of such a theoretical model in synthetic supervised classification scenarios.

###### Keywords:

interpretable learning, deep learning interpretation, diagnostic interpretability, black-box interpretation, interpretability

## 1 Introduction

A reliable and accurate Computer-Aided medical Diagnostic (CAD) system is long overdue, and the unprecedented accuracy achieved by Deep Learning techniques in disease diagnosis (compared to previous attempts with simplistic learning) is certainly inspiring in this matter.
Typical deep learning based CAD systems treat diagnosis as a supervised classification problem where expert annotated data is used for training the model and accuracy is measured by the deep learning model’s performance on previously unseen examples.
In fact, strong diagnostic accuracy of Deep Learning has recently been demonstrated in multiple areas.
However, in terms of classification, unpredictable errors made by Deep Learning under minor modification of the input ^{7} have already been identified.
In medicine, where trustworthiness and traceability are gold standards for acceptance, whether some form of deep learning will succeed as a go-to clinical CAD system in the foreseeable future depends mainly on solving interpretability.
To date, however, little work has thoroughly examined the Diagnostic Interpretability of deep learning in CAD.

Interpretability is a relatively new and unexplored field of research, especially in the deep learning context.
Lipton ^{4} has written an article on the desiderata of interpretability.
Recent trends in interpretability research have also been summarized by Doshi-Velez and Kim ^{2}, with a view to making general interpretability research a scientific discipline.
We are mainly interested in the special case of diagnostic interpretability, particularly for CAD in a supervised learning setting.
Most of the dominant literature in diagnostic interpretability either presents a visual interpretation of the deep network’s decision process to radiologists in the form of heat maps ^{3}, or embeds a textual description of the decision process as a proxy for the deep network’s decision ^{8}.
The fundamental problem with such interpretability descriptions is that these models implicitly assume that the neural network understands abstraction levels in a way similar to a radiologist.
Unlike radiologists, deep networks have no understanding of abstraction levels.
We assume that, as in computer vision, the network gets higher activations at distinguishing textures without radiological context ^{6} in heat maps, and learns association rules without context in joint embedding.

From the interpretation point of view, upper levels of abstraction significantly reduce the uncertainty compared to the lower levels. Let us take an example where a radiologist is explaining her diagnosis of breast cancer to another radiologist. She can point to a particular irregular calcification at a certain location, represented by a set of pixels in the radiograph, with the implicit assumption that her colleague uses similar abstractions to understand the anatomy, even if the colleague was trained on a different continent. Her decision at the pixel level is interpretable (even if her colleague does not agree with it) because of one-to-one mappings at upper abstraction levels with little uncertainty. Both of them have agreed, in a hierarchical fashion, that the radiograph contains the image of a breast, as well as on its location, orientation, delineation and sub-parts. These higher-level interpretations significantly reduce their uncertainty about the final decision while focusing on the set of calcification pixels. Such a communication would be impossible with an alien radiologist who does not understand human anatomy at similar levels of abstraction: the uncertainty at the pixel level is simply too massive. This exact scenario happens with a deep network. Taking the heat map example, the deep network would fail to generate heat maps at all abstraction levels (e.g. anatomy, its subparts) unless explicitly trained to do so.

We argue that restricting ourselves to diagnostic interpretation has one major advantage over general-purpose interpretation: the diagnostic process has well-defined levels of abstraction. This means a diagnostic model can have a complete mathematical definition removed from subjective biases. We theorize Diagnostic Interpretation as the communication between two models, where one model queries and adjusts its own decision process at all levels of abstraction to maximally emulate the decision process of the other model. One immediate benefit of this theory is that the usual question of accuracy-interpretability tradeoff does not arise here. In fact, within this theory, any “bad model” with low prediction accuracy can interpret another complex and superior model, as long as the models share levels of abstraction. A real-world example of this is the process by which a trainee radiologist queries an expert and adjusts his model by emulating the expert’s.

Though the community has been broadly curious about defining terms such as interpretability and explainability in semi-formal ways, formally defining the process of interpretation and diagnostic interpretation is often overlooked. In section 2, we formalize complete interpretation, interpretability, diagnostic interpretation, confidence on interpretation and -interpretation. We derive an algorithm in section 3 and show an experimental evaluation in section 4. Section 5 discusses the possible impacts of the theory introduced here and concludes with some future directions.

## 2 Theory Formulation

Let us assume two models and with overall representations and . These models take similar images as input and predict a diagnosis label () as output i.e. and . Note that this description is devoid of ground truth; the prediction accuracy of each model for unseen examples is irrelevant for this theory.

We propose a strong assumption about the interpretation process. and can only communicate for interpretation if both of them contain exactly the same levels of abstraction i.e. if and are representations of and at different abstraction levels, then these levels must have a one-to-one relationship. At each level , predicts as follows: . follows suit.

This assumption is derived based on the observation discussed in section 1. This observation also suggests significant lowering of uncertainty at upper levels of abstraction, though the exact relationship is unknown. Since most of the complex modern learning models (e.g. CNN, random forest) fall flat in this category compared to the diagnostic models of radiologists, we propose directions to relax this assumption in the definition of -interpretation.

###### Definition 1

Complete interpretation: A complete interpretation is the process of communication through exhaustive querying by the known model to minimize its uncertainty about the decision boundary of the target black-box model .

The exhaustive querying ensures that no further information would be gained even if querying continued indefinitely.

###### Definition 2

Interpretability: Interpretability ($I$) is the ratio between the information gain about the target model’s decision boundary through complete interpretation and the initial uncertainty about that decision boundary.

More formally,

$$I = \frac{H_{0} - H_{f}}{H_{0}} \qquad (1)$$

Here, $H_{0}$ represents the entropy about the target model’s decision boundary (across all abstraction levels) before the process of interpretation starts. $H_{f}$, on the other hand, represents the entropy (across all abstraction levels) after complete interpretation. Assuming each abstraction level is independent of the others, the overall entropy across all levels of abstraction can be calculated as the sum of the individual entropies at each abstraction level i.e.

$$H = \sum_{l} H^{(l)} \qquad (2)$$

where $H^{(l)}$ is the entropy at abstraction level $l$. Based on how $H_{f}$ approaches the two extremes, there are two boundary cases: either (a) the target model would be interpreted completely when $H_{f} = 0$, which in turn means $I = 1$, or (b) it would be impossible to interpret if $H_{f} = H_{0}$, which in turn means $I = 0$.
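The ratio in Definition 2 and its two boundary cases can be illustrated with a small sketch. The function names and the example entropy values below are hypothetical and chosen only for illustration; the sketch assumes, as the text does, that abstraction levels are independent so that per-level entropies add up.

```python
from typing import Sequence


def total_entropy(level_entropies: Sequence[float]) -> float:
    """Sum per-level entropies, assuming independent abstraction levels."""
    return float(sum(level_entropies))


def interpretability(initial: Sequence[float], final: Sequence[float]) -> float:
    """Information gain divided by the initial uncertainty."""
    h_initial = total_entropy(initial)
    h_final = total_entropy(final)
    if h_initial == 0.0:
        # No initial uncertainty: the models already agree, so we follow the
        # convention stated later in the text and report 1.
        return 1.0
    return (h_initial - h_final) / h_initial


# Illustrative (made-up) entropies for two abstraction levels.
complete = interpretability([0.5, 0.3], [0.0, 0.0])    # all uncertainty removed
impossible = interpretability([0.5, 0.3], [0.5, 0.3])  # nothing learned
```

Here `complete` corresponds to boundary case (a) and `impossible` to boundary case (b).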

Toy Example: Let’s consider a simple example of square binary images of size . The set of all such possible images has the massive cardinality . Both models and are represented at a single scale , with decision mechanisms of similar complexity as shown in figure 1(b).

For such an example, it is possible to do a Complete interpretation. Let’s compute the two terms on the RHS of equation 1 separately for this example. If we initialize with , then the interpretation process looks like a symmetric binary channel, where models and do not arrive at the same prediction with probability , as shown in figure 1(a). This means the presence of an initial entropy that can be calculated as follows: .
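As a hedged illustration of the symmetric binary channel view, the sketch below assumes that the initial entropy is the standard binary entropy function of the disagreement probability. The function name `binary_entropy` and the example probability are ours, not taken from the paper.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a binary event that occurs with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p in (0.0, 1.0):
        # A deterministic channel carries no uncertainty.
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


# Maximal uncertainty: the two models disagree on half of the inputs.
h_max = binary_entropy(0.5)
```

Under this assumption the entropy peaks at one bit when the models disagree half of the time and vanishes when they always agree.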

Since an exhaustive search in this case ensures lossless representation of in , , which means . Now, let’s consider a less complex decision mechanism that can only approximate the decision process of (figure 1(c)). In such a scenario, if the initial entropy is and the final entropy is , the interpretability i.e. . This means the decision boundary of can be interpreted with certainty w.r.t. . Note that, though the same level of abstraction ensures complete interpretation, the complexity of the model means might or might not be 1.

Practical Limitations of Complete interpretation: The problem with definition 1 is the potentially massive cardinality of . If and are defined in the image space, the massive average uncertainty about ’s decision renders complete interpretation impractical, since the process requires querying through and reasoning about all the possible images, resulting in an expensive optimization problem.

We relax definition 1 into a practical process of diagnostic interpretation by exploiting two intuitions.
The first intuition is that even though is massive, the image manifold where realistic images lie and radiological decisions are made is much lower dimensional.
The second intuition comes from social science observations.
These observations suggest that humans only need a few example cases for reasonable interpretation ^{5}.

Equipped with these two intuitions, we relax definition 1 into a realistic definition of diagnostic interpretation by minimal querying on the image manifold for maximal information gain.

###### Definition 3

Diagnostic interpretation: Diagnostic interpretation is the process of querying a minimal number of times by to maximize the interpretability () of .

More formally,

(3) |

Parameter balances between interpretability and number of queries whereas represents the cardinality of the subset of images from which query images are sampled. for each can be calculated as:

(4) |

The entropy for each sample across all the levels of abstraction can be calculated as:

(5) |

Based on the values of entropy, three extremal cases are possible: (a) after -queries, model would be interpreted completely when which in turn would lead to , (b) between queries, entropy would not change even after model update i.e. which means and finally (c) the initial entropy is 0 i.e. which means models and are exactly the same from an information perspective, and interpretability is already 1.

Toy Example: Let’s consider the simple example of square binary images of size introduced earlier. However, the underlying image manifold this time is the set of diagonal images and the decision problem is to classify the two diagonals, as shown in figure 2(b). Two models and for this classification decision are shown in figure 2(a). In this example, though only two ‘real images’ are possible, the image space has possibilities. We can marginalize this massive image space in multiple ways by exploiting the first intuition. We have considered an intuitive relaxed envelope of single pixel flips to represent noise. There are 32 (= 16 × 2) such cases, resulting in a total of 34 (= 2 + 32) possible images.

Of these 34 images, there are only 4 images where disagrees with i.e. . Plugging this into the entropy equation, we get . We consider all those 4 cases and update the rules of A to get final and accordingly. So, interpretability .
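The counting in this toy example can be reproduced with a short sketch. It rests on several assumptions that are ours rather than the paper's: the images are 4 × 4 (inferred from the 16 single-pixel flips per diagonal), the entropy is the binary entropy of the disagreement rate, and updating A on the 4 disagreeing images resolves all of them. All names are hypothetical.

```python
import math

SIZE = 4  # assumption: 4 x 4 images, inferred from 16 single-pixel flips per diagonal


def diagonal_image(anti: bool) -> tuple:
    """Flattened SIZE x SIZE binary image with ones on one diagonal."""
    pixels = []
    for row in range(SIZE):
        for col in range(SIZE):
            on_diagonal = (col == SIZE - 1 - row) if anti else (col == row)
            pixels.append(1 if on_diagonal else 0)
    return tuple(pixels)


def single_pixel_flips(image: tuple) -> list:
    """All images that differ from `image` in exactly one pixel."""
    flips = []
    for i in range(len(image)):
        flipped = list(image)
        flipped[i] = 1 - flipped[i]
        flips.append(tuple(flipped))
    return flips


def binary_entropy(p: float) -> float:
    """Entropy in bits of a binary event with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


clean_images = [diagonal_image(False), diagonal_image(True)]
envelope = set(clean_images)
for image in clean_images:
    envelope.update(single_pixel_flips(image))

n_images = len(envelope)              # 2 clean images plus 32 flipped ones
n_image_space = 2 ** (SIZE * SIZE)    # every possible binary image of this size

initial_entropy = binary_entropy(4 / n_images)  # 4 disagreements, from the text
final_entropy = binary_entropy(0 / n_images)    # assumption: all 4 are resolved
interpretability = (initial_entropy - final_entropy) / initial_entropy
```

Under these assumptions the envelope covers only a tiny fraction of the full image space, which is what motivates the later discussion of confidence.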

This view of diagnostic interpretation suggests that the accuracy-interpretability tradeoff is not real i.e. it is unnecessary to simplify a model for interpretation by trading accuracy. In fact, a model of any complexity can be diagnostically interpretable as long as it respects the levels of abstraction of the source model.

Limitations of Diagnostic interpretation: In a typical CAD scenario where the known model (e.g. decision process of radiologist) tries to interpret the black-box model (e.g. DNN or kernelized learning), the strong assumption regarding one-to-one mapping of abstraction levels behind definition 3 does not hold anymore. Without the levels-of-abstraction assumption, the only way to ensure Diagnostic interpretation is an exhaustive search of , the absence of which should reduce confidence about the quality of interpretation. As the first step toward relaxing the strong assumption, we introduce Confidence on Interpretation () in the following.

###### Definition 4

Confidence on Interpretation: The Confidence on Interpretation () is the ratio of the cardinality of the subset from which images are sampled for interpretation to the cardinality of the set of all possible images that the models might encounter.

Confidence on Interpretation measures the typicality of a sample on which the interpretation is performed i.e. how likely the models are to encounter such a sample. Without any explicit assumption about the image manifold, lower bound of Confidence on Interpretation . Note that, in real-world situations where the image space is massive, neglecting the Confidence on Interpretation might lead to severe consequences, for example wrong calculation of interpretability, abundance of unexplainable examples etc.
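Definition 4 is a plain ratio of cardinalities, which the following sketch makes concrete. The function name is hypothetical, and the example numbers reuse the diagonal toy problem under the assumption of 4 × 4 binary images (34 images in the sampled envelope out of 2^16 possible images).

```python
from fractions import Fraction


def confidence_on_interpretation(n_pool: int, n_possible: int) -> Fraction:
    """Ratio of the sampled subset's size to the size of the full image set."""
    if n_possible <= 0:
        raise ValueError("the set of possible images must be non-empty")
    if not 0 <= n_pool <= n_possible:
        raise ValueError("the sampled subset cannot exceed the full set")
    return Fraction(n_pool, n_possible)


# Assumed toy numbers: 34-image envelope inside the space of 4 x 4 binary images.
coi = confidence_on_interpretation(34, 2 ** 16)
coi_percent = float(coi) * 100.0
```

Even for this tiny problem the ratio is far below one percent, which illustrates why the text warns against neglecting it for realistic image spaces.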

Equipped with Confidence on Interpretation, we have a relaxed and practical definition of interpretability in the deep learning context.

###### Definition 5

-interpretation: -interpretation is the process of querying a minimal number of times by model A to maximize the -interpretability (). More formally,

(6) |

That is, in the absence of correspondence in levels of abstraction, after iterations of the -interpretation process, we are at most confident about the interpretability of model by model . -interpretability per sample () can be calculated by the formulation derived in equation 5 with .

## 3 Diagnostic Interpretation Algorithm

This algorithm can calculate either diagnostic interpretability or, under minor modification, -interpretability.

Here, is the update rule of model at each level of abstraction. Note that neither the sampling nor the update rule is specified in algorithm 1; these are free parameters that can be chosen based on problem assumptions and .
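Since the sampling and update rules are left open, the loop can only be sketched under explicit assumptions. The version below is a minimal illustration and not the paper's Algorithm 1: model A is a hypothetical lookup table, sampling picks a random image on which the two models disagree, the update rule simply copies the black box's label, there is a single abstraction level, and entropy is the binary entropy of the disagreement rate over the query pool.

```python
import math
import random
from typing import Callable, Dict, Hashable, Sequence, Tuple


def binary_entropy(p: float) -> float:
    """Entropy in bits of a binary event with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def disagreement_entropy(rules_a: Dict[Hashable, int],
                         black_box: Callable[[Hashable], int],
                         pool: Sequence[Hashable],
                         default_label: int) -> float:
    """Binary entropy of the fraction of pool items on which A and B disagree."""
    n_disagree = sum(rules_a.get(x, default_label) != black_box(x) for x in pool)
    return binary_entropy(n_disagree / len(pool))


def interpret(rules_a: Dict[Hashable, int],
              black_box: Callable[[Hashable], int],
              pool: Sequence[Hashable],
              default_label: int = 0,
              max_queries: int = 100,
              seed: int = 0) -> Tuple[float, int]:
    """Query the black box until A agrees with it or the budget runs out."""
    rng = random.Random(seed)
    h_initial = disagreement_entropy(rules_a, black_box, pool, default_label)
    if h_initial == 0.0:
        # No initial uncertainty: interpretability is taken to be 1 already.
        return 1.0, 0
    h_current, queries = h_initial, 0
    while queries < max_queries:
        candidates = [x for x in pool
                      if rules_a.get(x, default_label) != black_box(x)]
        if not candidates:
            break
        query = rng.choice(candidates)        # assumed sampling rule
        rules_a[query] = black_box(query)     # assumed update rule
        queries += 1
        h_current = disagreement_entropy(rules_a, black_box, pool, default_label)
    return (h_initial - h_current) / h_initial, queries


# Hypothetical one-dimensional problem: the two models use different thresholds.
pool = list(range(10))
black_box = lambda x: int(x >= 5)
rules_a = {x: int(x >= 7) for x in pool}
score, n_queries = interpret(rules_a, black_box, pool)
```

In this example the models disagree on two items, so two queries are enough for the lookup table to reproduce the black box on the whole pool. A learned model such as the linear SVM used in the evaluation would replace the table update with re-training.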

## 4 Evaluation

We evaluate algorithm 1 for calculating of a deep neural network () with respect to a linear SVM ().

Dataset: We consider a simple problem: binary classification of binary images. The images with the left square smaller than the right are assigned label and the opposite images are assigned (see Figure 3(a)). We also consider all the images with 1 pixel flipped as an envelope, belonging to the corresponding class (Figure 3(a)).

Interpretation: We trained a simple convolutional neural network (model ) of the following architecture: C32-C32-F32 with convolutional kernels having ReLU activations followed by MaxPooling layers, using binary cross-entropy loss. We trained a simple linear SVM as model to interpret model . 200 randomly sampled images (100 from each group) are used for training, while -interpretation was performed on the dataset of images i.e. all normal images and the 1-pixel flip envelope. Based on algorithm 1, we consider random sampling of images where models and disagree. As a simple update rule, we created an intermediate training dataset by concatenating the sampled image along with its label as predicted by the CNN; the linear SVM model was re-trained on this intermediate dataset.

For this simple dataset, we trained CNNs 10 different times and accumulated the average result in figure 3(c). By the second iteration, all 10 test runs achieved . However, note how flimsy the confidence of interpretation lower bound () is for this problem, even after considering the 1-pixel flip envelope.

## 5 Discussion

The main goal of this paper is to start a discourse on the possibility of quantitative analysis and understanding of diagnostic interpretability. Looking at interpretation through the prism of communication between two models, we propose some basic definitions and an algorithm to emulate the process of interpretation for calculating interpretability. Neither the definitions nor the algorithm is complete; in fact, the algorithm might need to be improved and the definitions revisited in the future. This theory predicts that the classical idea of interpretability-accuracy tradeoff is true only in a limited sense. In fact, the real bottleneck might be the maximum achievable unique decodability of w.r.t. ’s levels of abstraction. We can safely assume that in the future, complex yet highly interpretable models can be designed, as long as such models respect ’s levels of abstraction.

From a human-computer interaction perspective, it would be an interesting future direction to study whether the theoretically calculated interpretability has any correlation with a radiologist’s perception. It would also be interesting to study how many images, on average, a radiologist looks at before trusting the black-box model. Finally, for some form of deep learning to be acceptable as a clinical CAD system (either semi- or fully automatic), gaining the trust of the radiologist is essential. Looking away from subjective bias might be a good first step toward gaining that trust.

## 6 Acknowledgement

Special thanks to David Kügler for vigorous discussions and for cleaning up the initial formulation. Thanks also go to Johannes Fauser for an initial discussion. Thanks to Arjan Kuijper and Salome Kazeminia for proofreading, commenting and checking the formulations.

## References

- 1 U.S. FDA approves AI device to detect diabetic eye disease, https://www.reuters.com/article/us-fda-ai-approval/u-s-fda-approves-ai-device-to-detect-diabetic-eye-disease-idUSKBN1HI2LC
- 2 Doshi-Velez, F., Kim, B.: Towards a rigorous science of interpretable machine learning (2017)
- 3 Garcia-Peraza-Herrera, L.C., Everson, M., Li, W., Luengo, I., Berger, L., Ahmad, O., Lovat, L., Wang, H.P., Wang, W.L., Haidry, R., et al.: Interpretable fully convolutional classification of intrapapillary capillary loops for real-time detection of early squamous neoplasia. arXiv preprint arXiv:1805.00632 (2018)
- 4 Lipton, Z.C.: The mythos of model interpretability. arXiv preprint arXiv:1606.03490 (2016)
- 5 Miller, T.: Explanation in artificial intelligence: Insights from the social sciences. arXiv preprint arXiv:1706.07269 (2017)
- 6 Nguyen, A., Yosinski, J., Clune, J.: Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 427–436 (2015)
- 7 Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, R.: Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 (2013)
- 8 Wang, X., Peng, Y., Lu, L., Lu, Z., Summers, R.M.: TieNet: Text-image embedding network for common thorax disease classification and reporting in chest x-rays. arXiv preprint arXiv:1801.04334 (2018)