Abstract
Information theory can be used to analyze the cost-benefit of visualization processes. However, the current measure of benefit contains an unbounded term that is neither easy to estimate nor intuitive to interpret. In this work, we propose to revise the existing cost-benefit measure by replacing the unbounded term with a bounded one. We examine a number of bounded measures that include the Jensen-Shannon divergence and a new divergence measure formulated as part of this work. We use visual analysis to support the multi-criteria comparison, enabling the selection of the most logical and intuitive option. We applied the revised cost-benefit measure to two case studies, demonstrating its uses in practical scenarios, while the collected real-world data further informs the selection of a bounded measure.
A Bounded Measure for Estimating the Benefit of Visualization
Min Chen (ORCID: 0000-0001-5320-5729),
Mateu Sbert (ORCID: 0000-0003-2164-6858),
Alfie Abdul-Rahman, and
Deborah Silver
University of Oxford, UK,
University of Girona, Spain,
King’s College London, UK, and
Rutgers University, USA
1 Introduction
It is now widely understood among visualization researchers and practitioners that the effectiveness of a visualization process depends on data, user, and task. One important aspect of the user is the user’s knowledge, which plays a critical role in reconstructing the information lost during visualization processes (e.g., data transformation and visual mapping). One major challenge in appreciating the significance of such knowledge is the difficulty of measuring or estimating the knowledge used by a user during visualization.
Chen and Golan proposed an information-theoretic measure [CG16] for measuring the cost-benefit of a data intelligence process. The measure features a term based on the Kullback-Leibler (KL) divergence [KL51] for measuring the potential distortion of a user in reconstructing the information that may have been lost or distorted during a visualization process. The cost-benefit ratio suggests that a user with more knowledge about the source data and its visual representation is likely to suffer less distortion. While using the KL-divergence is mathematically intrinsic for measuring the potential distortion, its unboundedness has some undesirable consequences. Kijmongkolchai et al. applied the formula of Chen and Golan to the results of an empirical study for estimating users’ knowledge used in visualization processes, and used a bounded approximation of the KL-divergence in their estimation [KARC17].
In this work, we propose to replace the KL-divergence with a bounded term. We first confirm that boundedness is a necessary property. We then use visual analysis to compare a number of bounded measures, which include the Jensen-Shannon (JS) divergence [Lin91] and a new divergence measure formulated as part of this work. Based on our multi-criteria analysis, we narrow down our selection to the three most logical and intuitive options. We then apply the selected divergence measures, in conjunction with the revised cost-benefit measure, to the real-world data collected in two case studies. The numerical calculation in the application further informs us about the relative merits of the selected measures, which enables us to make the final selection while demonstrating its uses in practical scenarios.
2 Related Work
Claude Shannon’s landmark article in 1948 [Sha48] signifies the birth of information theory. It has been underpinning the fields of data communication, compression, and encryption since. As a mathematical framework, information theory provides a collection of useful measures, many of which, such as Shannon entropy [Sha48], cross entropy [CT06], mutual information [CT06], and Kullback-Leibler divergence [KL51], are widely used in applications such as physics, biology, neurology, psychology, and computer science (e.g., visualization, computer graphics, computer vision, data mining, and machine learning). In this work, we will also consider Jensen-Shannon divergence [Lin91] in detail.
Information theory has been used extensively in visualization [CFV16]. The theory has enabled many applications in visualization, including scene and shape complexity analysis by Feixas et al. [FdBS99] and Rigau et al. [RFS05], light source placement by Gumhold [Gum02], view selection in mesh rendering by Vázquez et al. [VFSH04] and Feixas et al. [FSG09], attribute selection by Ng and Martin [NM04], view selection in volume rendering by Bordoloi and Shen [BS05], and Takahashi and Takeshima [TT05], multi-resolution volume visualization by Wang and Shen [WS05], focus of attention in volume rendering by Viola et al. [VFSG06], feature highlighting by Jänicke and Scheuermann [JWSK07, JS10], and Wang et al. [WYM08], transfer function design by Bruckner and Möller [BM10], and Ruiz et al. [RBB11, BRB13b], multi-modal data fusion by Bramon et al. [BBB12], isosurface evaluation by Wei et al. [WLS13], measuring of observation capacity by Bramon et al. [BRB13a], measuring information content by Biswas et al. [BDSW13], proving the correctness of “overview first, zoom, details-on-demand” by Chen and Jänicke [CJ10] and Chen et al. [CFV16], and confirming visual multiplexing by Chen et al. [CWB14].
Ward first suggested that information theory might be an underpinning theory for visualization [PAJKW08]. Chen and Jänicke [CJ10] outlined an information-theoretic framework for visualization, and it was further enriched by Xu et al. [XLS10] and Wang and Shen [WS11] in the context of scientific visualization. Chen and Golan proposed an information-theoretic measure for analyzing the cost-benefit of visualization processes and visual analytics workflows [CG16]. It was used to frame an observation study showing that human developers usually entered a huge amount of knowledge into a machine learning model [TKC17]. It motivated an empirical study confirming that knowledge could be detected and measured quantitatively via controlled experiments [KARC17]. It was used to analyze the cost-benefit of different virtual reality applications [CGJM19]. It formed the basis of a systematic methodology for improving the cost-benefit of visual analytics workflows [CE19]. This work continues the path of theoretical developments in visualization [CGJ17], and is intended to improve the original cost-benefit formula [CG16], in order to make it more intuitive in practical applications.
3 Overview, Motivation, and Problem Statement
Visualization is useful in most data intelligence workflows, but it is not universally so because the effectiveness of visualization is usually data-, user-, and task-dependent. The cost-benefit ratio proposed by Chen and Golan [CG16] captures some essence of such dependency. Below is the qualitative expression of the measure:
$$\frac{\text{Benefit}}{\text{Cost}} = \frac{\text{Alphabet Compression} - \text{Potential Distortion}}{\text{Cost}} \qquad (1)$$
Consider the scenario of viewing some data through a particular visual representation. The term Alphabet Compression (AC) measures the amount of information loss due to visual abstraction [VCI20]. Since the visual representation is fixed in the scenario, AC is thus largely data-dependent. AC is a positive measure reflecting the fact that visual abstraction must be useful in many cases though it may result in information loss. This apparently counterintuitive term is essential for asserting why visualization is useful. (Note that the term also helps assert the usefulness of statistics, algorithms, and interaction since they all usually cause information loss [CE19].)
The positive implication of the term AC is counterbalanced by the term Potential Distortion, while both are moderated by the term Cost. The term Cost encompasses all costs of the visualization process, including computational costs (e.g., visual mapping and rendering), cognitive costs (e.g., cognitive load), and consequential costs (e.g., impact of errors). The measure of cost (e.g., in terms of energy, time, or money) is thus data-, user-, and task-dependent.
The term Potential Distortion (PD) measures the informative divergence between viewing the data through visualization with information loss and viewing the data without any information loss. The latter might be ideal but is usually at an unattainable cost except for values in a very small data space (i.e., in a small alphabet as discussed in [CG16]). PD is data-dependent or user-dependent. Given the same data visualization with the same amount of information loss, one can postulate that a user with more knowledge about the data or visual representation usually suffers less distortion. This postulation is the main focus of this paper.
Consider the visual representation of a network of arteries in Figure 1. The image was generated from a volume dataset using the maximum intensity projection (MIP) method. While it is known that MIP cannot convey depth information well, it has been widely used for observing some classes of medical imaging data, such as arteries. The highlighted area in Figure 1 shows an apparently flat area, which is a distortion from the actuality of a tubular surface likely with some small wrinkles and bumps. The doctors who deal with such medical data are expected to have sufficient knowledge to reconstruct the reality adequately from the “distorted” visualization, while being able to focus on the more important task of making diagnostic decisions, e.g., about aneurysm.
As shown in some recent works, it is possible for visualization designers to estimate AC, PD, and Cost qualitatively [CGJM19, CE19] and quantitatively [TKC17, KARC17]. It is highly desirable to advance the scientific methods for quantitative estimation, towards the eventual realization of computer-assisted analysis and optimization in designing visual representations. This work focuses on one challenge of quantitative estimation, i.e., how to estimate human knowledge that may be used in a visualization process.
Building on the methods of observational estimation in [TKC17] and controlled experiment in [KARC17], one may reasonably anticipate a systematic method based on a short interview by asking potential viewers a few questions. For example, one may use the question in Figure 1 to estimate the knowledge of doctors, patients, and any other people who may view such a visualization. The question is intended to tease out two pieces of knowledge that may help reduce the potential distortion due to the “flat area” depiction. One piece is about the general knowledge that associates arteries with tube-like shapes. Another, which is more advanced, is about the surface texture of arteries and the limitations of the MIP method.
Let the binary options about whether the “flat area” is actually flat or curved be an alphabet . The likelihood of the two options is represented by a probability distribution or probability mass function (PMF) , where . Since most arteries in the real world are of tubular shapes, one can imagine that a ground truth alphabet might have a PMF strongly in favor of the curved option. However, the visualization seems to suggest the opposite, implying a PMF strongly in favor of the flat option. It is not difficult to interview some potential viewers, enquiring how they would answer the question. One may estimate a PMF from doctors’ answers, and another from patients’ answers.
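[Table 1: PMFs and potential distortion (PD, in bits, measured with the KL-divergence) for Scenario 1 and Scenario 2 of the flat-or-curved question; cell values not preserved.]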
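[Table 2: PMFs and potential distortion (PD, in bits, measured with the KL-divergence) for Scenario 3 and Scenario 4 of the smooth-or-not question; cell values not preserved.]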
Table 1 shows two scenarios where different probability data is obtained. The values of PD are computed using the most well-known divergence measure, the KL-divergence [KL51], and are in units of bits. In Scenario 1, without any knowledge, the visualization process would suffer 6.50 bits of PD. As doctors are not fooled by the “flat area” shown in the MIP visualization, their knowledge is worth 6.50 bits. Meanwhile, patients would suffer 1.12 bits of PD on average, so their knowledge is worth 6.50 − 1.12 = 5.38 bits.
In Scenario 2, the PMFs of and depart further away, while and remain the same. Although doctors and patients would suffer more PD, their knowledge is worth more than that in Scenario 1 (i.e., bits and bits respectively).
Similarly, the binary options about whether the “flat area” is actually smooth or not can be defined by an alphabet . Table 2 shows two scenarios about collected probability data. In these two scenarios, doctors exhibit much more knowledge than patients, indicating that the surface texture of arteries is of specialized knowledge.
The above example demonstrates that using the KL-divergence to estimate PD can differentiate the knowledge variation between doctors and patients regarding the two pieces of knowledge that may reduce the distortion due to the “flat area”. When it is used in Eq. 1 in a relative or qualitative context (e.g., [CGJM19, CE19]), the unboundedness of the KL-divergence does not pose an issue.
However, this does become an issue when the KL-divergence is used to measure PD in an absolute and quantitative context. From the pairs of diverging PMFs in Table 1 and in Table 2, we can observe that the smaller the smallest probability value $\epsilon$ is, the more divergent the two PMFs become and the higher the value of PD. Indeed, consider an arbitrary binary alphabet $Z = \{z_1, z_2\}$ and two PMFs defined upon $Z$: $P = \{\epsilon, 1-\epsilon\}$ and $Q = \{1-\epsilon, \epsilon\}$. When $\epsilon \to 0$, we have the KL-divergence $D_{KL}(Q \| P) \to \infty$.
Meanwhile, the Shannon entropy of a PMF $P$ over a binary alphabet, $H(P)$, has an upper bound of 1 bit. It is thus not intuitive or practical to relate the value of $D_{KL}(Q \| P)$ to that of $H(P)$. Many applications of information theory do not relate these two types of values explicitly. When reasoning about such relations is required, the common approach is to impose a lower-bound threshold $\epsilon$ on the probability values (e.g., [KARC17]). However, there is not yet a consistent method for defining such a threshold for various alphabets in different applications, while preventing a range of small or large values (i.e., $[0, \epsilon)$ or $(1-\epsilon, 1]$) in a PMF is often inconvenient in practice. In the following section, we discuss several approaches to defining a bounded measure for PD.
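The contrast between the unbounded KL-divergence and the bounded entropy can be checked numerically. The following is a minimal sketch, assuming a pair of mirrored binary PMFs of the form {ε, 1 − ε} and {1 − ε, ε}; the function names are illustrative and are not taken from the original work. Under this assumption, ε = 0.01 gives about 6.50 bits, which agrees with the PD quoted for Scenario 1, though the actual PMFs of Table 1 are not reproduced here.

```python
import math

def kl_divergence(p, q):
    """KL-divergence D_KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def shannon_entropy(p):
    """Shannon entropy H(p) in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def mirrored_pair(eps):
    """Assumed test case: P = {eps, 1 - eps} and Q = {1 - eps, eps}."""
    return [eps, 1.0 - eps], [1.0 - eps, eps]

# The divergence grows without limit as eps shrinks,
# while the entropy of a binary PMF never exceeds 1 bit.
for eps in (1e-1, 1e-2, 1e-4, 1e-8):
    p, q = mirrored_pair(eps)
    print(f"eps={eps:g}  D_KL(Q||P)={kl_divergence(q, p):.2f} bits  "
          f"H(P)={shannon_entropy(p):.4f} bits")
```

The printed divergence keeps growing as ε decreases, which is the behaviour that makes it hard to relate PD to an entropy-based quantity such as AC.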
Note: for an information-theoretic measure, we use an alphabet and its PMF interchangeably, e.g., $H(P(Z)) = H(Z)$.
4 Bounded Measures for Potential Distortion (PD)
Let $F_i$ be a process in a data intelligence workflow, $Z_i$ be its input alphabet, and $Z_{i+1}$ be its output alphabet. $F_i$ can be a human-centric process (e.g., visualization and interaction) or a machine-centric process (e.g., statistics and algorithms). In the original proposal [CG16], the value of Benefit in Eq. 1 is measured using:
$$\text{Benefit} = \text{AC} - \text{PD} = H(Z_i) - H(Z_{i+1}) - D_{KL}(Z'_i \,\|\, Z_i) \qquad (2)$$
where $H(Z)$ is the Shannon entropy of an alphabet and $D_{KL}(Z'_i \| Z_i)$ is the KL-divergence of a reconstructed alphabet $Z'_i$ from the reference alphabet $Z_i$. Because the Shannon entropy of an alphabet with a finite number of letters is bounded, AC, which is the entropic difference between the input and output alphabets, is also bounded. On the other hand, as discussed in the previous section, PD is unbounded. Although Eq. 2 can be used for relative comparison, it is not quite intuitive in an absolute context, and it is difficult to imagine that the amount of informative distortion can be more than the maximum amount of information available.
In this section, we present the unpublished work by Chen and Sbert [CS19], which shows mathematically that for alphabets of a finite size, the KL-divergence used in Eq. 2 should ideally be bounded. In their arXiv report, they also outlined a new divergence metric and compared it with a few other bounded divergence measures. Building on the initial comparison in [CS19], we use visualization in Section 4.2 and real-world data in Section 5 to assist the multi-criteria analysis and selection of a bounded divergence measure to replace the KL-divergence used in Eq. 2.
4.1 A Mathematical Proof of Boundedness
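Let $Z$ be an alphabet with a finite number of letters, $\{z_1, z_2, \ldots, z_n\}$, and let $Z$ be associated with a PMF, $Q$, such that:
$$\begin{aligned} q(z_n) &= \epsilon, \quad \text{where } 0 < \epsilon < 2^{-(n-1)}, \\ q(z_{n-1}) &= (1-\epsilon)\,2^{-(n-1)}, \\ q(z_{n-2}) &= (1-\epsilon)\,2^{-(n-2)}, \\ &\;\;\vdots \\ q(z_2) &= (1-\epsilon)\,2^{-2}, \\ q(z_1) &= (1-\epsilon)\,2^{-1} + (1-\epsilon)\,2^{-(n-1)}. \end{aligned} \qquad (3)$$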
When we encode this alphabet using an entropy binary coding scheme [Mos12], we can be assured to achieve an optimal code with the lowest average length for codewords. One example of such a code for the above probability is:
$$z_1: 0, \quad z_2: 10, \quad z_3: 110, \quad \ldots, \quad z_{n-1}: \underbrace{1\cdots1}_{n-2}0, \quad z_n: \underbrace{1\cdots1}_{n-1} \qquad (4)$$
In this way, $z_n$, which has the smallest probability, will always be assigned a codeword with the maximal length of $n-1$. Entropy coding is designed to minimize the average number of bits per letter when one transmits a “very long” sequence of letters in the alphabet over a communication channel. Here the phrase “very long” implies that the string exhibits the above PMF (Eq. 3).
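As an illustration of the kind of code discussed above, the sketch below builds a prefix-free code whose codeword lengths are 1, 2, ..., n − 1, n − 1, so that the least probable letter receives a codeword of the maximal length n − 1. The construction and the dyadic example PMF are assumptions made for illustration; they are one possible instance of an entropy code, not necessarily the exact code of the original text.

```python
def unary_like_code(n):
    """Codewords for letters z_1..z_n (n >= 2) with lengths 1, 2, ..., n-1, n-1."""
    codes = ["1" * i + "0" for i in range(n - 1)]  # z_1 .. z_{n-1}
    codes.append("1" * (n - 1))                    # z_n, the least probable letter
    return codes

def is_prefix_free(codes):
    """True if no codeword is a prefix of a different codeword."""
    return not any(a != b and b.startswith(a) for a in codes for b in codes)

def average_length(codes, pmf):
    """Expected codeword length (bits per letter) under the given PMF."""
    return sum(len(c) * p for c, p in zip(codes, pmf))

codes = unary_like_code(4)
print(codes)
# Hypothetical dyadic PMF for which this code is optimal.
print(average_length(codes, [0.5, 0.25, 0.125, 0.125]))
```

If every letter of a long string turned out to be the last letter, the average length would equal the longest codeword, n − 1 bits, which is the worst case considered in the proof.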
Suppose that $Z$ is actually of PMF $P$, but is encoded as Eq. 4 based on $Q$. The transmission of $Z$ using this code will be inefficient. The inefficiency is usually measured using the cross entropy $H_{CE}(P, Q)$, such that:
$$H_{CE}(P, Q) = H(P) + D_{KL}(P \| Q) \qquad (5)$$
Clearly, the worst case is that the letter $z_n$, which was encoded using $n-1$ bits, turns out to be the most frequently used letter in the actual PMF $P$ (instead of the least in the assumed PMF $Q$). It is so frequent that all letters in the long string are $z_n$. So the average codeword length per letter of this string is $n-1$. The situation cannot be worse. Therefore, $n-1$ is the upper bound of the cross entropy. From Eq. 5, we can also observe that $D_{KL}(P \| Q)$ must also be bounded since $H_{CE}(P, Q)$ and $H(P)$ are both bounded as long as $Z$ has a finite number of letters. Let $\top_{CE}$ be the upper bound of $H_{CE}(P, Q)$. The upper bound for $D_{KL}(P \| Q)$, $\top_{KL}$, is thus:
$$D_{KL}(P \| Q) = H_{CE}(P, Q) - H(P) \le \top_{CE} - \min_{P} H(P) = \top_{KL} \qquad (6)$$
There is a special case worth noting. In practice, it is common to assume that $Q$ is a uniform distribution, i.e., $q_i = 1/n$, typically because $P$ is unknown or varies frequently. Hence the assumption leads to a code with an average length equaling $\log_2 n$ (or in practice, the smallest integer $\ge \log_2 n$). Under this special (but rather common) condition, all letters in a very long string have codewords of the same length. The worst case is that all letters in the string turn out to be the same letter. Since there is no informative variation in the PMF $P$ for this very long string, i.e., $H(P) = 0$, in principle, the transmission of this string is unnecessary. The maximal amount of inefficiency is thus $\log_2 n$. This is indeed much lower than the upper bound $n-1$, justifying the assumption or use of a uniform $Q$ in many situations.
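The relation between cross entropy, entropy, and KL-divergence, and the uniform special case, can be verified with a few lines of code. This is a minimal sketch with illustrative function names and a hypothetical 8-letter alphabet; it checks the standard identity (cross entropy equals entropy plus KL-divergence) and that a single-letter string encoded under a uniform assumption wastes log2(n) bits per letter.

```python
import math

def entropy(p):
    """Shannon entropy H(p) in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Cross entropy of the actual PMF p relative to the assumed PMF q, in bits."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """KL-divergence D_KL(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

n = 8                                   # hypothetical alphabet size
uniform = [1.0 / n] * n                 # assumed PMF Q
single = [1.0] + [0.0] * (n - 1)        # every letter in the string is the same
p = [0.5, 0.25, 0.125, 0.125, 0.0, 0.0, 0.0, 0.0]

# Identity: H_CE(P, Q) = H(P) + D_KL(P || Q)
print(cross_entropy(p, uniform), entropy(p) + kl_divergence(p, uniform))
# Worst case under a uniform Q: inefficiency of log2(n) bits per letter
print(kl_divergence(single, uniform), math.log2(n))
```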
[Figure 2: eight panels, (a) to (h), one per divergence measure; images and panel titles not preserved.]
4.2 Bounded Measures and Their Visual Analysis
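While numerical approximation may provide a bounded KL-divergence, it is not easy to determine the value of the threshold $\epsilon$, and it is difficult to ensure that everyone uses the same $\epsilon$ for the same alphabet or comparable alphabets. It is therefore desirable to consider bounded measures that may be used in place of $D_{KL}$.
Jensen-Shannon divergence is such a measure:
$$D_{JS}(P \| Q) = \frac{1}{2}\bigl(D_{KL}(P \| M) + D_{KL}(Q \| M)\bigr) = \frac{1}{2}\sum_{i=1}^{n}\left(p_i \log_2 \frac{2 p_i}{p_i + q_i} + q_i \log_2 \frac{2 q_i}{p_i + q_i}\right) \qquad (7)$$
where $P$ and $Q$ are two PMFs associated with the same alphabet $Z$ and $M = \frac{1}{2}(P + Q)$ is the average distribution of $P$ and $Q$. With the base 2 logarithm as in Eq. 7, $D_{JS}(P \| Q)$ is bounded by 0 and 1.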
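The boundedness and symmetry of the Jensen-Shannon divergence can be checked directly. The sketch below uses the standard definition (the average of the two KL-divergences from the mean distribution, with base 2 logarithms); the function names and the test PMFs are illustrative.

```python
import math

def kl_divergence(p, q):
    """KL-divergence D_KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, bounded by 0 and 1."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

print(js_divergence([1.0, 0.0], [0.0, 1.0]))      # maximally divergent pair
print(js_divergence([0.99, 0.01], [0.01, 0.99]))  # stays below 1 bit
print(js_divergence([0.3, 0.7], [0.3, 0.7]))      # identical PMFs
```

Unlike the KL-divergence, the measure remains finite even when one PMF assigns zero probability to a letter that the other PMF uses.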
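Another bounded measure is the conditional entropy $H(P|Q)$:
$$H(P|Q) = H(P) - I(P; Q) = -\sum_{i=1}^{n}\sum_{j=1}^{n} r_{i,j} \log_2 \frac{r_{i,j}}{q_j} \qquad (8)$$
where $r_{i,j}$ is the joint probability of the two conditions of $z_i, z_j \in Z$ that are associated with $P$ and $Q$. $H(P|Q)$ is bounded by 0 and $H(P)$.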
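The third bounded measure was proposed as part of this work, which is referred to as $D^k_{new}$ and is defined as follows:
$$D^k_{new}(P \| Q) = \frac{1}{2}\sum_{i=1}^{n}(p_i + q_i)\log_2\bigl(|p_i - q_i|^k + 1\bigr) \qquad (9)$$
where $k > 0$. $D^k_{new}$ is bounded by 0 and 1.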
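In this work, we focus on two options of $D^k_{new}$, i.e., when $k = 1$ and $k = 2$. Since the KL-divergence is non-commutative, we can also have a non-commutative version of $D^k_{new}$, i.e.,
$$D^k_{ncm}(P \| Q) = \sum_{i=1}^{n} p_i \log_2\bigl(|p_i - q_i|^k + 1\bigr) \qquad (10)$$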
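As $D_{JS}$, $D^k_{new}$, and $D^k_{ncm}$ are bounded by [0, 1], if any of them is selected to replace $D_{KL}$, Eq. 2 can be rewritten as
$$\text{Benefit} = H(Z_i) - H(Z_{i+1}) - H_{max}(Z_i)\, D(Z'_i \| Z_i) \qquad (11)$$
where $H_{max}$ denotes maximum entropy, while $D$ is a placeholder for $D_{JS}$, $D^k_{new}$, or $D^k_{ncm}$.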
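The arithmetic of the revised benefit measure can be sketched as follows, assuming that the benefit is the alphabet compression (input entropy minus output entropy) minus the bounded divergence scaled by the maximum entropy of the input alphabet. The function name and all numbers in the example are hypothetical.

```python
import math

def revised_benefit(h_input, h_output, n_letters, divergence):
    """Assumed revised benefit: AC - H_max * D, with D a divergence in [0, 1]."""
    if not 0.0 <= divergence <= 1.0:
        raise ValueError("divergence must be a bounded measure in [0, 1]")
    alphabet_compression = h_input - h_output
    h_max = math.log2(n_letters)      # maximum entropy of the input alphabet
    return alphabet_compression - h_max * divergence

# Hypothetical example: a 4-letter input alphabet (H_max = 2 bits).
print(revised_benefit(h_input=1.5, h_output=0.0, n_letters=4, divergence=0.25))
print(revised_benefit(h_input=1.5, h_output=0.0, n_letters=4, divergence=1.0))
```

Because the divergence is bounded by 1, the potential distortion in this form can never exceed the maximum entropy, which is the property that the unbounded KL-divergence lacks.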
The four measures in Eqs. 7, 8, 9, 10 all consist of logarithmic scaling of probability values, in the same form as Shannon entropy. They are entropic measures. In addition, we also considered a set of non-entropic measures in the form of Minkowski distances, which have the following general form:
$$D^k_{M}(P, Q) = \left(\sum_{i=1}^{n} |p_i - q_i|^k\right)^{1/k} \quad (k > 0) \qquad (12)$$
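For comparison with the entropic measures, a Minkowski distance between two PMFs can be computed as in the sketch below, which uses the standard definition (the k-th root of the sum of the k-th powers of the absolute differences). The function name and the example PMFs are illustrative.

```python
def minkowski_distance(p, q, k=2):
    """Minkowski distance of order k between two PMFs of equal length."""
    return sum(abs(pi - qi) ** k for pi, qi in zip(p, q)) ** (1.0 / k)

p, q = [1.0, 0.0], [0.0, 1.0]
print(minkowski_distance(p, q, k=1))   # Manhattan distance
print(minkowski_distance(p, q, k=2))   # Euclidean distance
```

For the maximally divergent binary pair, the distance depends on k and can exceed 1 for small k, so the value is bounded but does not share the [0, 1] range of the entropic measures without further normalization.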
To evaluate the suitability of the above measures, we can first consider three criteria. It is essential for the selected divergence measure to be bounded. Otherwise we can just use the KL-divergence. Another important criterion is the number of PMFs that the measure depends on. While all other measures considered depend on two PMFs, the conditional entropy depends on three. Because it requires some effort to obtain a PMF, especially a joint probability distribution, this makes the conditional entropy less favourable. In addition, we also prefer to have an entropic measure as it is more compatible with the measure of alphabet compression. With these three criteria, we can start our multi-criteria analysis as summarized in Table 3, where we score each divergence measure against a criterion using an integer between 0 and 5, with 5 being the best. We will draw our conclusion about the multi-criteria analysis in Section 6.
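Table 3: Multi-criteria scores (0 to 5, with 5 being the best) of the divergence measures.

| Criteria | Importance | $D_{KL}$ | $D_{JS}$ | $H(P \mid Q)$ | $D^{k=1}_{new}$ | $D^{k=2}_{new}$ | $D^{k=1}_{ncm}$ | $D^{k=2}_{ncm}$ | $D^{k=2}_{M}$ | $D^{k=200}_{M}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 1. Boundedness | critical | 0 | 5 | 5 | 5 | 5 | 5 | 5 | 3 | 3 |
| 2. Number of PMFs | important | 5 | 5 | 2 | 5 | 5 | 5 | 5 | 5 | 5 |
| 3. Entropic measures | important | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 1 | 1 |
| 4. Curve shapes (Figure 2) | helpful | 5 | 5 | 1 | 2 | 4 | 2 | 4 | 3 | 3 |
| 5. Curve shapes (Figure 3) | helpful | 5 | 4 | | 3 | 5 | 3 | 5 | 2 | 3 |
| 6. Scenario: good and bad (Figure 4) | helpful | | 3 | | 5 | 4 | 5 | 4 | | |
| 7. Scenario: A, B, C, D (Figure 5) | helpful | | 4 | | 5 | 3 | 2 | 1 | | |
| 8. Case Study 1 (Section 5.1) | important | | 5 | | 1 | 5 | | | | |
| 9. Case Study 2 (Section 5.2) | important | | 3 | | | 5 | | | | |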
We now consider several criteria using visualization. One desirable property is for a bounded measure to have a geometric behaviour similar to the KL-divergence. Since the KL-divergence is unbounded, we make use of a scaled version of it, which does not rise too quickly, though it is still unbounded.
Let us consider a simple alphabet $Z = \{z_1, z_2\}$, which is associated with two PMFs, $P = \{p_1, 1 - p_1\}$ and $Q = \{q_1, 1 - q_1\}$. We set $q_1 = (1 - \alpha)p_1 + \alpha(1 - p_1)$ with $\alpha \in [0, 1]$, such that when $\alpha = 1$, $Q$ is most divergent away from $P$. We can visualize how different measures numerically convey the divergence between $P$ and $Q$ by observing their relationship with $\alpha$. Figure 2 compares several measures by varying the values of $\alpha$ in the range of $[0, 1]$.
From Figure 2, we can observe that has almost a perfect match when , while is also fairly close. They thus score 5 and 4 respectively in Table 3. Meanwhile, the lines of curve in the opposite direction of . We score it 1. and are of similar shapes, with correlating with slightly better. We thus score 2 and 3. Note that for the above PMFs and , has the same curves as . Hence has the same score as in Table 3. With scored poorly, we focus on the other candidate measures in the rest of the analysis.
We now consider Figure 3, where the candidate measures are visualized in comparison with and in a range close to zero, i.e., . The ranges and are there only for references to the nearby contexts as they do not have the same logarithmic scale as that in the range . We can observe that in the curve of rises almost as quickly as . This confirms that simply scaling the KL-divergence is not an adequate solution. The curves of and converge to their maximum value 1.0 earlier than that of . If the curve of is used as a benchmark as in Figure 2, the curve of is closer to than that of . We thus score 5, , 4, 3, 3, and 2. Since we use the same PMFs and as in Figure 2, has the same curves and thus the same score as .
Let us consider a few numerical examples that may represent some practical scenarios. We use these scenarios to see if the values returned by different divergence measures make sense. Let be an alphabet with two letters, good and bad, for describing a scenario (e.g., an object or an event), which has the probability of good is , and that of bad is . In other words, . Imagine that a biased process (e.g., a distorted visualization, an incorrect algorithm, or a misleading communication) conveys the information about the scenario always bad, i.e., a PMF . Users at the receiving end of the process may have different knowledge about the actual scenario, and they will make a decision after receiving the output of the process. For example, we have five users and we have obtained the probability of their decisions as follows:

LD: The user has a little doubt about the output of the process, and decides bad 90% of the time, and good 10% of the time, i.e., with a PMF of $\{0.1, 0.9\}$ for good and bad respectively.

FD: The user has a fair amount of doubt, with .

RG: The user makes a random guess, with a PMF of $\{0.5, 0.5\}$.

UC: The user has adequate knowledge about , but undercompensates for it slightly, with .

OC: The user has adequate knowledge about , but overcompensates for it slightly, with .
We can use different candidate measures to compute the divergence between and . Figure 4 shows different divergence values returned by these measures. Each value is decomposed into two parts, one for good and one for bad. All these measures can order these five users reasonably well. The users UC (undercompensate) and OC (overcompensate) have the same values with and , while considers OC has slightly more divergence than UC (0.014 vs. 0.010). returns relatively low values than other measures. For UC and OC, , , and return small values , which are a bit difficult to estimate.
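The per-letter decomposition described above (one part for good and one for bad) can be sketched for two of the users whose PMFs are stated in the text, LD and RG. The ground truth PMF used below, {0.8, 0.2}, is a hypothetical stand-in, and the divergence is the assumed form of the new measure with k = 2 (each letter contributes 0.5 (p_i + q_i) log2(|p_i − q_i|^k + 1)); neither is confirmed by the text shown here.

```python
import math

def d_new_terms(p, q, k=2):
    """Per-letter contributions under the assumed form of the new divergence."""
    return [0.5 * (pi + qi) * math.log2(abs(pi - qi) ** k + 1)
            for pi, qi in zip(p, q)]

truth = [0.8, 0.2]                 # HYPOTHETICAL ground truth over (good, bad)
users = {
    "LD": [0.1, 0.9],              # decides good 10% and bad 90% of the time
    "RG": [0.5, 0.5],              # random guess
}

for name, q in users.items():
    good_part, bad_part = d_new_terms(truth, q)
    print(f"{name}: good={good_part:.3f}  bad={bad_part:.3f}  "
          f"total={good_part + bad_part:.3f}")
```

Under these assumptions the user who follows the biased output more closely (LD) receives a larger total divergence than the user who guesses randomly (RG), which matches the ordering one would expect.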
and show strong asymmetric patterns between good and bad, reflecting the probability values in . In other words, the more decisions on good, the more good-related divergence. This asymmetric pattern is not in any way incorrect, as the KL-divergence is also non-commutative and would also produce much stronger asymmetric patterns. Meanwhile an argument for supporting commutative measures would point out that the higher probability of good in should also influence the balance between the good-related divergence.
We decide to score 3 because of its lower valuation and its non-equal comparison of UC and OC. We score and 5; and and 4 as the values returned by and are slightly more intuitive.
We now consider a slightly more complicated scenario with four pieces of data, A, B, C, and D, which can be defined as an alphabet with four letters. The ground truth PMF is . Consider two processes that combine these into two classes AB and CD. These typify clustering algorithms, downsampling processes, discretization in visual mapping, and so on. One process is considered to be correct, which has a PMF for AB and CD as , and another biased process with . Let CG, CU, and CH be three users at the receiving end of the correct process, and BG, BS, and BM be three other users at the receiving end of the biased process. The users with different knowledge exhibit different abilities to reconstruct the original scenario featuring A, B, C, D from aggregated information about AB and CD. Similar to the goodbad scenario, such abilities can be captured by a PMF . For example, we have:

CG makes a random guess, with a uniform PMF $\{0.25, 0.25, 0.25, 0.25\}$.

CU has useful knowledge, .

CH is highly biased, .

BG makes a guess based on , .

BS makes a small adjustment, .

BM makes a major adjustment, .
Figure 5 compares the divergence values returned by the candidate measures for these six users. We can observe that and return values , which seem to be less intuitive. Meanwhile shows a large portion of divergence from the AB category, while shows more divergence in the CD category. In particular, for user BG, does not show any divergence in relation to A and B, though BG clearly has reasoned A and B rather incorrectly. shows a relatively balanced account of divergence associated with A, B, C, and D. On balance, we give scores 5, 4, 3, 2, 1 to , , , , and respectively.
With the major shortcomings of the non-commutative measures in this scenario, we can now focus on three commutative measures, namely the JS divergence and the new divergence with $k = 1$ and $k = 2$, in conjunction with two case studies.
5 Case Studies
To complement the visual analysis in Section 4.2, we conducted two surveys to collect some realistic examples that feature the use of knowledge in visualization. In addition to supporting the selection of a bounded measure for potential distortion, the surveys were also designed to demonstrate that one could use a few simple questions to estimate the cost-benefit of visualization in relation to individual users. Building on the visual analysis in the previous section, we focus on three divergence measures, namely the JS divergence and two versions of the new divergence, i.e., with $k = 1$ and $k = 2$. We denote as , and as .
5.1 Volume Visualization
This survey, which involved ten surveyees, was designed to collect some real-world data that reflects the use of knowledge in viewing volume visualization images. The full set of questions was presented to surveyees in the form of slides, which are included in the supplementary materials. The full set of survey results is given in Appendix C. The featured volume datasets were from “The Volume Library” [Roe19], and visualization images were either rendered by the authors or from one of the four publications [NSW02, CSC06, WQ07, Jun19].
The transformation from a volumetric dataset to a volumerendered image typically features a noticeable amount of alphabet compression. Some major algorithmic functions in volume visualization, e.g., isosurfacing, transfer function, and rendering integral, all facilitate alphabet compression, hence information loss.
In terms of rendering integral, maximum intensity projection (MIP) incurs a huge amount of information loss in comparison with the commonly-used emission-and-absorption integral [MC10]. As shown in Figure 1, the surfaces of arteries are depicted more or less in the same color. The accompanying question intends to tease out two pieces of knowledge, “curved surface” and “with wrinkles and bumps”. Among the ten surveyees, one selected the correct answer B, while seven selected the relatively plausible answer A and one selected the less plausible answer D.
Let alphabet contain the four optional answers. One may assume a ground truth PMF since there might still be a small probability for a section of artery to be flat or smooth. The rendered image depicts a misleading impression, implying that answer C is correct or a false PMF . The amount of alphabet compression is thus .
When a surveyee gives an answer to the question, it can also be considered as a PMF . Different answers thus lead to different values of divergence as follows:
Without any knowledge, a surveyee would select answer C, leading to the highest value of divergence in terms of any of the three measures. Based on the PMF , we expect to have divergence values in the order of C > A > D > B. Both and have produced values in that order, while indicates an order A > D > C > B, which cannot be interpreted easily. We thus score 1 in Table 3, and leave it out in the following discussions.
Together with the alphabet compression and the maximum entropy of 2 bits, we can also calculate the informative benefit using Eq. 11. For surveyees with different answers, the lossy depiction of the surface of arteries brought about different amounts of benefit:
The two sets of values both indicate that only those surveyees who gave answer C would benefit from such lossy depiction produced by MIP. One may also consider the scenarios where flat or smooth surfaces are more probable. For example, if the ground truth PMF were and , the amounts of benefit would be:
Because the ground truth PMF would be less certain, the knowledge of “curved surface” and “with wrinkles and bumps” would become more useful. Further, because the probability of flat and smooth surfaces would have also increased, an answer C would not be as bad as when it is with the original PMF .
The above example of MIP rendering shows that to those users with the appropriate knowledge, the missing information in a visualization image is not really “lost”. Using the categorization of visual multiplexing [CWB14], the information about “curved surface” and “with wrinkles and bumps” is conveyed using a hollow visual channel. Volume visualization features some other forms of visual multiplexing. The viewers’ ability to demultiplex depends on their knowledge, which can now be estimated quantitatively.
Figure 6 shows another volume-rendered image used in the survey. Two isosurfaces of a head dataset are depicted with translucent occlusion, which is a type of visual multiplexing [CWB14]. Meanwhile, the voxels for soft tissue and muscle are not depicted at all, which can also be regarded as using a hollow visual channel. The visual representation has been widely used, and the viewers are expected to use their knowledge to infer the 3D relationships between the two isosurfaces as well as the missing information about soft tissue and muscle. The question that accompanies the figure is for estimating such knowledge.
Although the survey offers only four options, it could in fact offer many other configurations as optional answers. Let us consider four color-coded segments similar to the configurations in answers C and D. Each segment could be one of four types: bone, skin, soft tissue and muscle, or background. There are a total of $4^4 = 256$ configurations. If one had to consider the variation of segment thickness, there would be many more options. Because it would not be appropriate to ask a surveyee to select an answer from 256 options, a typical assumption is that the selected four options are representative. In other words, considering that the 256 options are letters of an alphabet, any unselected letter has a probability similar to one of the four selected options.
For example, we can estimate a ground truth PMF such that among the 256 letters,

Answer A and four other letters each have a probability of 0.01,

Answer B and 64 other letters each have a probability of 0.0002,

Answer C and 184 other letters each have a probability of 0.0001,

Answer D has a probability of 0.9185.
The entropy of this alphabet is approximately 0.85 bits. Similar to the previous example, we can estimate the values of divergence as:
where denotes zeros. With the maximum entropy being 8 bits, we can estimate the amounts of informative benefit as:
Because both and have returned some sensible values, we give a score of 5 to each of them in Table 3.
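As a numerical check of the example above, the following minimal Python sketch rebuilds the 256-letter ground truth PMF from the four probability groups listed and computes its Shannon entropy and the maximum entropy of the alphabet. The variable and function names are ours, chosen for illustration; only the group sizes and probabilities come from the text.

```python
import math

# Ground truth PMF over the 256-letter alphabet, as listed in the text:
# (number of letters in the group, probability of each letter).
groups = [
    (5, 0.01),      # answer A and four other letters
    (65, 0.0002),   # answer B and 64 other letters
    (185, 0.0001),  # answer C and 184 other letters
    (1, 0.9185),    # answer D
]

# Expand the groups into one probability per letter.
pmf = [p for count, p in groups for _ in range(count)]


def shannon_entropy(probabilities):
    """Shannon entropy in bits; zero-probability letters contribute 0."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


n_letters = len(pmf)                # number of letters in the alphabet
total_mass = sum(pmf)               # should be 1 up to floating-point rounding
entropy = shannon_entropy(pmf)      # entropy of the ground truth PMF
max_entropy = math.log2(n_letters)  # entropy of a uniform PMF over the alphabet

print(f"letters = {n_letters}, total probability = {total_mass:.4f}")
print(f"entropy = {entropy:.3f} bits, maximum entropy = {max_entropy:.0f} bits")
```

Because the PMF is heavily concentrated on answer D, its entropy is far below the 8-bit maximum.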
5.2 London Underground Map
This survey was designed to collect real-world data that reflects the use of knowledge in viewing different London Underground maps. It involved sixteen surveyees, twelve at King’s College London (KCL) and four at the University of Oxford. Surveyees were interviewed individually in a setup as shown in Figure 7. Each surveyee was asked to answer 12 questions using either map, followed by two further questions about their familiarity with metro systems and London. A £5 Amazon voucher was offered to each surveyee in appreciation of their effort and time. The survey sheets and the full set of survey results are given in Appendix D.
Harry Beck first introduced the geographically-deformed design of the London Underground map in 1931. Today almost all metro maps around the world adopt this design concept. Information-theoretically, the transformation of a geographically-faithful map to such a geographically-deformed map causes a significant loss of information. Naturally, this affects some tasks more than others.
For example, the distances between stations on a deformed map are not as useful as those on a faithful map. The first four questions in the survey asked surveyees to estimate how long it would take to walk (i) from Charing Cross to Oxford Circus, (ii) from Temple to Leicester Square, (iii) from Stanmore to Edgware, and (iv) from South Ruislip to South Harrow. On the deformed map, the distances between the four pairs of stations are all about 50mm. On the faithful map, the distances are (i) 21mm, (ii) 14mm, (iii) 31mm, and (iv) 53mm respectively. According to Google Maps, the estimated walking distances and times are (i) 0.9 miles, 20 minutes; (ii) 0.8 miles, 17 minutes; (iii) 1.6 miles, 32 minutes; and (iv) 2.2 miles, 45 minutes respectively.
The averages and ranges of the walking-time estimates by the 12 surveyees at KCL are: (i) 19.25 [8, 30], (ii) 19.67 [5, 30], (iii) 46.25 [10, 240], and (iv) 59.17 [20, 120] minutes. The estimates by the four surveyees at Oxford are: (i) 16.25 [15, 20], (ii) 10 [5, 15], (iii) 37.50 [25, 60], and (iv) 33.75 [20, 60] minutes. These values correlate better with the Google estimates than the similar distances on the deformed map would imply. Clearly, some surveyees were using some knowledge to make better inferences.
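The averages and ranges quoted above can be recomputed from the individual answers listed in Tables 5 and 6 (Appendix D). The following minimal Python sketch does so; the dictionary and function names are ours, and the answers are copied from those tables.

```python
# Walking-time estimates in minutes for Questions 1-4, copied from
# Table 5 (KCL, surveyees P1-P12) and Table 6 (Oxford, surveyees P13-P16).
kcl = {
    "Q1": [8, 30, 12, 16, 20, 15, 10, 30, 20, 20, 20, 30],
    "Q2": [15, 30, 5, 22, 15, 14, 20, 20, 25, 25, 25, 20],
    "Q3": [20, 45, 10, 70, 20, 20, 20, 35, 25, 30, 20, 240],
    "Q4": [60, 60, 35, 100, 30, 20, 45, 35, 45, 120, 40, 120],
}
oxford = {
    "Q1": [15, 20, 15, 15],
    "Q2": [5, 5, 15, 15],
    "Q3": [35, 60, 30, 25],
    "Q4": [20, 30, 60, 25],
}


def summarize(answers):
    """Return {question: (mean rounded to 2 decimals, minimum, maximum)}."""
    return {
        q: (round(sum(v) / len(v), 2), min(v), max(v))
        for q, v in answers.items()
    }


kcl_summary = summarize(kcl)
oxford_summary = summarize(oxford)

for group_name, summary in (("KCL", kcl_summary), ("Oxford", oxford_summary)):
    for question, (avg, low, high) in summary.items():
        print(f"{group_name} {question}: {avg:.2f} [{low}, {high}] minutes")
```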
Let be an alphabet of integers between 1 and 256. The range is chosen partly to cover the range of the answers in the survey, and partly to round up the maximum entropy to 8 bits. For each pair of stations, we can define a PMF using a skew normal distribution peaked at the Google estimation . As an illustration, we coarsely approximate the PMF as , where
In the same way as in the previous case study, we can estimate the divergence for an answer in each range, resulting in:
With the entropy of the alphabet as bits and the maximum entropy being 8 bits, we can estimate the amounts of informative benefit for different answers as:
For instance, surveyee P9, who has lived in a city with a metro system for a period of 15 years and lived in London for several months, made similarly good estimations about the walking time with both types of underground maps. With one spot on answer and one close answer under each condition, the estimated benefit on average is bits if one uses or bits if one uses . Meanwhile, surveyee P3, who has lived in a city with a metro system for two months, provided all four answers in the wild guess category, leading to negative benefit with both and .
Among the first set of four questions, Questions 1 and 2 are about stations near KCL, and Questions 3 and 4 are about stations more than 10 miles away from KCL. The local knowledge of the surveyees from KCL clearly helped their answers. Among the answers given by the twelve surveyees from KCL,

For Question 1, four spot on, five close, and three wild guess — the average benefit is with or with .

For Question 2, two spot on, nine close, and one wild guess — the average benefit is with or with .

For Question 3, three close, and nine wild guess — the average benefit is with or with .

For Question 4, two spot on, one close, and nine wild guess — the average benefit is with or with .
From the above calculation, we also notice that tends to produce higher divergence values, and seems a bit “too eager” to give negative benefit values. With the above real world data, produces measures that can be interpreted more intuitively. We therefore give (i.e., ) a 5 score and a 3 score.
When we consider answering each of Questions 1–4 as performing a visualization task, we can estimate the cost-benefit ratio of each process. As the survey also collected the time used by each surveyee in answering each question, the cost in Eq. 1 can be approximated with the mean response time. For Questions 1–4, the mean response times of the surveyees at KCL are 9.27, 9.48, 14.65, and 11.39 seconds respectively. Using the benefit values based on , the cost-benefit ratios are thus 0.0113, 0.0075, 0.0003, and 0.0033 bits/second respectively. While these values indicate the benefits of the local knowledge used in answering Questions 1 and 2, they also indicate that when the local knowledge is absent in the case of Questions 3 and 4, the deformed map (i.e., Question 3) is less cost-beneficial.
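The cost term used here can be reproduced from the response times in Table 5. The minimal Python sketch below computes the mean response time per question and divides a benefit value by it. The function name `cost_benefit_ratio` and the benefit value `example_benefit_bits` are hypothetical and serve only to illustrate the arithmetic; the benefit values actually used in the paper depend on the chosen divergence measure and are not reproduced here.

```python
# Response times in seconds of the 12 KCL surveyees for Questions 1-4,
# copied from Table 5.
times = {
    "Q1": [6.22, 7.66, 9.78, 11.66, 3.72, 4.85, 8.85, 21.12, 12.72, 11.22, 3.38, 10.06],
    "Q2": [10.25, 9.78, 6.44, 9.29, 12.12, 6.09, 17.28, 6.75, 12.31, 6.85, 6.03, 10.56],
    "Q3": [19.43, 13.37, 10.06, 9.25, 14.06, 10.84, 12.46, 19.03, 11.50, 16.09, 11.28, 28.41],
    "Q4": [11.31, 10.62, 10.56, 12.47, 8.21, 7.15, 18.72, 8.91, 8.06, 12.62, 3.88, 24.19],
}


def cost_benefit_ratio(benefit_bits, cost_seconds):
    """Benefit (in bits) per unit of cost (here: seconds of response time)."""
    return benefit_bits / cost_seconds


# Cost approximated by the mean response time of each question.
mean_time = {q: sum(t) / len(t) for q, t in times.items()}

# HYPOTHETICAL benefit value, used only to show the calculation.
example_benefit_bits = 0.1
example_ratio = cost_benefit_ratio(example_benefit_bits, mean_time["Q1"])

for question, seconds in mean_time.items():
    print(f"{question}: mean response time = {seconds:.2f} s")
print(f"example ratio for Q1 = {example_ratio:.4f} bits/second")
```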
6 Conclusions
In this paper, we have considered the need to improve the mathematical formulation of an information-theoretic measure for analyzing the cost-benefit of visualization as well as other processes in a data intelligence workflow [CG16]. The concern about the original measure is its unbounded term based on the KL-divergence. We have obtained a proof that as long as the input and output alphabets of a process have a finite number of letters, the divergence measure used in the cost-benefit formula should be bounded.
We have considered a number of bounded measures to replace the unbounded term, including a new divergence measure and a variation of it. We have conducted multi-criteria analysis to select the best measure among these candidates. In particular, we have used visualization to aid the observation of different properties of the candidate measures, assisting in the analysis of four criteria. We have conducted two case studies, both in the form of surveys. One consists of questions about volume visualizations, while the other features visualization tasks performed in conjunction with two types of London Underground maps. The case studies allowed us to test some of the most promising candidate measures with the real-world data collected in the two surveys, providing important evidence for two important aspects of the multi-criteria analysis.
From Table 3, we can observe the process of narrowing down from eight candidate measures to two measures. Taking the importance of the criteria into account, we consider that candidate is slightly ahead of . We therefore propose to revise the original cost-benefit ratio in [CG16] to the following:
(13) 
This cost-benefit measure was developed in the field of visualization, for optimizing visualization processes and visual analytics workflows. It is now being improved by using visual analysis and with the survey data collected in the context of visualization applications. We would like to continue our theoretical investigation into the mathematical properties of the new divergence measure. Meanwhile, having a bounded cost-benefit measure offers many new opportunities for using it in practical applications, especially in visualization and visual analytics.
Appendices
Appendix A Further Details of the Original Cost-Benefit Ratio
This appendix contains an extract from a previous publication [CE19], which provides a relatively concise but informative description of the cost-benefit ratio proposed in [CG16]. It is included to minimize the readers’ effort to locate such an explanation. The extract has been slightly modified.
Chen and Golan introduced an information-theoretic metric for measuring the cost-benefit ratio of a visual analytics (VA) workflow or any of its component processes [CG16]. The metric consists of three fundamental measures that are abstract representations of a variety of qualitative and quantitative criteria used in practice, including operational requirements (e.g., accuracy, speed, errors, uncertainty, provenance, automation), analytical capability (e.g., filtering, clustering, classification, summarization), cognitive capabilities (e.g., memorization, learning, context-awareness, confidence), and so on. The abstraction results in a metric with the desirable mathematical simplicity [CG16]. The qualitative form of the metric is as follows:
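$$\frac{\text{Benefit}}{\text{Cost}} = \frac{\text{Alphabet Compression} - \text{Potential Distortion}}{\text{Cost}} \qquad (14)$$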
The metric describes the trade-off among the three measures:

Alphabet Compression (AC) measures the amount of entropy reduction (or information loss) achieved by a process. As noted in [CG16], most visual analytics processes (e.g., statistical aggregation, sorting, clustering, visual mapping, and interaction) feature many-to-one mappings from input to output, hence losing information. Although information loss is commonly regarded as harmful, it cannot be all bad if it is a general trend of VA workflows. Thus the cost-benefit metric makes AC a positive component.

Potential Distortion (PD) balances the positive nature of AC by measuring the errors typically due to information loss. Instead of measuring mapping errors using some third-party metrics, PD measures the potential distortion when one reconstructs inputs from outputs. The measurement takes into account humans’ knowledge that can be used to improve the reconstruction processes. For example, given an average mark of 62%, the teacher who taught the class can normally guess the distribution of the marks among the students better than an arbitrary person.

Cost (Ct) of the forward transformation from input to output and the inverse transformation of reconstruction provides a further balancing factor in the cost-benefit metric in addition to the trade-off between AC and PD. In practice, one may measure the cost using time or a monetary measurement.
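To make the notion of alphabet compression concrete, the following minimal Python sketch computes the entropy reduction caused by a many-to-one mapping. It assumes that AC is the Shannon entropy of the input alphabet minus that of the output alphabet, in line with the description of AC as entropy reduction above. The eight-letter alphabet of marks and the pass/fail mapping are hypothetical examples, not data from the paper.

```python
import math
from collections import Counter


def entropy_bits(pmf):
    """Shannon entropy in bits; zero-probability letters contribute 0."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)


# HYPOTHETICAL input alphabet: eight equally likely marks 0..7.
input_letters = list(range(8))
input_pmf = [1 / 8] * 8


def to_pass_fail(mark):
    """HYPOTHETICAL many-to-one mapping from a mark to a pass/fail label."""
    return "pass" if mark >= 4 else "fail"


# Push the input PMF through the mapping to obtain the output PMF.
output_mass = Counter()
for letter, p in zip(input_letters, input_pmf):
    output_mass[to_pass_fail(letter)] += p
output_pmf = list(output_mass.values())

input_entropy = entropy_bits(input_pmf)    # 3 bits for 8 equally likely letters
output_entropy = entropy_bits(output_pmf)  # 1 bit for 2 equally likely letters
alphabet_compression = input_entropy - output_entropy

print(f"AC = {input_entropy:.1f} - {output_entropy:.1f} = {alphabet_compression:.1f} bits")
```

In this toy case the mapping removes 2 of the 3 bits of the input. Whether that loss is harmful depends on how well a viewer can reconstruct the marks from the labels, which is what PD is meant to capture.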
| Questions (with correct answers in brackets) / Surveyee’s ID | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | P9 | P10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1. Use of different transfer functions (D), dataset: Carp | (D) | (D) | (D) | (D) | (D) | c | b | (D) | a | c |
| 2. Use of translucency in volume rendering (C), dataset: Engine Block | (C) | (C) | (C) | (C) | (C) | (C) | (C) | (C) | d | (C) |
| 3. Omission of voxels of soft tissue and muscle (D), dataset: CT head | (D) | (D) | (D) | (D) | b | b | a | (D) | a | (D) |
| 4. Sharp objects in volume-rendered CT data (C), dataset: CT head | (C) | (C) | a | (C) | a | b | d | b | b | b |
| 5. Loss of 3D information with MIP (B, a), dataset: Aneurism | (a) | (B) | (a) | (a) | (a) | (a) | D | (a) | (a) | (a) |
| 6. Use of volume deformation (A), dataset: CT head | (A) | (A) | b | (A) | (A) | b | b | (A) | b | b |
| 7. Toe nails in non-photorealistic volume rendering (B, c), dataset: Foot | (c) | (c) | (c) | (B) | (c) | (B) | (B) | (B) | (B) | (c) |
| 8. Noise in non-photorealistic volume rendering (B), dataset: Foot | (B) | (B) | (B) | (B) | (B) | (B) | a | (B) | c | (B) |
| 9. Knowledge about 3D medical imaging technology [1 lowest, 5 highest] | 4 | 3 | 4 | 5 | 3 | 3 | 3 | 3 | 2 | 1 |
| 10. Knowledge about volume rendering techniques [1 lowest, 5 highest] | 5 | 5 | 4–5 | 4 | 4 | 3 | 3 | 3 | 2 | 1 |
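Appendix B Basic Formulas of Information-Theoretic Measures
This section is included for self-containment. Some readers who have the essential knowledge of probability theory but are unfamiliar with information theory may find these formulas useful.
Let $Z = \{z_1, z_2, \ldots, z_n\}$ be an alphabet and $z_i$ be one of its letters. $Z$ is associated with a probability distribution or probability mass function (PMF) $P(Z) = \{p_1, p_2, \ldots, p_n\}$ such that $p_i = p(z_i) \ge 0$ and $\sum_{i=1}^{n} p_i = 1$. The Shannon Entropy of $Z$ is:
$$H(Z) = H(P) = -\sum_{i=1}^{n} p_i \log_2 p_i \quad \text{(unit: bit)}$$
Here we use the base-2 logarithm, as the unit of bit is more intuitive in the context of computer science and data science.
An alphabet $Z$ may have different PMFs in different conditions. Let $P$ and $Q$ be such PMFs. The KL-Divergence $D_{KL}(P\|Q)$ describes the difference between the two PMFs in bits:
$$D_{KL}(P\|Q) = \sum_{i=1}^{n} p_i \log_2 \frac{p_i}{q_i} \quad \text{(unit: bit)}$$
$D_{KL}(P\|Q)$ is referred to as the divergence of $P$ from $Q$. This is not a metric since $D_{KL}(P\|Q) = D_{KL}(Q\|P)$ cannot be assured.
Related to the above two measures, Cross Entropy is defined as:
$$H_{CE}(P, Q) = H(P) + D_{KL}(P\|Q) = -\sum_{i=1}^{n} p_i \log_2 q_i \quad \text{(unit: bit)}$$
Sometimes, one may consider $Z$ as two alphabets $Z_a$ and $Z_b$ with the same ordered set of letters but two different PMFs. In such a case, one may denote the KL-Divergence as $D_{KL}(Z_a\|Z_b)$, and the cross entropy as $H_{CE}(Z_a, Z_b)$.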
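The three measures in this appendix can be sketched in a few lines of Python. The code below uses the standard textbook definitions of Shannon entropy, KL-divergence, and cross entropy in bits; the function names and the two example PMFs are ours. It also illustrates two properties mentioned in the paper: the KL-divergence is not symmetric, and it grows without bound as a probability in the second PMF approaches zero.

```python
import math


def entropy(p):
    """Shannon entropy H(P) in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def kl_divergence(p, q):
    """KL-divergence of P from Q in bits; infinite if q_i = 0 while p_i > 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # terms with p_i = 0 contribute nothing
        if qi == 0:
            return math.inf
        total += pi * math.log2(pi / qi)
    return total


def cross_entropy(p, q):
    """Cross entropy of P and Q in bits; infinite if q_i = 0 while p_i > 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue
        if qi == 0:
            return math.inf
        total -= pi * math.log2(qi)
    return total


# Two example PMFs over a two-letter alphabet (illustrative values).
P = [0.5, 0.5]
Q = [0.9, 0.1]

kl_pq = kl_divergence(P, Q)  # divergence of P from Q
kl_qp = kl_divergence(Q, P)  # divergence of Q from P, generally different

# KL-divergence of P from [1 - eps, eps] for shrinking eps.
growth = [kl_divergence(P, [1 - eps, eps]) for eps in (1e-2, 1e-4, 1e-8)]

print(f"D_KL(P||Q) = {kl_pq:.3f} bits, D_KL(Q||P) = {kl_qp:.3f} bits")
print(f"H(P) + D_KL(P||Q) = {entropy(P) + kl_pq:.3f}, cross entropy = {cross_entropy(P, Q):.3f}")
print("growth as eps shrinks:", [round(g, 2) for g in growth])
```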
Appendix C Survey Results of Useful Knowledge in Volume Visualization
This survey consists of eight questions presented as slides. The questionnaire is given as part of the supplementary materials. The ten surveyees are primarily colleagues from the UK, Spain, and the USA. They include doctors and experts of medical imaging and visualization, as well as several persons who are not familiar with the technologies of medical imaging and data visualization. Table 4 summarizes the answers from these ten surveyees.
Appendix D Survey Results of Useful Knowledge in Viewing London Underground Maps
Figures 8, 9, and 10 show the questionnaire used in the survey about two types of London Underground maps. Table 5 summarizes the data from the answers by the 12 surveyees at King’s College London, while Table 6 summarizes the data from the answers by the four surveyees at the University of Oxford.
In Section 5.2, we have discussed Questions 1–4 in some detail. In the survey, Questions 5–8 constitute the second set. Each question asks surveyees to first identify two stations along a given underground line, and then determine how many stops there are between the two stations. All surveyees identified the stations correctly for all four questions, and most also counted the stops correctly. In general, for each of these cases, one can establish an alphabet of all possible answers in a way similar to the example of walking distances. However, we have not observed any interesting correlation between the correctness and the surveyees’ knowledge about metro systems or London.
In the third set of four questions, each question asks surveyees to identify the closest station for changing between two given stations on different lines. All surveyees identified the changing stations correctly for all questions.
The design of Questions 5–12 was also intended to collect data that might differentiate the deformed map from the faithful map in terms of the time required for answering questions. As shown in Figure 11, the questions were paired, such that the two questions in each pair feature the same level of difficulty. Although the comparison seems to suggest that the faithful map might have some advantage in the setting of this survey, we cannot be certain about this observation as the sample size is not large enough. In general, we cannot draw any meaningful conclusion about the cost in terms of time. We hope to collect more real-world data about the timing cost of visualization processes for making further advances in applying information theory to visualization.
Meanwhile, we consider that the space cost is a valid consideration. While both maps have a similar size (i.e., deformed map: 850mm × 580mm, faithful map: 840mm × 595mm), their font sizes for station labels are very different. For the long station names “High Street Kensington” and “Totteridge & Whetstone”, the labels on the deformed map are 35mm and 37mm long, while those on the faithful map are 17mm and 18mm long. Taking the height into account, the space used for station labels in the deformed map is about four times that in the faithful map. In other words, if the faithful map were to display its labels with the same font size as the deformed map, the space cost would be about four times that of the deformed map.
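The factor of about four can be checked with a short calculation. The minimal Python sketch below assumes that the height of a label scales in proportion to its length (same typeface and aspect ratio), so that the label area scales with the square of the length ratio. This proportionality is our assumption for illustration; the label lengths are those quoted in the text.

```python
# Measured label lengths in mm, as quoted in the text.
labels_mm = {
    "High Street Kensington": {"deformed": 35, "faithful": 17},
    "Totteridge & Whetstone": {"deformed": 37, "faithful": 18},
}

# ASSUMPTION: label height is proportional to label length, so the area
# ratio is the square of the length ratio.
length_ratios = {
    name: m["deformed"] / m["faithful"] for name, m in labels_mm.items()
}
area_ratios = {name: r ** 2 for name, r in length_ratios.items()}

for name in labels_mm:
    print(f"{name}: length ratio = {length_ratios[name]:.2f}, "
          f"area ratio = {area_ratios[name]:.2f}")
```

Both labels give a length ratio slightly above 2 and hence an area ratio slightly above 4, consistent with the estimate of about four times.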
| Questions / Surveyee’s ID | | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 | P12 | mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Q1: | answer (min.) | 8 | 30 | 12 | 16 | 20 | 15 | 10 | 30 | 20 | 20 | 20 | 30 | 19.25 |
| | time (sec.) | 06.22 | 07.66 | 09.78 | 11.66 | 03.72 | 04.85 | 08.85 | 21.12 | 12.72 | 11.22 | 03.38 | 10.06 | 09.27 |
| Q2: | answer (min.) | 15 | 30 | 5 | 22 | 15 | 14 | 20 | 20 | 25 | 25 | 25 | 20 | 19.67 |
| | time (sec.) | 10.25 | 09.78 | 06.44 | 09.29 | 12.12 | 06.09 | 17.28 | 06.75 | 12.31 | 06.85 | 06.03 | 10.56 | 09.48 |
| Q3: | answer (min.) | 20 | 45 | 10 | 70 | 20 | 20 | 20 | 35 | 25 | 30 | 20 | 240 | 46.25 |
| | time (sec.) | 19.43 | 13.37 | 10.06 | 09.25 | 14.06 | 10.84 | 12.46 | 19.03 | 11.50 | 16.09 | 11.28 | 28.41 | 14.65 |
| Q4: | answer (min.) | 60 | 60 | 35 | 100 | 30 | 20 | 45 | 35 | 45 | 120 | 40 | 120 | 59.17 |
| | time (sec.) | 11.31 | 10.62 | 10.56 | 12.47 | 08.21 | 07.15 | 18.72 | 08.91 | 08.06 | 12.62 | 03.88 | 24.19 | 11.39 |
| Q5: | time 1 (sec.) | 22.15 | 01.75 | 07.25 | 03.78 | 14.25 | 37.68 | 06.63 | 13.75 | 19.41 | 06.47 | 03.41 | 34.97 | 14.29 |
| | time 2 (sec.) | 24.22 | 08.28 | 17.94 | 05.60 | 17.94 | 57.99 | 21.76 | 20.50 | 27.16 | 13.24 | 22.66 | 40.88 | 23.18 |
| | answer (10) | 10 | 10 | 10 | 9 | 10 | 10 | 10 | 10 | 9 | 10 | 10 | 10 | |
| | time (sec.) | 06.13 | 28.81 | 08.35 | 06.22 | 09.06 | 06.35 | 09.93 | 12.69 | 10.47 | 05.54 | 08.66 | 27.75 | 11.66 |
| Q6: | time 1 (sec.) | 02.43 | 08.28 | 01.97 | 08.87 | 05.06 | 02.84 | 06.97 | 10.15 | 18.10 | 21.53 | 03.00 | 07.40 | 08.05 |
| | time 2 (sec.) | 12.99 | 27.69 | 04.81 | 10.31 | 15.97 | 04.65 | 17.56 | 16.31 | 20.25 | 24.69 | 15.34 | 20.68 | 15.94 |
| | answer (9) | 9 | 10 | 9 | 9 | 4 | 9 | 9 | 9 | 8 | 9 | 9 | 9 | |
| | time (sec.) | 07.50 | 06.53 | 04.44 | 16.53 | 19.41 | 05.06 | 13.47 | 07.03 | 12.44 | 04.78 | 07.91 | 16.34 | 10.12 |
| Q7: | time 1 (sec.) | 17.37 | 08.56 | 01.34 | 03.16 | 08.12 | 01.25 | 21.75 | 15.56 | 02.81 | 07.84 | 02.22 | 46.72 | 11.39 |
| | time 2 (sec.) | 17.38 | 13.15 | 02.34 | 03.70 | 08.81 | 02.25 | 22.75 | 26.00 | 17.97 | 10.37 | 03.18 | 47.75 | 14.64 |
| | answer (7) | 7 | 7 | 7 | 7 | 6 | 7 | 7 | 7 | 6 | 7 | 7 | 7 | |
| | time (sec.) | 07.53 | 06.34 | 03.47 | 03.87 | 02.75 | 04.09 | 02.16 | 04.94 | 26.88 | 05.31 | 06.63 | 12.84 | 07.23 |
| Q8: | time 1 (sec.) | 12.00 | 08.50 | 06.09 | 02.88 | 08.62 | 14.78 | 19.12 | 08.53 | 12.50 | 10.22 | 12.50 | 20.00 | 11.31 |
| | time 2 (sec.) | 13.44 | 10.78 | 23.37 | 09.29 | 13.03 | 36.34 | 23.55 | 09.50 | 13.53 | 10.23 | 32.44 | 22.60 | 18.18 |
| | answer (6) | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | |
| | time (sec.) | 02.62 | 05.94 | 02.15 | 04.09 | 04.94 | 07.06 | 07.50 | 04.90 | 04.37 | 04.53 | 05.47 | 09.43 | 05.25 |
| Q9: | answer (P) | P | P | P | P | P | P | P | P | P | P | P | P | |
| | time (sec.) | 35.78 | 02.87 | 07.40 | 13.03 | 06.97 | 52.15 | 13.56 | 02.16 | 08.13 | 09.06 | 01.93 | 08.44 | 13.46 |
| Q10: | answer (LB) | LB | LB | LB | LB | LB | LB | LB | LB | LB | LB | LB | LB | |
| | time (sec.) | 05.50 | 03.13 | 12.04 | 14.97 | 07.00 | 26.38 | 11.31 | 03.38 | 06.75 | 07.47 | 06.50 | 09.82 | 09.52 |
| Q11: | answer (WP) | WP | WP | WP | WP | WP | WP | WP | WP | WP | WP | WP | WP | |
| | time (sec.) | 06.07 | 05.35 | 07.72 | 05.00 | 04.32 | 23.72 | 05.25 | 03.07 | 10.66 | 05.37 | 02.94 | 17.37 | 08.07 |
| Q12: | answer (FP) | FP | FP | FP | FP | FP | FP | FP | FP | FP | FP | FP | FP | |
| | time (sec.) | 05.16 | 02.56 | 11.78 | 08.62 | 03.60 | 19.72 | 11.28 | 03.94 | 20.72 | 01.56 | 02.50 | 06.84 | 08.19 |
| live in metro city | | 5yr | 5yr | mths | 15yr | 5yr | 15yr | weeks | 5yr | 15yr | 5yr | mths | mths | |
| live in London | | 5yr | 5yr | mths | 15yr | 15yr | mths | mths | mths | mths | mths | mths | mths | |
| Questions / Surveyee’s ID | | P13 | P14 | P15 | P16 | mean |
|---|---|---|---|---|---|---|
| Q1: | answer (min.) | 15 | 20 | 15 | 15 | 16.25 |
| | time (sec.) | 11.81 | 18.52 | 08.18 | 07.63 | 11.52 |
| Q2: | answer (min.) | 5 | 5 | 15 | 15 | 10.00 |
| | time (sec.) | 11.10 | 02.46 | 13.77 | 10.94 | 09.57 |
| Q3: | answer (min.) | 35 | 60 | 30 | 25 | 37.50 |
| | time (sec.) | 21.91 | 16.11 | 10.08 | 22.53 | 17.66 |
| Q4: | answer (min.) | 20 | 30 | 60 | 25 | 33.75 |
| | time (sec.) | 13.28 | 16.21 | 08.71 | 18.87 | 14.27 |
| Q5: | time 1 (sec.) | 17.72 | 07.35 | 17.22 | 09.25 | 12.89 |
| | time 2 (sec.) | 21.06 | 17.00 | 19.04 | 12.37 | 17.37 |
| | answer (10) | 10 | 8 | 10 | 10 | |
| | time (sec.) | 04.82 | 02.45 | 02.96 | 15.57 | 06.45 |
| Q6: | time 1 (sec.) | 35.04 | 38.12 | 11.29 | 07.55 | 23.00 |
| | time 2 (sec.) | 45.60 | 41.32 | 20.23 | 40.12 | 36.82 |
| | answer (9) | 9 | 10 | 9 | 8 | |
| | time (sec.) | 03.82 | 13.57 | 08.15 | 34.32 | 14.97 |
| Q7: | time 1 (sec.) | 01.05 | 02.39 | 09.55 | 11.19 | 06.05 |
| | time 2 (sec.) | 02.15 | 05.45 | 09.58 | 13.47 | 07.66 |
| | answer (7) | 10 | 6 | 7 | 7 | |
| | time (sec.) | 01.06 | 01.60 | 02.51 | 14.06 | 04.81 |
| Q8: | time 1 (sec.) | 08.74 | 26.14 | 20.37 | 15.01 | 17.57 |
| | time 2 (sec.) | 16.50 | 30.55 | 27.01 | 17.91 | 22.99 |
| | answer (6) | 6 | 6 | 6 | 6 | |
| | time (sec.) | 09.30 | 03.00 | 02.11 | 04.94 | 04.48 |
| Q9: | answer (P) | P | P | P | P | |
| | time (sec.) | 05.96 | 09.38 | 04.56 | 05.16 | 06.27 |
| Q10: | answer (LB) | LB | LB | LB | LB | |
| | time (sec.) | 12.74 | 07.77 | 01.30 | 09.94 | 07.94 |
| Q11: | answer (WP) | WP | WP | WP | WP | |
| | time (sec.) | 09.84 | 04.43 | 03.39 | 07.18 | 06.21 |
| Q12: | answer (FP) | FP | FP | FP | FP | |
| | time (sec.) | 06.22 | 10.46 | 06.78 | 05.10 | 07.14 |
| live in metro city | | never | days | days | days | |
| live in London | | never | days | days | days | |
References
Bramon R., Boada I., Bardera A., Rodríguez Q., Feixas M., Puig J., Sbert M.: Multimodal data fusion based on mutual information. IEEE Transactions on Visualization and Computer Graphics 18, 9 (2012), 1574–1587.
Biswas A., Dutta S., Shen H.-W., Woodring J.: An information-aware framework for exploring multivariate data sets. IEEE Transactions on Visualization and Computer Graphics 19, 12 (2013), 2683–2692.
Bruckner S., Möller T.: Isosurface similarity maps. Computer Graphics Forum 29, 3 (2010), 773–782.
Bramon R., Ruiz M., Bardera A., Boada I., Feixas M., Sbert M.: An information-theoretic observation channel for volume visualization. Computer Graphics Forum 32, 3pt4 (2013), 411–420.
Bramon R., Ruiz M., Bardera A., Boada I., Feixas M., Sbert M.: Information theory-based automatic multimodal transfer function design. IEEE Journal of Biomedical and Health Informatics 17, 4 (2013), 870–880.
Bordoloi U., Shen H.-W.: View selection for volume rendering. In Proc. IEEE Visualization (2005), pp. 487–494.
Chen M., Ebert D. S.: An ontological framework for supporting the design and evaluation of visual analytics systems. Computer Graphics Forum 38, 3 (2019), 131–144.
Chen M., Feixas M., Viola I., Bardera A., Shen H.-W., Sbert M.: Information Theory Tools for Visualization. A K Peters, 2016.
Chen M., Golan A.: What may visualization processes optimize? IEEE Transactions on Visualization and Computer Graphics 22, 12 (2016), 2619–2632.
Chen M., Grinstein G., Johnson C. R., Kennedy J., Tory M.: Pathways for theoretical advances in visualization. IEEE Computer Graphics and Applications 37, 4 (2017), 103–112.
Chen M., Gaither K., John N. W., McCann B.: Cost-benefit analysis of visualization in virtual environments. IEEE Transactions on Visualization and Computer Graphics 25, 1 (2019), 32–42.
Chen M., Jänicke H.: An information-theoretic framework for visualization. IEEE Transactions on Visualization and Computer Graphics 16, 6 (2010), 1206–1215.
Chen M., Sbert M.: On the upper bound of the Kullback-Leibler divergence and cross entropy. arXiv:1911.08334, 2019.
Correa C., Silver D., Chen M.: Feature aligned volume manipulation for illustration and visualization. IEEE Transactions on Visualization and Computer Graphics 12, 5 (2006), 1069–1076.
Cover T. M., Thomas J. A.: Elements of Information Theory. John Wiley & Sons, 2006.
Chen M., Walton S., Berger K., Thiyagalingam J., Duffy B., Fang H., Holloway C., Trefethen A. E.: Visual multiplexing. Computer Graphics Forum 33, 3 (2014), 241–250.
Feixas M., del Acebo E., Bekaert P., Sbert M.: An information theory framework for the analysis of scene complexity. Computer Graphics Forum 18, 3 (1999), 95–106.
Feixas M., Sbert M., González F.: A unified information-theoretic framework for viewpoint selection and mesh saliency. ACM Transactions on Applied Perception 6, 1 (2009), 1–23.
Gumhold S.: Maximum entropy light source placement. In Proc. IEEE Visualization (2002), pp. 275–282.
Jänicke H., Scheuermann G.: Visual analysis of flow features using information theory. IEEE Computer Graphics and Applications 30, 1 (2010), 40–49.
Jung Y.: instantreality 1.0. https://doc.instantreality.org/tutorial/volumerendering/, last accessed in 2019.
Jänicke H., Wiebel A., Scheuermann G., Kollmann W.: Multifield visualization using local statistical complexity. IEEE Transactions on Visualization and Computer Graphics 13, 6 (2007), 1384–1391.
Kijmongkolchai N., AbdulRahman A., Chen M.: Empirically measuring soft knowledge in visualization. Computer Graphics Forum 36, 3 (2017), 73–85.
Kullback S., Leibler R. A.: On information and sufficiency. Annals of Mathematical Statistics 22, 1 (1951), 79–86.
Lin J.: Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory 37 (1991), 145–151.
Max N., Chen M.: Local and global illumination in the volume rendering integral. In Scientific Visualization: Advanced Concepts, Hagen H., (Ed.). Schloss Dagstuhl, Wadern, Germany, 2010.
Moser S. M.: A Student’s Guide to Coding and Information Theory. Cambridge University Press, 2012.
Ng C. U., Martin G.: Automatic selection of attributes by importance in relevance feedback visualisation. In Proc. Information Visualisation (2004), pp. 588–595.
Nagy Z., Schneider J., Westermann R.: Interactive volume illustration. In Proc. Vision, Modeling and Visualization (2002).
Purchase H. C., Andrienko N., Jankun-Kelly T. J., Ward M.: Theoretical foundations of information visualization. In Information Visualization: Human-Centered Issues and Perspectives, Springer LNCS 4950. 2008, pp. 46–64.
Ruiz M., Bardera A., Boada I., Viola I., Feixas M., Sbert M.: Automatic transfer functions based on informational divergence. IEEE Transactions on Visualization and Computer Graphics 17, 12 (2011), 1932–1941.
Rigau J., Feixas M., Sbert M.: Shape complexity based on mutual information. In Proc. IEEE Shape Modeling and Applications (2005).
Roettger S.: The volume library. http://schorsch.efi.fhnuernberg.de/data/volume/, last accessed in 2019.
Shannon C. E.: A mathematical theory of communication. Bell System Technical Journal 27 (1948), 379–423.
Tam G. K. L., Kothari V., Chen M.: An analysis of machine- and human-analytics in classification. IEEE Transactions on Visualization and Computer Graphics 23, 1 (2017).
Takahashi S., Takeshima Y.: A feature-driven approach to locating optimal viewpoints for volume visualization. In Proc. IEEE Visualization (2005), pp. 495–502.
Viola I., Chen M., Isenberg T.: Visual abstraction. In Foundations of Data Visualization, Chen M., Hauser H., Rheingans P., Scheuermann G., (Eds.). Springer, 2020. Preprint at arXiv:1910.03310, 2019.
Viola I., Feixas M., Sbert M., Gröller M. E.: Importance-driven focus of attention. IEEE Transactions on Visualization and Computer Graphics 12, 5 (2006), 933–940.
Vázquez P.-P., Feixas M., Sbert M., Heidrich W.: Automatic view selection using viewpoint entropy and its application to image-based modelling. Computer Graphics Forum 22, 4 (2004), 689–700.
Wei T.-H., Lee T.-Y., Shen H.-W.: Evaluating isosurfaces with level-set-based information maps. Computer Graphics Forum 32, 3 (2013), 1–10.
Wu Y., Qu H.: Interactive transfer function design based on editing direct volume rendered images. IEEE Transactions on Visualization and Computer Graphics 13, 5 (2007), 1027–1040.
Wang C., Shen H.-W.: LOD Map - a visual interface for navigating multiresolution volume visualization. IEEE Transactions on Visualization and Computer Graphics 12, 5 (2005), 1029–1036.
Wang C., Shen H.-W.: Information theory in scientific visualization. Entropy 13 (2011), 254–273.
Wang C., Yu H., Ma K.-L.: Importance-driven time-varying data visualization. IEEE Transactions on Visualization and Computer Graphics 14, 6 (2008), 1547–1554.
Xu L., Lee T.-Y., Shen H.-W.: An information-theoretic framework for flow visualization. IEEE Transactions on Visualization and Computer Graphics 16, 6 (2010), 1216–1224.