Balancing the Tradeoff Between Clustering Value and Interpretability
Abstract.
Graph clustering groups entities (the vertices of a graph) based on their similarity, typically using a complex distance function over a large number of features. Successful integration of clustering approaches in automated decision-support systems hinges on the interpretability of the resulting clusters. This paper addresses the problem of generating interpretable clusters, given features of interest that signify interpretability to an end-user, by optimizing interpretability in addition to common clustering objectives. We propose a β-interpretable clustering algorithm that ensures that at least a β fraction of nodes in each cluster share the same feature value. The tunable parameter β is user-specified. We also present a more efficient algorithm for scenarios with β = 1 and analyze the theoretical guarantees of the two algorithms. Finally, we empirically demonstrate the benefits of our approaches in generating interpretable clusters using four real-world datasets. The interpretability of the clusters is complemented by generating simple explanations denoting the feature values of the nodes in the clusters, using frequent pattern mining.
1. Introduction
Graph clustering is increasingly used as an integral part of automated decision-support systems for high-stakes applications such as infrastructure development (Hospers et al., 2009), criminal justice (Aljrees et al., 2016), and health care (Haraty et al., 2015). Such domains are characterized by high-dimensional data, and the goal of clustering is to group the nodes, typically based on similarity over all the features (Jain et al., 1999). The solution quality of the resulting clusters is measured by the objective value. As the number of features increases, it becomes increasingly difficult for an end-user to interpret the resulting clusters.
For example, consider the problem of clustering districts in Kenya to aid decision-making for infrastructure development (Figure 1), sanitation in particular (ICT, 2017; Authority, 2017). Each district is described by features denoting the population, access to basic sanitation, gender and age demographics, and location. The districts in a cluster are typically considered indistinguishable and hence may be assigned the same development policies. The similarity of districts for clustering is measured over all the features. As a result, the cluster composition is likely to be heterogeneous with respect to the sanitation feature (Figure 1(a)). This may significantly affect the decision-maker's ability to infer meaningful patterns, especially given the lack of ground truth, thereby affecting their policy decisions.
Recently, there has been growing interest in interpretable machine learning models (Doshi-Velez and Kim, 2017; Lakkaraju et al., 2019; Rudin, 2019), mostly focusing on explainable predictive models or interpretable neural networks. There is limited prior research, if any, on improving the interpretability of clusters (Bertsimas et al., 2018; Chen et al., 2016). Clustering results are expected to be inherently interpretable, since the aim of clustering is to group similar nodes together. However, when clustering with a large number of features, interpretability may be diminished because no clear patterns are easy for an end-user to recognize, as in Figure 1(a).
Interpretability of the clusters is critical in high-impact domains since decision makers need to understand the solution beyond how the data is grouped into clusters: what characterizes a cluster and how it differs from other clusters. Additionally, the ability of a decision maker to evaluate the system for fairness and to identify when to trust the system hinges on the interpretability of the results. In this work, the interpretability of clusters is measured by the homogeneity of nodes in each cluster, with respect to certain predefined feature values of interest (FoI) to the end-user.
Solution quality of the clusters, denoted by the objective value, and interpretability are often competing objectives. In Figure 1(b), interpretability is optimized in isolation by partitioning the nodes based only on the FoI, which significantly degrades the solution quality; conversely, optimizing for solution quality alone degrades interpretability (Figure 1(a)). Reliable decision support requires interpretable clusters that do not significantly compromise the solution quality.
In this paper, we study the problem of optimizing for interpretability of clusters, in addition to optimizing the solution quality of centroid-based clustering algorithms such as k-center. We propose a β-interpretable clustering algorithm that generates clusters such that at least a β fraction of nodes in each cluster share the same feature value, with respect to the FoI. The value of β is a user-specified input. By adjusting β, the homogeneity of the nodes in each cluster with respect to the FoI can be altered, thus facilitating a balance between solution quality and interpretability (Figure 1(c)). We then present a more efficient algorithm that specifically handles settings with β = 1, and bound the loss in solution quality of centroid-based clustering objectives when optimizing for interpretability.
While interpretable clusters are a minimal requirement, they may not be sufficient to guarantee interpretability of the system, due to the cognitive overload for users in understanding the results. Hence, the resulting clusters are complemented by logical combinations of cluster labels as explanations. The feature values of the nodes in each cluster, with respect to the FoI, are generated as cluster labels using frequent pattern mining. In Figure 1(d), traditional clustering produces longer explanations, which are generally undesirable (Doshi-Velez and Kim, 2017), while optimizing for interpretability produces concise explanations. Thus, generating interpretable clusters is crucial for generating concise and useful explanations.
Our primary contributions are: (i) formalizing the problem of interpretable clustering that optimizes for interpretability, in addition to solution quality (Section 2); (ii) presenting two algorithms to achieve interpretable clustering and analyzing their theoretical guarantees (Section 3); and (iii) empirical evaluation of our approaches using four real-world datasets and using frequent pattern mining to generate cluster explanations (Section 4). Our experiments demonstrate the efficiency of our approaches in balancing the tradeoff between interpretability and solution quality. The results also show that clusters with different levels of interpretability can be generated by varying β.
2. Problem Formulation
Let V denote a set of n nodes, along with a pairwise distance metric d. Let F = F_1 × … × F_m denote the set of values of m features, where F_i refers to the set of values for the i-th feature, and ℓ : V → F denote the mapping from nodes to their feature values. Let G = (V, d) be a graph where d is a metric over V. Given a graph instance G and an integer k, the goal is to partition V into k disjoint subsets by optimizing an objective function, which results in clusters C = {C_1, …, C_k}. The objective function h(G, C), for a graph G and a set of clusters C, returns an objective value as a real number, h(G, C) ∈ ℝ, which helps compare different clustering techniques. The optimal objective value of an objective function is denoted by OPT. C(v) denotes the cluster to which the node v is assigned.
The clusters produced by existing algorithms are often non-trivial and non-intuitive for an end-user to understand due to the complex feature space. Let FoI denote the set of feature values in F that signify interpretability for the user, the feature values of interest. In Figure 1, the FoI is defined over the sanitation feature, with values {0-25%, 25-50%, 50-75%, 75-100%} denoting the four levels of access to basic sanitation.
Quantifying Interpretability: The interpretability score of a cluster C_j with respect to a feature value f is denoted by I(C_j, f) and is estimated as the fraction of the nodes in the cluster that share the feature value f:

I(C_j, f) = ( Σ_{v ∈ C_j} 1_f(v) ) / |C_j|,

with 1_f(v) ∈ {0, 1} denoting whether the node v satisfies feature value f, and |C_j| denoting the total number of nodes in the cluster. Hence, I(C_j, f) ∈ [0, 1]. Given the FoI, the interpretability score of a cluster, I(C_j), is calculated as I(C_j) = max_{f ∈ FoI} I(C_j, f).
Definition 2.1.
The interpretability score of a clustering C, given the FoI, is denoted by I(C) and is calculated as: I(C) = min_{1 ≤ j ≤ k} I(C_j).
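Concretely, the two scores can be computed as sketched below; the data layout (clusters as lists of node ids, FoI values in a dict) is an illustrative assumption, not the paper's implementation.

```python
from collections import Counter

def cluster_score(cluster, foi_value):
    """I(C_j): fraction of nodes in C_j sharing the most common FoI value."""
    counts = Counter(foi_value[v] for v in cluster)
    return counts.most_common(1)[0][1] / len(cluster)

def clustering_score(clusters, foi_value):
    """I(C): the minimum interpretability score over all clusters."""
    return min(cluster_score(c, foi_value) for c in clusters)
```

For example, a cluster with two nodes of one FoI value and one of another has score 2/3, and the clustering score is the worst cluster score.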
Problem Statement: Given β, we aim to create clusters that maximize the interpretability score I(C), while simultaneously optimizing for solution quality using centroid-based clustering objectives such as k-center. k-center clustering aims to identify k nodes as centers (say c_1, …, c_k) and assign each node to the closest cluster center, ensuring that the maximum distance of any node from its cluster center is minimized. The objective value is calculated as: h(G, C) = max_{1 ≤ j ≤ k} max_{v ∈ C_j} d(v, c_j).
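For reference, the k-center objective and the standard greedy (Gonzalez-style) center selection used for initialization can be sketched as follows; the 1-D points and names are illustrative assumptions.

```python
def greedy_kcenter(points, k, dist):
    """Gonzalez's greedy 2-approximation for k-center: repeatedly add the
    point farthest from the current set of centers, then assign each point
    to its nearest center."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    clusters = [[] for _ in range(k)]
    for p in points:
        clusters[min(range(k), key=lambda i: dist(p, centers[i]))].append(p)
    return clusters, centers

def kcenter_objective(clusters, centers, dist):
    """h(G, C): maximum distance of any node from its cluster center."""
    return max(dist(p, c) for cl, c in zip(clusters, centers) for p in cl)
```

On four points {0.0, 0.2, 5.0, 5.1} with k = 2, the greedy picks 0.0 and 5.1 as centers and the objective is 0.2.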
Definition 2.2.
A clustering C is β-interpretable, given the FoI, if I(C) ≥ β. That is, each cluster is composed of at least a β fraction of nodes that share the same feature value.
Definition 2.3.
A clustering C is strongly interpretable, given the FoI, if I(C) = 1.
We now analyze the maximum achievable interpretability for a given dataset and identify the upper bound on β.
2.1. Optimal upper bound of β
Let β* denote the optimal upper bound of β. Without loss of generality, given a feature value f in the FoI, we assume that there exists at least one node that satisfies f. When k ≥ |FoI|, with k denoting the number of clusters, a clustering can be generated such that I(C) = 1. This is achieved by constructing each cluster with the nodes that satisfy the same feature value f, and hence β* = 1.
However, when k < |FoI|, there exists no clustering with I(C) = 1, since the optimal solution cannot form clusters whose nodes satisfy only one feature value of interest. Hence, in such cases, β* < 1. The optimal value for this case can be estimated as follows: consider the top-k feature values based on frequency of occurrence in the data and assign the nodes that refer to each of these feature values to a different cluster. All the remaining unassigned nodes are then iteratively assigned to the cluster with maximum interpretability score. If multiple clusters have the same interpretability score, the new node is added to the larger cluster, since it is less likely to negatively affect the interpretability score.
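A sketch of this estimation procedure follows; the function and variable names are ours, not the paper's.

```python
from collections import Counter

def estimate_beta_star(nodes, label, k):
    """Greedy estimate of the best achievable interpretability beta*."""
    counts = Counter(label[v] for v in nodes)
    if k >= len(counts):
        return 1.0  # one pure cluster per feature value suffices

    def score(cl):  # fraction of nodes sharing the majority value
        return Counter(label[v] for v in cl).most_common(1)[0][1] / len(cl)

    # Seed k clusters with the top-k feature values by frequency.
    top_k = [f for f, _ in counts.most_common(k)]
    clusters = [[v for v in nodes if label[v] == f] for f in top_k]
    # Assign leftovers to the cluster with maximum score; break ties by size.
    for v in (v for v in nodes if label[v] not in top_k):
        clusters.sort(key=lambda cl: (score(cl), len(cl)), reverse=True)
        clusters[0].append(v)
    return min(score(cl) for cl in clusters)
```

With four 'a' nodes, three 'b' nodes, one 'c' node, and k = 2, the 'c' node joins the larger 'a' cluster, giving an estimate of 4/5.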
In general, the interpretability score of a cluster is dominated by the feature value satisfied by the maximum number of nodes within it. For a given cluster, interpretability can be boosted by either adding more nodes of the majority feature value or removing the nodes that differ from the majority feature value. If all the nodes that do not represent the majority feature value are removed, the interpretability score of the cluster is 1. Using this intuition, we propose algorithms to generate interpretable clusters when β ≤ β*.
3. Solution Approach
In many applications, clusters that are considerably homogeneous but not strongly interpretable may still be acceptable, since a few outliers do not affect the decision maker's ability to infer a pattern. For example, if the nodes in a cluster are 90% homogeneous, the interpretability may not be significantly affected, while the relaxation may help improve the solution quality of the clusters formed using centroid-based objectives. To that end, we propose an algorithm (Algorithm 1) in which the homogeneity of the nodes in a cluster can be adjusted using a tunable parameter β. The algorithm identifies β-interpretable clusters for all values of β ≤ β*. We present the algorithms using k-center as the clustering objective; however, it is straightforward to extend them to any other centroid-based clustering.
The input to Algorithm 1 is a graph G and the parameters k and β, referring to the number of clusters needed and the interpretability score requirement. First, it initializes a collection of k clusters with the greedy k-center algorithm, which optimizes the quality of the generated clusters. To improve the interpretability score, our algorithm iteratively identifies a cluster with the least interpretability score and then post-processes it to improve its interpretability score without considerable loss in the k-center objective. While processing this cluster, the feature value associated with the maximum number of its nodes is identified as the 'majority' feature value, along with the set of nodes that share the majority value. To boost the score of the cluster, the fraction of nodes that share the majority feature value needs to be increased. We employ the following two operations for this purpose:
(1) increase the total number of nodes with the majority feature value (boost_majority); and
(2) remove the nodes that do not correspond to the majority feature value and reassign them to other clusters (reduce_minority).
boost_majority. Outlined in Algorithm 2, this subroutine iterates over the clusters to identify the closest cluster that contains nodes with the 'majority' feature value and merges it with the least-interpretable cluster (Lines 1, 2). It then identifies the two feature values that have the maximum frequency within the merged cluster and assigns the nodes with these feature values to two different clusters (Line 2). The remaining nodes in the merged cluster are assigned to either of the two clusters such that the two clusters have comparable interpretability scores (Lines 4, 5).
reduce_minority. This subroutine, outlined in Algorithm 3, identifies the collection of nodes within the cluster that do not have the 'majority' feature value, whose removal helps boost the interpretability score of the cluster (Line 1). Nodes that do not belong to the majority feature value and are farthest from the center are considered for reassignment (Lines 2, 3). Each of the farthest nodes is then assigned to another cluster, considered in increasing order of distance from the node, such that the interpretability score of the target cluster does not drop below β (Line 4). This process of removing nodes is performed only when the cluster has the maximum number of nodes in the dataset that share the majority feature value.
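An illustrative sketch of this reassignment step follows. It is not the paper's exact Algorithm 3; the data layout (clusters and centers as node-id lists, FoI values in a dict) and names are assumptions.

```python
from collections import Counter

def _score(cluster, label):
    """Fraction of nodes in the cluster sharing the majority FoI value."""
    return Counter(label[v] for v in cluster).most_common(1)[0][1] / len(cluster)

def reduce_minority(clusters, centers, label, dist, j, beta):
    """Move minority nodes of cluster j, farthest from its center first,
    into the nearest cluster that stays beta-interpretable after the move."""
    maj = Counter(label[v] for v in clusters[j]).most_common(1)[0][0]
    minority = sorted((v for v in clusters[j] if label[v] != maj),
                      key=lambda v: dist(v, centers[j]), reverse=True)
    for v in minority:
        if _score(clusters[j], label) >= beta:
            break  # cluster j already meets the requirement
        # Candidate hosts in increasing order of distance from v.
        for i in sorted(range(len(clusters)), key=lambda i: dist(v, centers[i])):
            if i != j and _score(clusters[i] + [v], label) >= beta:
                clusters[j].remove(v)
                clusters[i].append(v)
                break
    return clusters
```

In the usage below, the lone 'b' node in the first cluster is reassigned to the second cluster, which absorbs it without dropping below β = 0.75.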
In some cases, Algorithm 1 may converge to a local maximum and may not reach β, even when the input β ≤ β*. This happens when the feature value being boosted is not one of the feature values in the optimal solution. However, we observe that this is a rare scenario in practice. A detailed algorithm that handles these cases is described in the Appendix.
For cases in which the minimum-distance pair identified in Algorithm 2 belongs to the same optimal cluster, we bound the loss in the k-center objective when using boost_majority.
Lemma 3.1.
In each iteration of boost_majority where the minimum-distance pair identified in Algorithm 2 belongs to the same optimal cluster, the k-center objective value worsens by at most a constant factor of OPT*, where OPT* denotes the optimal k-center objective value of the clusters that achieve maximum interpretability.
Proof in Appendix.
When generating clusters with β = 1, Algorithm 1 may take a long time to converge, especially if the initial k-center-based clusters have poor interpretability. We propose a more efficient algorithm for strong interpretability that solves the interpretable clustering problem on each individual feature value to construct the final solution.
3.1. Strong interpretability, β = 1
Algorithm 4 is a more efficient approach for handling scenarios with β = 1. At a high level, it identifies the distribution of feature values among clusters and then quickly generates the clusters. It leverages the property that a clustering with I(C) = 1 is characterized by clusters in which all nodes share the same feature value. As discussed earlier, β = 1 is achievable only when k ≥ |FoI|, and this is an important assumption required for this algorithm.
The first step is to identify a set D consisting of tuples of |FoI| positive values that sum up to k (Line 2). This set identifies all possible distributions of the different feature values of interest among the k clusters. For each tuple in D, the algorithm identifies, for each feature value f with tuple entry k_f, k_f clusters over the nodes with feature value f. The collection of these clusters is the solution corresponding to that tuple (Lines 3-5). This step generates a collection of k-clusterings, and the one with minimum k-center objective value is chosen as the final set of clusters (Line 6).
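A compact sketch of this enumeration follows; the variable names are ours, and a greedy k-center initializer is folded in for self-containment.

```python
from itertools import combinations

def compositions(total, parts):
    """All tuples of `parts` positive integers that sum to `total`."""
    for cut in combinations(range(1, total), parts - 1):
        yield tuple(b - a for a, b in zip((0,) + cut, cut + (total,)))

def greedy_kcenter(points, k, dist):
    """Gonzalez's greedy 2-approximation for k-center."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    clusters = [[] for _ in range(k)]
    for p in points:
        clusters[min(range(k), key=lambda i: dist(p, centers[i]))].append(p)
    return clusters, centers

def strong_interpretable_clustering(points, label, k, dist):
    """Sketch of Algorithm 4: try every distribution of the k clusters
    among the FoI values and keep the one with the best k-center objective."""
    values = sorted(set(label[p] for p in points))
    assert k >= len(values), "beta = 1 requires k >= number of FoI values"
    groups = {f: [p for p in points if label[p] == f] for f in values}
    best, best_obj = None, float('inf')
    for ks in compositions(k, len(values)):
        if any(n > len(groups[f]) for n, f in zip(ks, values)):
            continue  # cannot place more centers than nodes
        sol, cens = [], []
        for n, f in zip(ks, values):
            cl, ce = greedy_kcenter(groups[f], n, dist)
            sol, cens = sol + cl, cens + ce
        obj = max(dist(p, c) for cl, c in zip(sol, cens) for p in cl)
        if obj < best_obj:
            best, best_obj = sol, obj
    return best, best_obj
```

Every returned cluster is pure by construction (strong interpretability), and the best distribution is selected by the k-center objective.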
Algorithm 4 is capable of generating clusters with maximum interpretability without significant loss in the clustering objective value. We now show that the final solution returned by our algorithm is a 2-approximation of the optimal algorithm that generates strongly interpretable clusters and optimizes for the k-center objective.
Lemma 3.2.
The strong-interpretability clustering algorithm generates a clustering C' such that I(C') = 1 and h(G, C') ≤ 2·OPT*, where h(G, C') refers to the k-center objective of C' and OPT* denotes the optimal k-center objective value of clusters that achieve maximum interpretability.
Proof Sketch.
Since each cluster contains only nodes that share the same feature value, I(C') = 1. Additionally, the optimal strongly interpretable solution corresponds to some distribution of feature values considered by the algorithm. The solution generated for that distribution is a 2-approximation of the optimal (following the proof of the 2-approximation of the greedy algorithm for k-center). Since the final solution minimizes the k-center objective over all candidate clusterings, C' is guaranteed to be a 2-approximation of OPT*. ∎
4. Experimental Results
We evaluate the efficiency of our approaches based on two metrics: the interpretability score of the clustering and the objective value of the k-center algorithm. We refer to Algorithm 1 as IC and Algorithm 4 as IKC.
Baselines. The results are compared with those of three baselines: (1) k-center clustering over all the features in the data (KC); (2) partitioning the dataset into k clusters based only on the FoI; and (3) k-center clustering over only the features of interest. The first two baselines represent extremes of the spectrum, optimizing only for the k-center objective or only for interpretability. The third baseline aims to optimize the distances over the FoI, ensuring that nodes with similar feature values are placed close to each other.
Datasets. The algorithms are evaluated using four datasets: (1) Kenya sanitation data, in which interpretability is defined over the percentage of a district's population with access to basic sanitation; (2) Kenya traffic accidents data; (3) the UCI Adult dataset; and (4) the UCI Communities and Crime dataset.
Setup. All algorithms are implemented in Python and tested on an Intel i7 computer with 8GB of RAM. In the interest of clarity, we experiment with a fixed number of clusters k for all domains. Due to randomness in the k-center algorithm, the clustering objective behavior of our techniques may not be monotonic. For any given β, we run the algorithm multiple times and choose the best clustering returned.
4.1. Solution quality vs. cluster interpretability
We first study the tradeoff between the k-center objective value and the interpretability score of the clusters. We vary β for Algorithm 1 and compare the results with those of the baselines and of Algorithm 4, for which β = 1 by construction. The results in Figure 2 show how the k-center objective value may be affected as we form increasingly interpretable clusters using our algorithms. We do not distinguish between the performances for the various β values, denoted by the purple markers, since the goal is to understand how the algorithm balances the tradeoff for any β. We also do not consider very low values of β, since that defeats the purpose of optimizing for interpretability; note, however, that our algorithm supports any value of β ≤ β* as input.
Approaches that minimize the k-center objective and maximize interpretability, toward the lower-right corner of the figure, are desirable. Overall, the baselines either achieve high interpretability with a poor k-center objective or a low k-center objective with very low interpretability. Our approach strikes a better balance: the clusters generated by our algorithm have high interpretability without significant loss in the k-center objective. As β increases, the k-center objective worsens, but the loss is not high and is within a factor of 5 in most cases. The runtime of the baselines is at most 40 seconds across all datasets, and the runtime of our approach is at most 65 seconds across all datasets and all values of β. This shows that there is no significant overhead in optimizing for both interpretability and solution quality.
4.2. Effect of varying β
As discussed above, our approaches efficiently balance the tradeoff even for higher values of β. We now study the effect of varying β on the cluster composition. Figure 3 shows the distribution of FoI in each cluster for different values of β on the Adult dataset. In the interest of readability, we do not include results for lower values of β. As β increases, the fraction of the majority feature value in each cluster grows. For example, the nodes represented by the yellow color are merged as β is increased. Similarly, as β is increased further, the green-colored feature value becomes a minority and those nodes are merged to form a new cluster. Notice that not all the green feature nodes are merged: the process stops as soon as the clusters reach an interpretability of β. However, in the case of strong interpretability with β = 1, the clusters are homogeneous. In our experiments, the runtime of IC is at most twice that of the baselines, and the runtime of IKC is consistently lower than that of IC for β = 1. Similar trends were observed for the other domains and the results are included in the Appendix.
4.3. Effect of varying k
To ensure that the trend in the relative performance of the approaches in minimizing the k-center objective is consistent, we experiment with varying the number of clusters k, with β fixed. Figure 4 plots the results for k varying from 10 to 50. As expected, the k-center objective value decreases as k increases, and the relative behavior of all the techniques is consistent. Our techniques IC and IKC are close to KC across all datasets, while the FoI-based baseline works well for the accidents and sanitation datasets only. All techniques run in less than three hours for all datasets and all values of k in the experiments, demonstrating the scalability of our approach.
4.4. Interpretability and Explanation Generation
The interpretability of the resulting clusters can be further improved by generating explanations based on the feature values of the nodes in the clusters. Concise and correct explanations based on FoI are possible only when the clusters are homogeneous with respect to FoI. Hence, generating explanations also allows us to understand and compare the performance of different techniques beyond the interpretability scores.
We generate explanations as logical combinations of the FoI feature values associated with the nodes in each cluster, using frequent pattern mining (Han et al., 2007). This is implemented using the Python pyfpgrowth package, with a minimum support value set to a fraction of the cluster size. That is, this approach lists all the feature values that are associated with at least that fraction of the nodes in the cluster; the remaining fraction is the tolerance for outliers in the cluster. This value can be adjusted depending on the application. Explanations are then generated by a logical OR over these feature values. Figure 5 shows the distribution of explanations across clusters for the different techniques on the Adult data, for a fixed β. Clusters generated by the baseline approach contain a skewed distribution of feature values across all clusters and are hard to interpret with respect to the FoI. The approaches that focus on interpretability generate homogeneous clusters, with the majority of the nodes in a cluster sharing the same feature value. As a result, the generated explanations for these approaches are concise and fairly distinct across clusters, thereby improving the interpretability for the decision maker.
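The paper uses pyfpgrowth for this step; the sketch below is a minimal single-item stand-in that captures the same idea, with an illustrative (not the paper's) support threshold.

```python
from collections import Counter

def explain_cluster(cluster, label, min_support=0.9):
    """List FoI values shared by at least `min_support` of the cluster's
    nodes and join them with a logical OR. Single-item stand-in for the
    frequent-pattern-mining step; the 0.9 threshold is an assumption."""
    counts = Counter(label[v] for v in cluster)
    frequent = [f for f, c in counts.most_common()
                if c / len(cluster) >= min_support]
    return " OR ".join(frequent) if frequent else "(no frequent value)"
```

A homogeneous cluster yields a one-term explanation, while lowering the support threshold admits more terms into the OR, illustrating why interpretable clusters yield concise explanations.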
5. Related Work
Interpretable machine learning. The two main threads of research in interpretable machine learning are generating explanations for black-box models (Abdul et al., 2018; Holzinger, 2018; Gunning, 2017; Guidotti et al., 2019; Lakkaraju et al., 2019) and improving transparency with interpretable models (Doshi-Velez and Kim, 2017; Rudin, 2019; Chen et al., 2018). Most of these approaches have been developed for predictive models or interpretable neural networks and have relied heavily on domain-dependent notions of interpretability (Doshi-Velez and Kim, 2017). We define a domain-independent notion of interpretability and aim to form interpretable clusters, which is critical for high-impact applications (Rudin, 2019). We argue that generating explanations for clustering requires homogeneous clusters and propose algorithms that improve interpretability without compromising solution quality.
Clustering with multiple objectives. Prior research on clustering focuses heavily on improving performance metrics (Aggarwal and Wang, 2010; Jain et al., 1999; Xu and Wunsch, 2005), such as accuracy, scalability, and runtime, but neglects the interpretability aspect. Another thread of work employs soft clustering methods (Chen et al., 2016; Greene and Cunningham, 2005) or mixed integer optimization (Bertsimas et al., 2018) to improve interpretability, but does not provide any solution guarantees. Constrained clustering (Wagstaff et al., 2001), in which pairs of nodes that must belong to the same cluster are enforced as constraints, cannot be used to generate β-interpretable clusters when β < 1. Another related body of work is the research on multi-objective clustering (Chierichetti et al., 2017; Law et al., 2004; Jiamthapthaksin et al., 2009; Handl and Knowles, 2007; Bera et al., 2019), which has predominantly been applied to specific applications and, recently, to improving fairness. Extending these approaches to our setting is not straightforward since the algorithms are problem-specific. There is limited research on interpretable clustering (Chen et al., 2016), since clusters are expected to be interpretable as they group similar nodes, which is not necessarily the case when dealing with high-dimensional data.
6. Conclusion
We address the challenge of generating interpretable clusterings while simultaneously optimizing for the solution quality of the resulting clusters. We propose an algorithm to generate β-interpretable clusters, given β and the features of interest that signify interpretability to the user. A more efficient algorithm that specifically handles scenarios with β = 1 is also presented, along with the theoretical guarantees of the two approaches. Our approaches efficiently balance the tradeoff between interpretability and solution quality, compared to the baselines. The proposed approach can be extended to handle continuous FoI by treating each interval of continuous values as a discrete value for interpretability.
We currently target settings in which clustering is performed using centroid-based algorithms. In the future, we aim to expand the range of clustering objectives considered, including hierarchical clustering, and analyze their theoretical guarantees. Using interpretable clustering to identify bias in decision-making is another interesting direction for future research.
7. Appendix
7.1. Proof of Lemma 3.1
Proof.
In a particular iteration, let the clusters identified to be merged in boost_majority be C_i and C_j, and let (u, v), with u ∈ C_i and v ∈ C_j, be the edge with minimum distance. Since u and v are present in the same optimal cluster, d(u, v) ≤ 2·OPT*, where OPT* is the k-center objective for the optimal clustering that optimizes for interpretability and the k-center objective. The final cluster constructed has nodes from C_i and C_j. The maximum pairwise distance between any pair of points x and y, such that x ∈ C_i and y ∈ C_j, can be evaluated using the triangle inequality: d(x, y) ≤ d(x, u) + d(u, v) + d(v, y). We use the property that the maximum distance between any pair of points within the same cluster of radius r is 2r.
Since the greedy initialization guarantees a cluster radius of at most 2·OPT*, we get d(x, y) ≤ 4·OPT* + 2·OPT* + 4·OPT* = 10·OPT*. This shows that the pairwise distance between any pair of points within the merged cluster is less than 10·OPT*. Hence, the final cluster output by boost_majority is a 10-approximation of the optimal solution in the worst case. ∎
7.2. Remark 4
As discussed in the main paper, Algorithm 1 (in the paper) may converge at a local maximum and never achieve β ≤ β*. Even though this is a rare scenario, we present Algorithm 5 as an additional subroutine that can be run along with the boost_majority and reduce_minority subroutines to modify the clusters and identify a clustering with higher interpretability. This subroutine first identifies a feature value f that is present as a majority feature value in one of the optimal clusters but is not a majority in any of the current clusters (Line 2). Since this feature value needs to become a majority, we identify a cluster C' that is closest to and contains nodes of feature value f. All nodes with feature value f are added to C', and the feature-f nodes are then boosted to become the majority by calling the boost_majority subroutine, ensuring that it boosts nodes that belong to feature value f.
7.3. Additional experiments
Footnotes
 journalyear: 2020
 copyright: acmcopyright
 conference: 2020 AAAI/ACM Conference on AI, Ethics, and Society; February 7–8, 2020; New York, NY, USA
 booktitle: 2020 AAAI/ACM Conference on AI, Ethics, and Society (AIES’20), February 7–8, 2020, New York, NY, USA
 price: 15.00
 doi: 10.1145/XXXXXX.XXXXXX
 isbn: 9781450371100/20/02
 https://www.opendata.go.ke/datasets/2011-traffic-incidences-from-desinventar
 https://archive.ics.uci.edu/ml/machine-learning-databases/adult/
 http://archive.ics.uci.edu/ml/datasets/communities+and+crime
References
 Trends and trajectories for explainable, accountable and intelligible systems: an HCI research agenda. In Proceedings of the CHI conference on human factors in computing systems, pp. 582. Cited by: §5.
 A survey of clustering algorithms for graph data. In Managing and mining graph data, pp. 275–301. Cited by: §5.
 Criminal pattern identification based on modified k-means clustering. In IEEE International Conference on Machine Learning and Cybernetics (ICMLC), Vol. 2, pp. 799–806. Cited by: §1.
 Kenya sanitation by district. Note: https://www.opendata.go.ke/datasets/sanitation-by-district Cited by: §1.
 Fair algorithms for clustering. arXiv preprint arXiv:1901.02393. Cited by: §5.
 Interpretable clustering via optimal trees. arXiv preprint arXiv:1812.00539. Cited by: §1, §5.
 This looks like that: deep learning for interpretable image recognition. arXiv preprint arXiv:1806.10574. Cited by: §5.
 Interpretable clustering via discriminative rectangle mixture model. In IEEE 16th International Conference on Data Mining (ICDM), pp. 823–828. Cited by: §1, §5.
 Fair clustering through fairlets. In Advances in Neural Information Processing Systems, Cited by: §5.
 Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608. Cited by: §1, §1, §5.
 Producing accurate interpretable clusters from high-dimensional data. In European Conference on Principles of Data Mining and Knowledge Discovery, Cited by: §5.
 A survey of methods for explaining black box models. ACM computing surveys 51 (5), pp. 93. Cited by: §5.
 Explainable artificial intelligence. Defense Advanced Research Projects Agency (DARPA). Cited by: §5.
 Frequent pattern mining: current status and future directions. Data Mining and Knowledge Discovery 15, pp. 55–86. Cited by: §4.4.
 An evolutionary approach to multi-objective clustering. IEEE Transactions on Evolutionary Computation 11 (1), pp. 56–76. Cited by: §5.
 An enhanced k-means clustering algorithm for pattern discovery in healthcare data. International Journal of Distributed Sensor Networks. Cited by: §1.
 From machine learning to explainable AI. In IEEE World Symposium on Digital Intelligence for Systems and Machines, pp. 55–66. Cited by: §5.
 The next silicon valley? on the relationship between geographical clustering and public policy. International Entrepreneurship and Management Journal 5 (3), pp. 285–299. Cited by: §1.
 Kenya Vision 2030: A Globally Competitive and Prosperous Kenya. Note: https://www.opendata.go.ke/datasets/vision-2030 Cited by: §1.
 Data clustering: a review. ACM Computing Surveys 31 (3), pp. 264–323. Cited by: §1, §5.
 A framework for multi-objective clustering and its application to co-location mining. In Proceedings of the International Conference on Advanced Data Mining and Applications, Cited by: §5.
 Faithful and customizable explanations of black box models. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, Cited by: §1, §5.
 Multi-objective data clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Cited by: §5.
 Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence 1 (5), pp. 206. Cited by: §1, §5.
 Constrained k-means clustering with background knowledge. In International Conference on Machine Learning, Cited by: §5.
 Survey of clustering algorithms. IEEE Transactions on Neural Networks 16, pp. 645–678. Cited by: §5.