Mapper on Graphs for Network Visualization


Networks are an exceedingly popular type of data for representing relationships between individuals, businesses, proteins, brain regions, telecommunication endpoints, etc. Network or graph visualization provides an intuitive way to explore the node-link structures of network data for instant sense-making. However, naive node-link diagrams can fail to convey insights regarding network structures, even for moderately sized data of a few hundred nodes. We propose to apply the mapper construction, a popular tool in topological data analysis, to graph visualization, which provides a strong theoretical basis for summarizing network data while preserving their core structures. We develop a variation of the mapper construction targeting weighted, undirected graphs, called mapper on graphs, which generates property-preserving summaries of graphs. We provide a software tool that enables interactive explorations of such summaries and demonstrate the effectiveness of our method for synthetic and real-world data. The mapper on graphs approach we propose represents a new class of techniques that leverages tools from topological data analysis in addressing challenges in graph visualization.

1 Introduction

Networks are often used to model social, biological, and technological systems. In recent years, our ability to collect and archive such data has far outpaced our ability to understand them. For instance, the Blue Brain Project—the world’s largest-scale simulations of neural circuits—generates instances of the micro-connectome containing 10 million neurons and 88 billion synaptic connections for the rodent brain. The challenges for graph visualization (sometimes called network visualization) are two-fold: how to effectively extract features from such complex data; and how to design effective visualizations to communicate these features to the users.

We propose to address these challenges by leveraging the mapper construction [88], a tool in topological data analysis (TDA), to develop visualizations for large network data. Given a topological space $X$ equipped with a function $f$ on $X$, the classic mapper construction from the seminal work of Singh et al. [88] provides a topological summary of the data for efficient computation, manipulation, and exploration. It has been applied widely in data science, from cancer research [72] to sports analytics [1], among others [17, 64, 65, 91]; it is also a cornerstone of several data analytics companies, e.g., Ayasdi and Alpine Data Labs.

In this paper, we develop a variation of the mapper construction targeting weighted undirected graphs, called mapper on graphs. For the rest of the paper, we use networks to refer to the data and graphs to refer to an abstraction of the data. The mapper construction connects naturally with visualization by providing a strong theoretical basis for simplifying large complex data while preserving their core structures. Specifically:

  • We propose a set of summarization techniques to transform large graphs into hierarchical representations and provide interactive visualizations for their exploration.

  • We demonstrate the effectiveness of our method on synthetic and real-world data using three different topological lenses that capture various properties of the graphs.

  • We provide an open-source implementation together with our experimental datasets via GitHub (see the supplementary material).

2 Related Work

Graph visualization. We limit our review to node-link diagrams, which are utilized by many visualization software tools, including Gephi [7], GraphViz [34], and NodeXL [44]. For a comprehensive overview of graph visualization techniques, see [93].

One of the biggest challenges with node-link diagrams is visual clutter, which has been extensively studied in graph visualization [33]. It is mainly addressed in three ways: improved node layouts, edge bundling, and alternative visual representations.

Tutte [92] provided the earliest graph layout method for node-link diagrams, followed by methods driven by linear programming [40], force-directed embeddings [37, 47], embeddings of the graph metric [39], and connectivity structures [13, 50, 52, 53]. TopoLayout [2] creates a hybrid layout by decomposing a graph into subgraphs based on their topological features, including trees, complete graphs, bi-connected components, and clusters, which are subsequently grouped and laid out as meta-nodes. One of many differences between TopoLayout and our work is that we use functions defined on the graph to automatically and interactively guide decomposition and feature extraction among subgraphs.

Edge bundling, which bundles adjacent edges together, is commonly used to reduce visual clutter on dense graphs [45]. For massive graphs, hierarchical edge bundling scales to millions of edges [38], while divided edge bundling [85] tends to produce higher-quality visual results. Nevertheless, these approaches only deal with edge clutter, not node clutter, and they only support limited types of analytic tasks [4, 67].

Finally, alternative visual representations have been used to remove clutter, ranging from variations on node-link diagrams, such as replacing nodes with modules [31] and motifs [30], to abstract representations, such as matrix diagrams [28] and graph statistics [49].

Node clustering. The objective in node clustering (or graph clustering) is to group the nodes of the graph by taking into consideration its edge structure [84]. Common techniques include spectral methods [26, 36, 54, 94], similarity-based aggregation [90], community detection [69, 70], random walks [48, 80], and hierarchical clustering [12, 16]. Edge clustering has also been studied [23, 35]. Broadly speaking, our approach is a type of graph clustering that simultaneously preserves relationships between clusters.

TDA in graph analysis and visualization. Persistent homology (the study of topological features across multiple scales) and the mapper construction are two of the most widely used tools in TDA. A number of works use persistent homology to analyze graphs [29, 32, 46, 76, 77], and it has been applied to the study of collaboration networks [5, 20] and brain networks [21, 24, 56, 57, 58, 59, 79]. In terms of graph visualization, persistent homology has been used in capturing changes in time-varying graphs [43], as well as supporting interactive force-directed layouts [89].

The mapper construction [88] has been widely utilized in TDA for a number of applications [17, 64, 65, 73, 91]. Recently, it has witnessed major theoretical developments (e.g., [19, 25, 68]) that further justify its use in data analysis. To the best of our knowledge, this is the first time the mapper construction is utilized explicitly in graph visualization.

Figure 1: An illustration of the mapper on graphs construction: (a) A weighted graph $G$ has (b) a topological lens $f$ applied. (c) A cover $\mathcal{U}$ of the range space is given by four intervals as cover elements. (d-e) The connected subgraphs induced by the preimages of the cover elements under $f$ form a cover of $G$, denoted as $f^*(\mathcal{U})$. (f) The $1$-dimensional skeleton of the nerve ($1$-nerve) of $f^*(\mathcal{U})$ is the resulting mapper on a graph $M$, whose nodes represent the connected subgraphs (in orange), and edges represent the non-empty intersections between the subgraphs (in purple).

3 Methods: Mapper on Graphs Construction

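Suppose the data is a weighted graph $G = (V, E)$ equipped with a positive edge weight $w: E \to \mathbb{R}^{+}$ and a real-valued function defined on its nodes, $f: V \to \mathbb{R}$. Our mapper on graphs method, a variation of the classic mapper construction, provides a general framework to analyze, simplify, and visualize $G$, as well as functions on $G$.

An open cover of a topological space $X$ is a collection $\mathcal{U} = \{U_\alpha\}_{\alpha \in A}$ of open sets for some indexing set $A$ such that $\bigcup_{\alpha \in A} U_\alpha = X$. A finite open cover $\mathcal{U}$ is a good cover if every finite nonempty intersection of sets in $\mathcal{U}$ is contractible.

The mapper on graphs construction starts with a finite good cover $\mathcal{U} = \{U_\alpha\}_{\alpha \in A}$ of the image of $f$, such that $f(V) \subseteq \bigcup_{\alpha \in A} U_\alpha$. Let $f^*(\mathcal{U})$ denote the cover of $G$ obtained by considering the connected components (i.e., maximal connected subgraphs) induced by nodes in $f^{-1}(U_\alpha)$ for each $\alpha \in A$.

Given a cover $\mathcal{U} = \{U_\alpha\}_{\alpha \in A}$ of $G$, let $N(\mathcal{U})$ denote the simplicial complex that corresponds to the nerve of the cover $\mathcal{U}$, that is, $N(\mathcal{U}) = \{\sigma \subseteq A \mid \bigcap_{\alpha \in \sigma} U_\alpha \neq \emptyset\}$. We compute the nerve of $f^*(\mathcal{U})$, denoted as $N(f^*(\mathcal{U}))$, and refer to its $1$-dimensional skeleton as the mapper on a graph, denoted as $M$; see Figure 1 for an illustrative example.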
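The construction can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Java implementation: the function names, the adjacency-dictionary graph representation, and the toy path graph are assumptions made here. It pulls each interval back through the lens, splits the preimage into connected components, and connects two components whenever they share a node.

```python
from itertools import combinations


def connected_components(adj, nodes):
    """Connected components of the subgraph of `adj` induced by `nodes`."""
    nodes = set(nodes)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            u = stack.pop()
            comp.add(u)
            for v in adj[u]:
                if v in nodes and v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(frozenset(comp))
    return components


def mapper_on_graph(adj, lens, cover):
    """Return (clusters, edges) describing the 1-skeleton of the nerve.

    adj   : dict node -> iterable of neighbours (undirected graph)
    lens  : dict node -> real value
    cover : list of open intervals (lo, hi)
    """
    clusters = []
    for lo, hi in cover:
        preimage = [v for v in adj if lo < lens[v] < hi]
        clusters.extend(connected_components(adj, preimage))
    # One mapper edge per pair of clusters that share at least one node,
    # weighted by the size of the intersection.
    edges = {}
    for i, j in combinations(range(len(clusters)), 2):
        shared = len(clusters[i] & clusters[j])
        if shared:
            edges[(i, j)] = shared
    return clusters, edges


# Toy input: path graph 0-1-2-3-4 with the rescaled node index as the lens.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
lens = {v: v / 4 for v in path}
cover = [(-0.1, 0.6), (0.4, 1.1)]
clusters, edges = mapper_on_graph(path, lens, cover)
```

On this toy input, node 2 lies in both intervals, so the two clusters overlap and the summary is a single edge joining them.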

Parameters. Mapper on graphs is inherently multi-scale; its construction relies on two sets of parameters: the first defines the function/lens $f$, and the other specifies the cover $\mathcal{U}$. For simplicity, we normalize the range space to be within $[0, 1]$.

  1. Topological Lens: The function $f$ plays the role of a topological lens through which we look at the properties of the data, and different lenses provide different insights [9, 88]. Mapper on graphs currently considers three graph-theoretic lenses, average geodesic distance (AGD), density estimation, and eigenfunctions of the graph Laplacian (see Figure 2), although our framework can be easily extended to include other lenses (see Section 6).

  2. Cover: The range of $f$, $f(V)$, is covered by $\mathcal{U}$, which consists of a finite number of open intervals as cover elements. A common strategy is to use uniformly sized overlapping intervals. Let $n$ be the number of intervals, and let $\epsilon$ describe the amount of overlap between adjacent intervals (see Section 3.2 for details). Adjusting these parameters increases or decreases the amount of aggregation mapper on graphs provides.

Figure 2: Examples of topological lenses for graphs: (a) average geodesic distance (orange); (b) density estimation (green); and (c) the Fiedler vector of the graph Laplacian (purple). Darker colors mean lower function values.

3.1 Topological Lens

An interesting open problem for the classic mapper construction is how to formulate topological lenses beyond the best practice or a rule of thumb [9, 10]. In practice, height functions, distances from the barycenter of the space, surface curvature, integral geodesic distances, and geodesic distances from a source point in the space have all been proposed as reasonable choices [9]. In the graph setting, we focus on graph-theoretic lenses defined on the nodes of a graph, as illustrated in Figure 2. Each lens is chosen to reflect a specific property of interest that is intrinsic to the structure of a graph. In particular, we use as lenses average geodesic distance (AGD) [51] that detects symmetries in the graph while being invariant to reflection, rotation, and scaling; density estimation [87] that differentiates dense regions from sparse regions and outliers; and eigenfunctions of the graph Laplacian [55] that capture geometric properties of the graph.

Average geodesic distance. Suppose a weighted graph $G = (V, E)$ is equipped with a geodesic distance metric $d$. That is, $d(u, v)$ measures the geodesic/graph distance between two nodes $u, v \in V$. $d$ can be computed by utilizing Dijkstra's shortest path algorithm. The average geodesic distance, $AGD: V \to \mathbb{R}$, is given by

$$AGD(v) = \frac{1}{|V|} \sum_{u \in V} d(v, u).$$

This definition implies that the nodes near the center of the graph will likely have low function values, while nodes on the periphery will have high values. The function $AGD$ has been used extensively in shape analysis due to its desirable properties in detecting and reflecting symmetry [51] based on how the function values are distributed. Therefore, $AGD$ as a topological lens captures the symmetric properties of a graph, which are described by all or parts of the graph that are invariant to transformations such as reflection, rotation, and scaling.
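As a rough illustration of how such a lens could be computed, the sketch below runs Dijkstra's algorithm from every node and averages the resulting distances. It assumes a connected graph stored as a dictionary of neighbor-to-weight dictionaries and a normalization by the number of nodes; the helper names are hypothetical and not taken from the authors' code.

```python
import heapq


def dijkstra(adj, source):
    """Shortest-path distances from `source`; adj[u] maps neighbour -> weight."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def average_geodesic_distance(adj):
    """Mean geodesic distance from each node to all nodes (graph assumed connected)."""
    n = len(adj)
    return {v: sum(dijkstra(adj, v).values()) / n for v in adj}


# Path graph 0 - 1 - 2 with unit edge weights.
path = {0: {1: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {1: 1.0}}
agd = average_geodesic_distance(path)
```

The middle node of the path receives the lowest value and the two endpoints receive equal, higher values, matching the center-versus-periphery behavior described above.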

Figure 3: The effect of a lens. The original graph $G$, colored by one of the three lenses, is shown on the left; its corresponding mapper on a graph $M$, along with a chosen cover, is shown on the right. (a) Average geodesic distance. (b) Density estimation. (c) Fiedler vector.

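The mathematical notion of automorphism, in some sense, captures the symmetry of the space as it is a structure-preserving way of mapping a space to itself. More precisely, consider a graph $G$ as a metric space equipped with the geodesic distance, $(G, d)$. A bijection $T: V \to V$ is called an automorphism on $G$ if $d(u, v) = d(T(u), T(v))$ for every $u, v \in V$. Let $Aut(G)$ denote the group of automorphisms on $G$. A function $f: V \to \mathbb{R}$ is isometry invariant over $G$ if for every $T \in Aut(G)$: $f \circ T = f$. $AGD$ is, therefore, an isometry invariant scalar function. Indeed, let $T$ be an automorphism on $G$; then for every $v \in V$, we can verify that $\sum_{u \in V} d(T(v), u) = \sum_{u \in V} d(T(v), T(u)) = \sum_{u \in V} d(v, u)$ because $T$ is a bijection that preserves distances, and hence $AGD(T(v)) = AGD(v)$. See Figure 2(a) and Figure 3(a) for examples of $AGD$ on graphs.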

Density estimation. The density estimation function [87] is given by

$$D_\delta(v) = \sum_{u \in V} \exp\left(-\frac{d(u, v)^2}{\delta}\right),$$

where $d(u, v)$ is the geodesic distance between two nodes in the graph and $\delta > 0$.

Since $D_\delta$ is completely defined in terms of the distance $d$, it is not hard to see that $D_\delta$ is also isometry invariant. $D_\delta$ correlates negatively with $AGD$ as it tends to take larger values on nodes that are close to the center; see Figure 2(b) and Figure 3(b) for examples.
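A density lens of this kind might be computed as in the sketch below. It assumes an unnormalized Gaussian kernel of the geodesic distance with a parameter called `delta` here; the exact kernel and normalization used by the authors are not reproduced, and the all-pairs shortest-path helper (Floyd-Warshall) is chosen only for brevity on small graphs.

```python
import math


def all_pairs_distances(adj):
    """Floyd-Warshall on a small weighted graph; adj[u] maps neighbour -> weight."""
    nodes = list(adj)
    d = {u: {v: (0.0 if u == v else adj[u].get(v, math.inf)) for v in nodes}
         for u in nodes}
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def density(adj, delta):
    """Sum of Gaussian kernels of geodesic distance (assumed form of the lens)."""
    d = all_pairs_distances(adj)
    return {v: sum(math.exp(-d[u][v] ** 2 / delta) for u in adj) for v in adj}


# Path graph 0 - 1 - 2 with unit edge weights.
path = {0: {1: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {1: 1.0}}
dens = density(path, delta=1.0)
```

On the path, the central node gets the largest density value, which is the opposite ordering from the average geodesic distance on the same graph.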

Eigenfunctions of the graph Laplacian. Let $\mathbb{R}^V$ be the vector space of all functions $V \to \mathbb{R}$. The unnormalized Laplacian of the graph $G$ is the linear operator $L: \mathbb{R}^V \to \mathbb{R}^V$ defined by mapping $f$ to $Lf$, where

$$(Lf)(v) = \sum_{u \sim v} w(u, v) \, (f(v) - f(u)).$$

The eigenvectors of $L$ form a rich family of scalar functions defined on $G$ with many geometric properties [55]. The gradient of the eigenfunctions of $L$ tends to follow the overall shape of the data [63], and these functions have been used in applications such as graph understanding [86], segmentation [82], spectral clustering [71], and min-cut problems [66]. Sorting the eigenvectors of $L$ by increasing eigenvalues, we use eigenvectors of the second and third smallest eigenvalues of $L$ as the lens, denoted as $l_2$ and $l_3$.

These vectors usually contain low-frequency information about the graph, and they help to retain the shape of complex graphs. In particular, $l_2$, commonly referred to as the Fiedler vector [63], has desirable geometric properties [27]. For instance, the maximum and the minimum of the Fiedler vector tend to occur at nodes with maximum geodesic distances [22]; see Figure 2(c) and Figure 3(c) for examples.

Furthermore, there is a connection between mapper on graphs, spectral clustering, and graph min-cut. For instance, the Fiedler vector can be used to bi-partition the graph (i.e., based on $l_2(v) \geq 0$ or $l_2(v) < 0$); such a partition could also be approximated by computing mapper on a graph with $l_2$ as the lens, setting $n = 2$ and choosing a suitable overlap $\epsilon$. $M$ not only provides a generalization of spectral clustering but also preserves the connections (which form the min-cut) between the clusters (for appropriately chosen $\epsilon$).
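The sign-based bipartition can be illustrated with a small, dependency-free sketch. It approximates the Fiedler vector by power iteration on a shifted Laplacian while projecting out constant vectors; this is one standard numerical route, not necessarily the one used by the authors. The function name, the deterministic starting vector, and the two-triangle example are assumptions made here, and the graph is assumed connected with at least two nodes.

```python
import math


def fiedler_vector(adj, iterations=500):
    """Approximate the Fiedler vector by power iteration on (c*I - L).

    adj[u] maps neighbour -> positive edge weight (undirected, connected).
    """
    nodes = sorted(adj)
    degree = {v: sum(adj[v].values()) for v in nodes}
    c = 2.0 * max(degree.values())  # upper bound on the eigenvalues of L
    # Deterministic start; assumed not orthogonal to the Fiedler vector.
    x = {v: float(i) for i, v in enumerate(nodes)}
    for _ in range(iterations):
        mean = sum(x.values()) / len(nodes)
        x = {v: x[v] - mean for v in nodes}  # remove the constant eigenvector
        # y = (c*I - L) x, with (L x)(v) = sum_u w(u, v) * (x(v) - x(u))
        y = {v: c * x[v] - sum(w * (x[v] - x[u]) for u, w in adj[v].items())
             for v in nodes}
        norm = math.sqrt(sum(val * val for val in y.values()))
        x = {v: y[v] / norm for v in nodes}
    return x


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3.
g = {
    0: {1: 1.0, 2: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {0: 1.0, 1: 1.0, 3: 1.0},
    3: {2: 1.0, 4: 1.0, 5: 1.0}, 4: {3: 1.0, 5: 1.0}, 5: {3: 1.0, 4: 1.0},
}
l2 = fiedler_vector(g)
nonnegative_side = {v for v in l2 if l2[v] >= 0}
```

In this example the sign of the approximate Fiedler vector separates the two triangles, and the bridging edge 2-3 is the cut between them.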

3.2 Cover

Figure 4: Varying $n$ and $\epsilon$ in a cover. (a) $G$ with a lens. (b-f) Various mapper on graphs constructions obtained under different settings of $n$ and $\epsilon$.

Given a graph $G$ equipped with a lens function $f$, suppose the range of the function is normalized to be within $[0, 1]$, and $f(V) \subseteq \bigcup_{\alpha} U_\alpha$ for a finite good cover $\mathcal{U} = \{U_\alpha\}$ of the interval $[0, 1]$. We represent this cover visually by drawing long, colored rectangular boxes, as indicated in Figure 4.

The mapper on graphs construction relies on the choice of a cover for the interval $[0, 1]$; such a choice is rather flexible but also essential to achieve effective graph visualization. To obtain an initial cover, we start by splitting $[0, 1]$ into $n$ (the resolution parameter) intervals $[c_i, c_{i+1}]$, $1 \leq i \leq n$, with equal length, such that $c_1 = 0$ and $c_{n+1} = 1$. The overlap parameter $\epsilon > 0$ is then used to obtain the initial cover consisting of cover elements $(c_i - \epsilon, c_{i+1} + \epsilon)$ for $1 \leq i \leq n$.
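A uniform initial cover of this kind can be generated as follows. The sketch assumes the lens range is normalized to the unit interval and that the overlap is an absolute amount added to both ends of each equal-length interval; the function name and parameter names are illustrative only.

```python
def uniform_cover(n, eps, lo=0.0, hi=1.0):
    """Split [lo, hi] into n equal intervals and widen each by eps on both sides."""
    step = (hi - lo) / n
    return [(lo + i * step - eps, lo + (i + 1) * step + eps) for i in range(n)]


# Two intervals with a small overlap around the midpoint of [0, 1].
cover = uniform_cover(n=2, eps=0.1)
```

With two intervals, consecutive cover elements overlap on a band of width twice the overlap amount, and nodes whose lens values fall in that band belong to both preimages.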

Choosing $n$ and $\epsilon$ has a significant impact on the mapper on graphs output, as illustrated in Figure 4. Broadly speaking, a smaller $n$ leads to a smaller topological summary of the graph, and a smaller $\epsilon$ captures fewer connections between clusters of nodes. In the examples shown in the paper, we find that a smaller $n$ typically gives a more effective visualization for large and highly connected graphs, while a different setting appears to be sufficient for small graphs.

4 Visual Design and Interaction

We provide a linked-view interface to enable exploration of the structure of a graph. It connects the original graph $G$ to its (multi-scale) summary in the form of a mapper on a graph $M$, through interactive cover manipulation that supports customization of $M$.

4.1 Cover Visualization and Interaction

The mapper on graphs construction relies on the choice of two sets of parameters: a lens and a cover. Therefore, the cover visualization consists of two components: a histogram of the lens, showing the distribution of function values, and an interactive cover designer. Via interactive visualization, we treat the exploration and manipulation of these parameters as a vehicle to study and summarize the intrinsic structure of an input graph.

Histogram of a lens. Understanding the distribution of the values of a lens can be helpful in the mapper on graphs construction. Figure 5(left) shows an example of a histogram for the lens in gray. The histogram is split into a fixed number of bins within the range . We will illustrate later how the visual information encoded in the histogram can be utilized to optimize the choice of the cover. In addition, the histogram of a lens can be used to inform the choices for and —generally speaking, a uniformly distributed lens function requires smaller .

Figure 5: The histogram of a lens. consists of open intervals , and .

Cover visualization and interactive manipulation. The cover is visualized using a series of boxes, one per cover element, displayed next to the histogram of the lens. Each box is placed based upon the start and end values of its interval and colored based upon the midpoint. Figure 5 shows the cover as red and orange boxes.

While an initial, uniform cover is sufficient for most graph visualizations, we provide interactive manipulation of individual cover elements that can be used to obtain more desirable mapper on graphs output in some cases. Given an interval $U \in \mathcal{U}$, the user can manipulate its endpoints dynamically via an interactive interface such that $U$ can be shrunk, expanded, or shifted, as illustrated in Figure 6. As a user manipulates the interval $U$, the connected components within $f^{-1}(U)$ split, merge, appear, or disappear. The histogram of a lens can be used to inform the cover manipulation; for instance, the length of an interval could be inversely proportional to the density of the histogram.

Figure 6: (a) The case when an interval shrinks, from left to right, or equivalently when it expands, from right to left, are shown. (b) The effect of interval shrinking (expanding) on the mapper on graphs nodes is shown. (c) The case when the interval shifts to a new position is shown. (d) Changes to the mapper on graphs nodes that occur due to the interval shift are shown.

4.2 Graph Drawing

For both $G$ and its summary $M$, we apply a Fruchterman-Reingold force-directed layout [37] with the Barnes-Hut approximation for repulsive forces [6]. Our approach is ultimately agnostic of the graph layout algorithm, and different layouts (e.g., layered approaches) may improve the presentation of certain graphs.

For a given lens, $f$, a node in $G$ is colored by a saturated colormap (red for $AGD$, green for $D_\delta$, and purple for $l_2$ or $l_3$) based on its function value. A node $c$ in $M$ (associated with a connected component $C$) is colored similarly by taking the average of the function values of nodes in $C$. The size of $c$ is proportional to $|C|$.

For both graphs, edge thickness is drawn proportional to edge weight. For $M$, an edge $(c_i, c_j)$ represents the nonempty intersection between $C_i$ and $C_j$. Therefore, its edge weight, and thus thickness, is drawn proportional to the size of the intersection, $|C_i \cap C_j|$.

For readability, only the largest component of the mapper on a graph is included in the visualization. Furthermore, mapper on a graph nodes are removed from the output if the size of their connected component is less than a user-selected value.
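The two readability filters can be sketched as a post-processing step on the mapper output. The sketch below assumes clusters are given as a list of node sets and mapper edges as a dictionary keyed by pairs of cluster indices; it drops small clusters first and then keeps the component with the most mapper nodes. The order of the two steps and the meaning of "largest" are assumptions made here, not details stated in the text.

```python
def simplify_mapper(clusters, edges, min_size=1):
    """Drop clusters smaller than min_size, then keep the largest component.

    clusters : list of sets of original graph nodes
    edges    : dict (i, j) -> weight, with i, j indices into clusters
    Returns (kept cluster indices, kept edges).
    """
    keep = {i for i, c in enumerate(clusters) if len(c) >= min_size}
    adj = {i: set() for i in keep}
    for i, j in edges:
        if i in keep and j in keep:
            adj[i].add(j)
            adj[j].add(i)
    best, seen = set(), set()
    for start in sorted(keep):
        if start in seen:
            continue
        seen.add(start)
        comp, stack = set(), [start]
        while stack:
            u = stack.pop()
            comp.add(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        if len(comp) > len(best):
            best = comp
    kept_edges = {e: w for e, w in edges.items() if e[0] in best and e[1] in best}
    return best, kept_edges


clusters = [{0, 1, 2}, {2, 3}, {7}]
edges = {(0, 1): 1}
all_nodes, all_edges = simplify_mapper(clusters, edges, min_size=1)
big_nodes, big_edges = simplify_mapper(clusters, edges, min_size=3)
```

Raising the size threshold removes the two smaller clusters in this toy input, leaving a single mapper node and no edges.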

4.3 Interactive Structural Correspondences

We provide three mechanisms for exploring structural correspondences between $G$ and $M$: cover element selection, node selection, and edge selection.

Figure 7: (a-c) Selecting a cover element (in blue) triggers the selection of its corresponding nodes in $M$ (top) and $G$ (bottom). (d-e) Selecting a node of $M$ (top right in blue) triggers the selection of the cover element that generates it (left) and its corresponding nodes in $G$ (bottom).
Figure 8: Selecting an edge (blue) in $M$ highlights the clusters of nodes in $G$ associated with its endpoints and their intersection.

Cover element selection. When a cover element $U$ is selected, the action triggers the selection of nodes in $G$ whose function values lie in $U$, as well as nodes in $M$ that represent connected components of $f^{-1}(U)$. As illustrated in Figure 7(a-c), after the selection of a cover element (top left in (a-c), highlighted in blue), our system selects the corresponding nodes in $M$ (top) and in $G$ (bottom). As previously noted, if the nodes of $G$ captured by a particular cover element need fine-tuning, the box may be dragged, expanded, or contracted. $M$ will update correspondingly.

Node Selection. Each node $c$ in $M$ corresponds to a connected component $C$ from the original graph $G$. With the selection of a node $c$, our interface recovers and highlights $C$ from $G$, as well as the cover element that generates the node $c$; see Figure 7(d-e).

Edge Selection. Each edge $(c_i, c_j)$ in $M$ is determined by two connected components $C_i$ and $C_j$ in $G$. With the selection of an edge $(c_i, c_j)$, our interface highlights the clusters $C_i$ and $C_j$ in $G$ associated with its endpoints. Specifically, the sets $C_i \setminus C_j$, $C_j \setminus C_i$, and $C_i \cap C_j$ are colored differently to highlight node memberships and the relationship between the clusters. Figure 8 illustrates this process. Nodes that are unique to each endpoint are in purple and sky blue, respectively; nodes that correspond to the intersection are in blue (see Figure 8(a)). For comparison, clusters of nodes attached to individual endpoints are also highlighted in blue in Figure 8(b).

5 Results

We have implemented our approach using Java and Processing, and we evaluate it by examining mapper on graphs on synthetic and real datasets. Our code and datasets are available on GitHub.

Figure 9: Mapper on graphs applied to the visualization of synthetic datasets. In all examples, is shown on the top with its cover and the original graph is shown on the bottom. (a) Connected caveman graph (, ); (b) Lobster graph (, ); (c) Dorogovtsev-Goltsev-Mendes graph (, , ); (d) Large bipartite graph (, ); (e) Community graph (, ); (f) Torus graph (, ); (g) Dorogovtsev-Goltsev-Mendes graph (, ); (h) Lollipop graph (, , ); (i) Small bipartite graph (, ); (j) Grid graph (, ); (k) Tree (, ); (l) Ladder graph (, ).
Figure 10: Edge selection within for the USAIR 97 graph: selecting edge in (a) vs. edge in (b).

Synthetic datasets. We apply mapper on graphs to synthetic datasets, all of which are generated using NetworkX [42]. Figure 9 shows 12 synthetic graphs and their corresponding mapper on graphs outputs $M$. For each example, certain structures are emphasized depending on the choice of the lens and cover, such as symmetry (e.g., (b), (c), (k)) and the overall shape of the data (e.g., (a), (f), (l)). The original graphs for (a) and (f) have a circular shape, so we choose the Fiedler vector as the lens, as it varies across the graph. $M$ for (f) is particularly interesting as the original graph comes from a torus mesh, and $M$ appears similar to its corresponding Reeb graph. $M$ also captures the dual structures of some graphs, as in the cases of the Grid graph (j) and the Dorogovtsev-Goltsev-Mendes graph (c).

USAIR 97. The previous example illustrates a natural interpretation of the nodes in as clusters in . In Figure 10, we provide a natural interpretation of the edges in as connections between clusters. The USAIR 97 graph consists of nodes and edges [8]. The nodes represent airports and the edges routes between airports. We use as the lens and explore the USAIR 97 dataset by utilizing interactive edge selection as described in Section 4.

First in Figure 10(a), selecting edge in allows us to inspect their corresponding clusters (light and dark blue nodes) and (dark blue and purple nodes) in . and have Bethel and Anchorage International as major airports, respectively. Edge corresponds to airports in (blue nodes), including the Aniak and the St Mary’s. clearly captures the fact that these airports are the hubs between Bethel and Anchorage international in .

Second, selecting edge in enables the exploration of and in , respectively. , as shown in Figure 10(b), has Aniak and St Mary’s; while contains the Juneau International Airport. Edge in is mainly represented by Anchorage International. captures the fact that Anchorage International serves as a hub—in order to go from any airport in to airports in one must travel through Anchorage.

Finally, via node selection, node in corresponds to a peripheral cluster on the outskirt of (see Figure 10(a)). is represented mainly by the Guam international airport. In order to pass from any airport in to airports in , one must pass from the Honolulu International, which is contained in edge in .

Figure 11: (a) Map of science graph where nodes are colored by scientific disciplines. (b) Mapper on a graph (top, , ) with as the lens, in comparison with the original graph (bottom). (c) Applying interactive cover manipulation on achieves better clustering quality and shape summary.

Map of science. In the map of science graph [11] (see Figure 11(a)), nodes represent, and are colored by, specialties within major scientific disciplines, and edges represent co-authorship of publications between those specialties. Since $G$ does not exhibit obvious symmetry, we choose the eigenvector of the graph Laplacian associated with the third smallest eigenvalue, $l_3$, as the lens to help retain the shape of $G$. As illustrated in Figure 11(b), both $M$ and $G$ are laid out by way of correspondence, where $M$ is shown to preserve the overall structure of $G$. The highlighted nodes in $M$ also capture certain clusters in $G$.

We could utilize interactive cover manipulation to obtain an even better representation of the data in Figure 11(c) where nodes in are circled to highlight the majority scientific discipline from the underlying cluster. For instance, node in represents the humanities-labeled nodes in ; nodes and represent chemistry and biology, respectively. Node and merge at node in , which represents medical science and infectious diseases.

6 Scalable Computation

The running time of the mapper on graphs algorithm depends heavily on the choice of a lens. While the lenses we discussed earlier (Section 3) are effective in capturing various structures of an underlying graph, they are expensive to compute for very large graphs. To address the scalability issue, we need a lens that can be computed efficiently for very large graphs while still carrying structural information. We, therefore, consider PageRank [14] as an additional, scalable lens. We consider a version of the PageRank algorithm applicable to undirected graphs [41]. A PageRank vector $PR$ is defined for every node $v \in V$,

$$PR(v) = \frac{1 - d}{|V|} + d \sum_{u \in N(v)} \frac{PR(u)}{|N(u)|},$$

where $N(v)$ is the set of neighbors of $v$; $d$ is the damping factor, which is typically set at $0.85$. Using this formulation of $PR$, PageRank yields an iterative algorithm that can be computed efficiently in practice [14, 74]. The existence of the PageRank vector is guaranteed by the Perron–Frobenius theorem [78]. A high PageRank score at $v$ typically means that $v$ is connected to many nodes, which also have high PageRank scores. The PageRank has been shown to be a continuous function in $d$ [81]. For example, Figure 12(a) bottom illustrates the continuous variation of $PR$ on $G$ for a random geometric graph. For the lens, we chose a value of $d$ that provided a good distribution of function values.
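The iterative scheme can be sketched directly from a standard unweighted, undirected form of the PageRank recurrence. The version below is an assumption-laden illustration rather than the NetworkX routine used in the experiments: it ignores edge weights, assumes every node has at least one neighbor, and runs a fixed number of iterations instead of testing for convergence.

```python
def pagerank(adj, d=0.85, iterations=100):
    """Power iteration for PageRank on an undirected, unweighted graph.

    adj : dict node -> list of neighbours (every node assumed to have one or more)
    d   : damping factor
    """
    n = len(adj)
    pr = {v: 1.0 / n for v in adj}
    for _ in range(iterations):
        pr = {
            v: (1.0 - d) / n + d * sum(pr[u] / len(adj[u]) for u in adj[v])
            for v in adj
        }
    return pr


# Star graph: hub 0 connected to leaves 1..4.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
scores = pagerank(star)
```

The hub of the star receives the highest score and the leaves share the remainder equally, so the lens separates hubs from peripheral nodes in this toy case.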

We use the PageRank implementation in NetworkX [42] and run mapper on graphs on two synthetic graphs generated using NetworkX [42] and five real-world large graphs obtained from the Stanford Large Network Dataset Collection [62] with up to 3 million edges, using a MacBook Pro with a quad-core Intel Core i5 processor. We report the average computational time for PageRank and for mapper on graphs in Table 1.

Graph                        Figure   $|V|$   $|E|$   PageRank (s)   Mapper on graphs (s)
Amazon0302 [60]              12(g)
ca-CondMat [61]              12(e)
com-amazon.ungraph [95]      12(c)
com-youtube.ungraph [95]     12(d)
soc-Epinions1 [83]           12(f)
Table 1: Average computational time for mapper on graphs with PageRank on five large real-world graphs. Time is reported in seconds.

Finally, we demonstrate that our approach is not only scalable using the PageRank lens but also produces meaningful visualization results. Figure 12 gives examples of applying mapper on graphs to seven large graphs using the PageRank lens. Figure 12(a-b) are both synthetic graphs: (a) is a random geometric graph [75], while (b) contains a balanced tree. Mapper on graphs for both graphs is shown to capture the global organizational principle of the original graph.

Figure 12(c) contains a co-purchasing graph based on the CWBTIAB feature (Customers Who Bought This Item Also Bought) on the Amazon website: if a product $i$ is frequently purchased together with product $j$, then the original graph $G$ contains an edge $(i, j)$. While the node-link diagram of $G$ cannot be improved beyond a hairball, $M$ provides a very compact summary containing information regarding popular products serving as "hubs". Similarly, Figure 12(d-g) show the mapper on graphs for (d) the YouTube social network, (e) the Condensed Matter collaboration network, (f) the Epinions social network, and (g) a second Amazon product co-purchasing network from March 2, 2003. The original graphs for these figures are not shown because they, too, are essentially hairballs.

Figure 12: Mapper on graphs using the PageRank lens. In examples (a-c), mapper on a graph $M$ is shown on the top with its cover, and the original graph $G$ is shown on the bottom. For examples (d-g), the original graphs are not shown because they render essentially as hairballs, similar to the graph shown in (c). (a) A random geometric graph. (b) A balanced tree. (c) Amazon product co-purchasing network. (d) YouTube social network. (e) Condensed Matter collaboration network. (f) Epinions social network. (g) A second Amazon product co-purchasing network from March 2, 2003.

7 Discussion

In this paper, we present a TDA approach for graph visualization using a variant of the mapper construction called mapper on graphs. Our approach is effective at detecting clusters in a graph and preserving the relations among these clusters. It is also flexible in capturing the structure of a graph across multiple scales based on different topological or geometric lenses. Mapper on graphs could potentially be applied to other visualization tasks, for instance, as a skeleton representation for the underlying graph in determining an initial layout in graph drawing. Traditionally, the PageRank vector is employed on the web graph, which could be considered as a temporal network: the PageRank vector at a previous instance of the web graph is used as an initial vector to obtain a fast PageRank solution for the current instance of the web graph. Therefore, mapper on graphs in conjunction with the PageRank lens could be utilized for the study of temporal networks.

Our focus in this paper is applying mapper on graphs to graph visualization. We are also interested in the theoretical properties of mapper on graphs. For example, how do we measure the distance between a mapper on a graph $M$ and its underlying graph $G$? What is an appropriate metric under which $M$ converges to the Reeb graph of $G$ as the granularity of the cover goes to zero? What is the structural stability of mapper on graphs with respect to perturbations of the lens $f$, graph $G$, and cover $\mathcal{U}$? Recent theoretical advances on the stability [15, 19] and convergence [3, 15, 18, 19, 25, 68] of the mapper construction could address these questions, at least partially. However, questions remain that are unique to mapper on graphs: e.g., what is the relation between the shape of $M$ and the geometric and topological properties of the lens $f$?


This work was supported in part by the National Science Foundation NSF IIS-1513616 and NSF DBI-1661375.


  1. KLA Corporation.
  2. University of South Florida.
  3. University of Utah. Corresponding author.


  1. Muthu Alagappan. From 5 to 13: Redefining the positions in basketball. MIT Sloan Sports Analytics Conference, 2012.
  2. Daniel Archambault, Tamara Munzner, and David Auber. TopoLayout: Multilevel graph layout by topological features. IEEE Transactions on Visualization and Computer Graphics, 13(2), 2007.
  3. Aravindakshan Babu. Zigzag Coarsenings, Mapper Stability and Gene-network Analyses. PhD thesis, Stanford University, 2013.
  4. Benjamin Bach, Nathalie Henry Riche, Christophe Hurter, Kim Marriott, and Tim Dwyer. Towards unambiguous edge bundling: Investigating confluent drawings for network visualization. IEEE Transactions on Visualization and Computer Graphics, 23(1):541–550, 2017.
  5. Maria Bampasidou and Thanos Gentimis. Modeling collaborations with persistent homology. CoRR, abs/1403.5346, 2014.
  6. Josh Barnes and Piet Hut. A hierarchical force-calculation algorithm. Nature, 324(6096):446, 1986.
  7. Mathieu Bastian, Sebastien Heymann, and Mathieu Jacomy. Gephi: an open source software for exploring and manipulating networks. In ICWSM, pages 361–362, 2009.
  8. Vladimir Batagelj and Andrej Mrvar. Pajek datasets, 2006.
  9. S. Biasotti, D. Giorgi, M. Spagnuolo, and B. Falcidieno. Reeb graphs for shape analysis and applications. Theoretical Computer Science, 392:5–22, 2008.
  10. S. Biasotti, S. Marini, M. Mortara, and G. Patane. An overview on properties and efficacy of topological skeletons in shape modelling. Shape Modeling International, 2003.
  11. Katy Börner, Richard Klavans, Michael Patek, Angela M Zoss, Joseph R Biberstine, Robert P Light, Vincent Larivière, and Kevin W Boyack. Design and update of a classification system: The UCSD map of science. PloS one, 7(7):e39464, 2012.
  12. Ulrik Brandes, Marco Gaertler, and Dorothea Wagner. Experiments on graph clustering algorithms. In European Symposium on Algorithms, pages 568–579, 2003.
  13. Ulrik Brandes and Christian Pich. Eigensolver methods for progressive multidimensional scaling of large data. In Graph Drawing, pages 42–53. Springer, 2007.
  14. Sergey Brin and Lawrence Page. The anatomy of a large-scale hypertextual web search engine. Computer networks and ISDN systems, 30(1-7):107–117, 1998.
  15. Adam Brown, Omer Bobrowski, Elizabeth Munch, and Bei Wang. Probabilistic convergence and stability of random mapper graphs. arXiv preprint arXiv:1909.03488, 2019.
  16. Thang Nguyen Bui, Soma Chaudhuri, Frank Thomson Leighton, and Michael Sipser. Graph bisection algorithms with good average case behavior. Combinatorica, 7(2):171–191, 1987.
  17. Gunnar Carlsson. Topological pattern recognition for point cloud data. Acta Numerica, 23:289–368, 2014.
  18. Mathieu Carrière, Bertrand Michel, and Steve Oudot. Statistical analysis and parameter selection for mapper. Journal of Machine Learning Research, 19:1–39, 2018.
  19. Mathieu Carrière and Steve Oudot. Structure and stability of the one-dimensional mapper. Foundations of Computational Mathematics, 18(6):1333–1396, 2018.
  20. C. J. Carstens and K. J. Horadam. Persistent homology of collaboration networks. Mathematical Problems in Engineering, 2013, 2013.
  21. Ben Cassidy, Caroline Rae, and Victor Solo. Brain activity: Conditional dissimilarity and persistent homology. IEEE 12th International Symposium on Biomedical Imaging (ISBI), pages 1356–1359, 2015.
  22. Moo K Chung, Seongho Seo, Nagesh Adluru, and Houri K Vorperian. Hot spots conjecture and its application to modeling tubular structures. In International Workshop on Machine Learning in Medical Imaging, pages 225–232, 2011.
  23. Weiwei Cui, Hong Zhou, Huamin Qu, Pak Chung Wong, and Xiaoming Li. Geometry-based edge clustering for graph visualization. IEEE Transactions on Visualization and Computer Graphics, 14(6):1277–1284, 2008.
  24. Y. Dabaghian, F. Mémoli, L. Frank, and G. Carlsson. A topological paradigm for hippocampal spatial map formation using persistent homology. PLoS Computational Biology, 8(8):e1002581, 2012.
  25. Tamal K Dey, Facundo Mémoli, and Yusu Wang. Topological analysis of nerves, Reeb spaces, mappers, and multiscale mappers. arXiv preprint arXiv:1703.07387, 2017.
  26. Inderjit S Dhillon. Co-clustering documents and words using bipartite spectral graph partitioning. In Proceedings of the 7th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 269–274, 2001.
  27. Chris H. Q. Ding, Xiaofeng He, Hongyuan Zha, Ming Gu, and Horst D. Simon. A min-max cut algorithm for graph partitioning and data clustering. IEEE International Conference on Data Mining, pages 107–114, 2001.
  28. Kasper Dinkla, Michel A Westenberg, and Jarke J van Wijk. Compressed adjacency matrices: untangling gene regulatory networks. IEEE Transactions on Visualization and Computer Graphics, 18(12):2457–2466, 2012.
  29. Irene Donato, Giovanni Petri, Martina Scolamiero, Lamberto Rondoni, and Francesco Vaccarino. Decimation of fast states and weak nodes: topological variation via persistent homology. Proceedings of the European Conference on Complex Systems, pages 295–301, 2012.
  30. Cody Dunne and Ben Shneiderman. Motif simplification. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2013.
  31. Tim Dwyer, Nathalie Henry Riche, Kim Marriott, and Christopher Mears. Edge compression techniques for visualization of dense directed graphs. IEEE Transactions on Visualization and Computer Graphics, 19(12):2596–2605, 2013.
  32. Weinan E, Jianfeng Lu, and Yuan Yao. The landscape of complex networks. CoRR, abs/1204.6376, 2012.
  33. Geoffrey Ellis and Alan Dix. A taxonomy of clutter reduction for information visualisation. IEEE Transactions on Visualization and Computer Graphics, 13(6):1216–1223, 2007.
  34. John Ellson, Emden Gansner, Lefteris Koutsofios, Stephen C North, and Gordon Woodhull. Graphviz— open source graph drawing tools. Graph Drawing, pages 483–484, 2002.
  35. Ozan Ersoy, Christophe Hurter, Fernando Paulovich, Gabriel Cantareiro, and Alex Telea. Skeleton-based edge bundling for graph visualization. IEEE Transactions on Visualization and Computer Graphics, 17(12):2364–2373, 2011.
  36. Miroslav Fiedler. Algebraic connectivity of graphs. Czechoslovak mathematical journal, 23(2):298–305, 1973.
  37. Thomas M. J. Fruchterman and Edward M. Reingold. Graph drawing by force-directed placement. Software: Practice and experience, 21(11):1129–1164, 1991.
  38. Emden R Gansner, Yifan Hu, Stephen North, and Carlos Scheidegger. Multilevel agglomerative edge bundling for visualizing large graphs. In IEEE Pacific Visualization Symposium, pages 187–194, 2011.
  39. Emden R Gansner, Yehuda Koren, and Stephen North. Graph drawing by stress majorization. In Graph Drawing, pages 239–250. Springer, 2005.
  40. Emden R. Gansner, Eleftherios Koutsofios, Stephen C. North, and Kiem-Phong Vo. A technique for drawing directed graphs. IEEE Transactions on Software Engineering, 19(3):214–230, 1993.
  41. Vince Grolmusz. A note on the PageRank of undirected graphs. arXiv preprint arXiv:1205.1960, 2012.
  42. Aric A. Hagberg, Daniel A. Schult, and Pieter J. Swart. Exploring network structure, dynamics, and function using NetworkX. Proceedings of the 7th Python in Science Conference, 2008.
  43. Mustafa Hajij, Bei Wang, Carlos Scheidegger, and Paul Rosen. Visual detection of structural changes in time-varying graphs using persistent homology. IEEE Pacific Visualization Symposium (PacificVis), 2018.
  44. Derek Hansen, Ben Shneiderman, and Marc A Smith. Analyzing social media networks with NodeXL: Insights from a connected world. Morgan Kaufmann, 2010.
  45. Danny Holten and Jarke J Van Wijk. Force-directed edge bundling for graph visualization. Computer Graphics Forum, 28(3):983–990, 2009.
  46. Danijela Horak, Slobodan Maletić, and Milan Rajković. Persistent homology of complex networks. Journal of Statistical Mechanics: Theory and Experiment, page P03034, 2009.
  47. Yifan Hu. Efficient, high-quality force-directed graph drawing. Mathematica Journal, 10(1):37–71, 2005.
  48. Glen Jeh and Jennifer Widom. SimRank: a measure of structural-context similarity. In Proceedings of the 8th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 538–543, 2002.
  49. Sanjay Kairam, Diana MacLean, Manolis Savva, and Jeffrey Heer. Graphprism: Compact visualization of network structure. In Advanced Visual Interfaces, 2012.
  50. Marc Khoury, Yifan Hu, Shankar Krishnan, and Carlos Scheidegger. Drawing large graphs by low-rank stress majorization. Computer Graphics Forum, 31:975–984, 2012.
  51. Vladimir G Kim, Yaron Lipman, Xiaobai Chen, and Thomas Funkhouser. Möbius transformations for global intrinsic symmetry analysis. Computer Graphics Forum, 29(5):1689–1700, 2010.
  52. Yehuda Koren. On spectral graph drawing. In Computing and Combinatorics, pages 496–508. Springer, 2003.
  53. Yehuda Koren, Liran Carmel, and David Harel. Ace: A fast multiscale eigenvectors computation for drawing huge graphs. IEEE Symposium on Information Visualization, pages 137–144, 2002.
  54. Brian Kulis, Sugato Basu, Inderjit Dhillon, and Raymond Mooney. Semi-supervised graph clustering: a kernel approach. Machine learning, 74(1):1–22, 2009.
  55. Stephane Lafon and Ann B Lee. Diffusion maps and coarse-graining: A unified framework for dimensionality reduction, graph partitioning, and data set parameterization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(9):1393–1403, 2006.
  56. Hyekyoung Lee, Moo K. Chung, Hyejin Kang, Boong-Nyun Kim, and Dong Soo Lee. Computing the shape of brain networks using graph filtration and Gromov-Hausdorff metric. International Conference on Medical Image Computing and Computer Assisted Intervention, pages 302–309, 2011.
  57. Hyekyoung Lee, Moo K. Chung, Hyejin Kang, Bung-Nyun Kim, and Dong Soo Lee. Discriminative persistent homology of brain networks. IEEE International Symposium on Biomedical Imaging: From Nano to Macro, pages 841–844, 2011.
  58. Hyekyoung Lee, Hyejin Kang, Moo K. Chung, Bung-Nyun Kim, and Dong Soo Lee. Persistent brain network homology from the perspective of dendrogram. IEEE Transactions on Medical Imaging, 31(12):2267–2277, 2012.
  59. Hyekyoung Lee, Hyejin Kang, Moo K. Chung, Bung-Nyun Kim, and Dong Soo Lee. Weighted functional brain network modeling via network filtration. NIPS Workshop on Algebraic Topology and Machine Learning, 2012.
  60. Jure Leskovec, Lada A Adamic, and Bernardo A Huberman. The dynamics of viral marketing. ACM Transactions on the Web (TWEB), 1(1), 2007.
  61. Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data (TKDD), 1(1):2, 2007.
  62. Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford large network dataset collection, June 2014.
  63. Bruno Lévy. Laplace-Beltrami eigenfunctions towards an algorithm that understands geometry. In IEEE International Conference on Shape Modeling and Applications, page 13, 2006.
  64. Shusen Liu, Dan Maljovec, Bei Wang, Peer-Timo Bremer, and Valerio Pascucci. Visualizing high-dimensional data: Advances in the past decade. IEEE Transactions on Visualization and Computer Graphics, 23(3):1249–1268, 2017.
  65. P. Y. Lum, G. Singh, A. Lehman, T. Ishkanov, M. Vejdemo-Johansson, M. Alagappan, J. Carlsson, and G. Carlsson. Extracting insights from the shape of complex data using topology. Scientific Reports, 3, 2013.
  66. Ulrike Von Luxburg. A tutorial on spectral clustering. Statistics and computing, 17(4):395–416, 2007.
  67. Fintan McGee and John Dingliana. An empirical study on the impact of edge bundling on user comprehension of graphs. In Proceedings of the International Working Conference on Advanced Visual Interfaces, pages 620–627. ACM, 2012.
  68. Elizabeth Munch and Bei Wang. Convergence between categorical representations of Reeb space and mapper. In Sándor Fekete and Anna Lubiw, editors, Proceedings of the 32nd International Symposium on Computational Geometry, volume 51 of Leibniz International Proceedings in Informatics (LIPIcs), pages 53:1–53:16, Dagstuhl, Germany, 2016. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.
  69. Mark E. J. Newman. Fast algorithm for detecting community structure in networks. Physical review E, 69(6):066133, 2004.
  70. Mark EJ Newman. Properties of highly clustered networks. Physical Review E, 68(2):026121, 2003.
  71. Andrew Y Ng, Michael I Jordan, and Yair Weiss. On spectral clustering: Analysis and an algorithm. In Advances in neural information processing systems, pages 849–856, 2002.
  72. Monica Nicolau, Arnold J. Levine, and Gunnar Carlsson. Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival. Proceedings of the National Academy of Sciences, 108(17):7265–7270, 2011.
  73. Monica Nicolau, Arnold J. Levine, and Gunnar Carlsson. Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival. Proceedings of the National Academy of Sciences of the United States of America, 108(17):7265–7270, 2011.
  74. Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab, 1999.
  75. Mathew Penrose. Random geometric graphs. Oxford University Press, 2003.
  76. Giovanni Petri, Martina Scolamiero, Irene Donato, and Francesco Vaccarino. Networks and cycles: A persistent homology approach to complex networks. Proceedings European Conference on Complex Systems 2012, Springer Proceedings in Complexity, pages 93–99, 2013.
  77. Giovanni Petri, Martina Scolamiero, Irene Donato, and Francesco Vaccarino. Topological strata of weighted complex networks. PLoS ONE, 8(6):e66506, 2013.
  78. S. Unnikrishna Pillai, Torsten Suel, and Seunghun Cha. The Perron-Frobenius theorem: some of its applications. IEEE Signal Processing Magazine, 22(2):62–75, 2005.
  79. Virginia Pirino, Eva Riccomagno, Sergio Martinoia, and Paolo Massobrio. A topological study of repetitive co-activation networks in in vitro cortical assemblies. Physical Biology, 12(1), 2015.
  80. Pascal Pons and Matthieu Latapy. Computing communities in large networks using random walks. In International symposium on computer and information sciences, pages 284–293, 2005.
  81. Luca Pretto. Analysis of web link analysis algorithms: The mathematics of ranking. In Maristella Agosti, editor, Information Access through Search Engines and Digital Libraries, pages 97–111. Springer, 2008.
  82. Martin Reuter, Silvia Biasotti, Daniela Giorgi, Giuseppe Patané, and Michela Spagnuolo. Discrete Laplace–Beltrami operators for shape analysis and segmentation. Computers & Graphics, 33(3):381–390, 2009.
  83. Matthew Richardson, Rakesh Agrawal, and Pedro Domingos. Trust management for the semantic web. In International semantic Web conference, pages 351–368. Springer, 2003.
  84. Satu Elisa Schaeffer. Graph clustering. Computer science review, 1(1):27–64, 2007.
  85. David Selassie, Brandon Heller, and Jeffrey Heer. Divided edge bundling for directional network data. IEEE Transactions on Visualization and Computer Graphics, 17(12):2354–2363, 2011.
  86. David I Shuman, Sunil K Narang, Pascal Frossard, Antonio Ortega, and Pierre Vandergheynst. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Processing Magazine, 30(3):83–98, 2013.
  87. Bernard W Silverman. Density estimation for statistics and data analysis, volume 26. CRC press, 1986.
  88. Gurjeet Singh, Facundo Mémoli, and Gunnar Carlsson. Topological methods for the analysis of high dimensional data sets and 3d object recognition. Eurographics Symposium on Point-Based Graphics, 22, 2007.
  89. Ashley Suh, Mustafa Hajij, Bei Wang, Carlos Scheidegger, and Paul Rosen. Persistent homology guided force-directed graph layouts. Proceedings of IEEE Visualization Conference (InfoVis), 2019.
  90. Yuanyuan Tian, Richard A Hankins, and Jignesh M Patel. Efficient aggregation for graph summarization. In Proceedings of the ACM SIGMOD international conference on Management of data, pages 567–580, 2008.
  91. Brenda Y Torres, Jose Henrique M Oliveira, Ann Thomas Tate, Poonam Rath, Katherine Cumnock, and David S Schneider. Tracking resilience to infections by mapping disease space. PLoS biology, 14(4), 2016.
  92. W. T. Tutte. How to draw a graph. Proceedings of the London Mathematical Society, s3-13(1):743–767, Jan 1963. doi:10.1112/plms/s3-13.1.743.
  93. Tatiana Von Landesberger, Arjan Kuijper, Tobias Schreck, Jörn Kohlhammer, Jarke J van Wijk, J-D Fekete, and Dieter W Fellner. Visual analysis of large graphs: State-of-the-art and future research challenges. Computer Graphics Forum, 30(6):1719–1749, 2011.
  94. Scott White and Padhraic Smyth. A spectral clustering approach to finding communities in graphs. In Proceedings of the SIAM international conference on data mining, pages 274–285, 2005.
  95. Jaewon Yang and Jure Leskovec. Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems, 42(1):181–213, 2015.