Persistent homology of unweighted complex networks via discrete Morse theory

Harish Kannan The Institute of Mathematical Sciences (IMSc), Homi Bhabha National Institute (HBNI), Chennai 600113 India    Emil Saucan Department of Applied Mathematics, ORT Braude College, Karmiel 2161002 Israel Department of Electrical Engineering, Technion, Israel Institute of Technology, Haifa 3200003 Israel    Indrava Roy Correspondence to: indrava@imsc.res.in The Institute of Mathematical Sciences (IMSc), Homi Bhabha National Institute (HBNI), Chennai 600113 India    Areejit Samal Correspondence to: asamal@imsc.res.in The Institute of Mathematical Sciences (IMSc), Homi Bhabha National Institute (HBNI), Chennai 600113 India Max Planck Institute for Mathematics in the Sciences, Leipzig 04103 Germany
Abstract

We present a new method based on discrete Morse theory to study topological properties of unweighted and undirected networks using persistent homology. Leveraging the features of discrete Morse theory, our method produces a discrete Morse function that assigns weights to vertices, edges, triangles and higher-dimensional simplices in the clique complex of a graph in a concordant fashion. Importantly, our method not only captures the topology of the clique complex of such graphs via the concept of critical simplices, but also achieves close to the theoretical minimum number of critical simplices in several analyzed model and real networks. This leads to a reduced filtration scheme based on the subsequence of the corresponding critical weights. We have employed our new method to explore the persistent homology of several unweighted model and real-world networks. We show that the persistence diagrams from our method can distinguish between the topology of different types of model and real networks. In summary, our method based on discrete Morse theory further increases the applicability of persistent homology to investigate the global topology of complex networks.

Introduction

In recent years, the field of topological data analysis (TDA) has grown rapidly to provide a set of powerful tools to analyze various important features of data Carlsson (2009). In this context, persistent homology has played a key role in bringing TDA to the fore of modern data analysis. It provides not only a way to visualize data efficiently, but also a way to extract relevant information from both structured and unstructured datasets. This crucial aspect has been used effectively in applications ranging from astrophysics (e.g., determination of inter-galactic filament structures) Pranav et al. (2016) to imaging analysis (e.g., feature detection in 3D gray-scale images) Günther et al. (2011) and biology (e.g., detection of a breast cancer type with high survival rates) Nicolau et al. (2011). Informally, the essence of the theory is its power to extract the shape of data, as well as to infer higher-order correlations between various parts of the data at hand that are missed by other classical techniques. The basic mathematical theory used in this subject is algebraic topology, and in particular the study of homology, developed by the French mathematician Henri Poincaré at the turn of the 20th century. The origins of persistent homology lie in the ideas of Morse theory Morse (1934), which provides a powerful tool to detect the topological features of a given space through the computation of homology using real-valued functions on the space. We refer the reader to the survey article Edelsbrunner and Harer (2008) for further details.

On the other hand, the discretized version of Morse theory developed by Robin Forman Forman (1995, 2002) gives a way to characterize the homology groups of a simplicial complex in terms of a real-valued function with certain properties, known as a discrete Morse function. Examples of such simplicial complexes associated with discrete spaces are the Vietoris-Rips complex corresponding to a discrete metric space and the clique complex of a graph. Forman showed Forman (2002) that, given such a function, the so-called critical simplices completely determine the Euler characteristic of the space, which is a fundamental topological invariant.

The study of complex networks in the last few decades has also significantly raised our ability to understand various kinds of interactions arising in both natural and artificial realms Watts and Strogatz (1998); Barabási and Albert (1999); Albert and Barabási (2002); Newman (2010). Understanding how different parts of networks behave and influence each other is therefore an important problem Watts and Strogatz (1998); Barabási and Albert (1999); Albert and Barabási (2002); Newman (2010). However, for large networks, detecting higher-order structures remains a difficult task Bianconi (2015). While a graph representation captures binary relationships among the vertices of a network, simplicial complexes also reflect higher-order relationships in a complex network De Silva and Ghrist (2007); Horak et al. (2009); Petri et al. (2013, 2014); Wu et al. (2015); Sizemore et al. (2016); Courtney and Bianconi (2017, 2018). In this context, persistent homology has been employed to explore the topological properties of complex networks De Silva and Ghrist (2007); Horak et al. (2009); Lee et al. (2012); Petri et al. (2013, 2014); Sizemore et al. (2016). In this work, we present a systematic method to study the persistent homology of unweighted and undirected graphs or networks. Previous work has investigated the persistent homology of weighted and undirected networks by creating a filtration of the clique complexes corresponding to threshold graphs obtained via a decreasing sequence of edge weights Petri et al. (2013); Sizemore et al. (2016). However, the lack of edge weights in unweighted networks does not permit a filtration based on threshold graphs Petri et al. (2013); Sizemore et al. (2016). Therefore, Horak et al. (2009) studied the persistent homology of unweighted and undirected networks based on a dimensional filtration scheme, which adds the $k$-skeleton of a simplicial complex at the $k$-th filtration step.
Another resolution would be to transform an unweighted network into a weighted network by assigning edge weights based on some network property, such as edge betweenness centrality Freeman (1977); Girvan and Newman (2002) or discrete edge curvature Sreejith et al. (2016); Samal et al. (2018), and then to employ the filtration scheme based on threshold graphs Petri et al. (2013); Sizemore et al. (2016).

In the context of TDA, discrete Morse theory Forman (1995, 2002) provides an efficient way of capturing the persistent homology of unweighted simple graphs. This is done by using the values of a discrete Morse function to pass from an unweighted graph to a weighted simplicial complex (Figure 1). This transformation automatically produces the filtration needed for the computation of persistent homology, through the so-called level subcomplexes associated with critical simplices (see Theory section and Figure 2). Moreover, this filtration is consistent with the topology of the underlying space and reveals finer topological features than the dimensional filtration scheme used in Horak et al. (2009). The combination of these techniques has been used by Günther et al. (2011) for processing 3D gray-scale images. However, to the best of our knowledge, this method has not been used to date to study persistent homology in unweighted complex networks. Discrete Morse theory gives a theoretical lower bound on the number of critical simplices, or filtration steps, which can be attained by an optimal choice of the function on a simplicial complex. Interestingly, our method achieves close to the theoretical minimum number of critical simplices or filtration steps in several model and real networks analyzed here. Furthermore, our algorithm for computing the discrete Morse function is easy to implement for large complex networks.

Our results underline the power of persistent homology to detect inherent topological features of networks that are not directly captured by homology alone. For instance, the $k$-Betti numbers of the clique complexes corresponding to small-world Watts and Strogatz (1998) and scale-free Barabási and Albert (1999) networks with similar size and average degree are of comparable magnitude. Thus, homology alone reveals little about the differences between the topological features of these two model networks. In contrast, our observations on the persistent homology of these two networks indicate a clear demarcation in how topological features evolve in the corresponding clique complexes during the filtration process. This dissimilarity in the evolution of topological characteristics persists across dimensions and across values of the average degree of the underlying network. It indicates an inherent disparity in the persistent homology of small-world and scale-free graphs. The ability to capture inherent topological differences between two dissimilar networks thus motivates the application of our method to study the persistent homology of real-world networks.

The remainder of the paper is organized as follows. We begin with a Theory section, which gives a brief overview of concepts in homology, persistent homology and discrete Morse theory. The Network datasets section then describes the model networks and real-world networks studied in this work. In the subsequent section on Results and Discussion, we present our algorithm to construct a discrete Morse function on a simplicial complex associated with a network and provide a rigorous proof of its correctness. We then present two algorithms that illustrate key procedures essential to construct the filtration of a simplicial complex associated with the networks under study. In the same section, we present our results for model networks and real-world networks. The final section on Conclusions gives a summary and outlook of our findings.

Figure 1: An illustration of the construction of a discrete Morse function on a clique complex corresponding to an unweighted and undirected graph using our algorithm. (a) A simple example of an unweighted and undirected graph containing 9 vertices and 11 edges. (b) The clique simplicial complex $K$ corresponding to the simple graph shown in (a). The clique complex consists of 9 vertices or 0-simplices, 11 edges or 1-simplices and 2 triangles or 2-simplices. The figure also displays the orientation of the 1- and 2-simplices using arrows. (c) Generation of a discrete Morse function on the clique complex shown in part (b) using our algorithm. The figure lists the state of the Flag variable in algorithm 1 and the IsCritical variable in algorithm 2 for each simplex in $K$. In this example, the clique complex has 4 critical simplices, and their corresponding critical weights are the filtration steps. The figure also lists the FiltrationWeight for each simplex in $K$ obtained from algorithm 3.
Figure 2: In terms of persistent homology, filtration based on the entire sequence of weights of the discrete Morse function is equivalent to filtration based only on the subsequence of critical weights. (a) Filtration of the network shown in Figure 1 based on the weights of the 4 critical simplices. A 0-hole (or connected component) persists across all 4 stages of the filtration. Another 0-hole is born at stage 2 on addition of a critical vertex and dies at stage 3 on addition of a critical edge. Moreover, a 1-hole is born at stage 4 on addition of a critical edge. (b) Five intermediate stages of the filtration between critical weights 1.1 (stage 2) and 2.35 (stage 3). (c) Four intermediate stages of the filtration between critical weights 2.35 (stage 3) and 3.48 (stage 4). The homology of the clique complex remains unchanged during the intermediate stages of the filtration, and the birth and death of holes occur only on addition of critical simplices.

Theory

Graphs and Simplicial Complexes

Consider a finite simple graph $G$ with vertex set $V(G)$ and edge set $E(G)$. Note that a simple graph does not contain self-loops or multi-edges Bollobas (1998). Such a simple graph can be associated with a clique complex $K$ Zomorodian and Carlsson (2005). A clique simplicial complex $K$ is a collection of simplices, where a $k$-dimensional simplex (or $k$-simplex) in $K$ is a set of $k+1$ vertices that form a complete subgraph of $G$. Note that the dimension of simplices contained in $K$ ranges from 0 to $|V(G)|-1$. The dimension of the clique complex is given by the maximum dimension of its constituent simplices. A face of a $k$-simplex $\sigma$ is a subset $\tau$ of $\sigma$ that is itself an $l$-simplex with dimension $l < k$, and this relationship is denoted as $\tau < \sigma$. In particular, vertices correspond to 0-simplices, edges to 1-simplices, and triangles to 2-simplices in the clique complex of a graph. Formally, the clique complex $K$ corresponding to the simple graph $G$ satisfies the condition that defines an abstract simplicial complex. Namely, $K$ is a collection of non-empty finite sets, or simplices, such that if $\sigma$ is an element (simplex) of $K$, then so is every non-empty subset of $\sigma$. For additional details, the interested reader is referred to standard texts in algebraic topology Munkres (2018). Figure 1 displays an example of the correspondence between a simple graph and its clique complex.
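The construction of the clique complex can be illustrated with a short Python sketch. It is not the implementation used in this work. The helper name `clique_complex` and the example graph (a 4-cycle with a triangle attached along one edge) are hypothetical, and simplices are stored as sorted vertex tuples.

```python
from collections import Counter


def clique_complex(vertices, edges):
    """Return every simplex of the clique complex of a simple graph as a sorted tuple.

    A k-simplex is a set of k+1 pairwise adjacent vertices. Simplices are grown one
    vertex at a time, appending only vertices larger than the current last vertex,
    so each clique is generated exactly once.
    """
    adj = {v: set() for v in vertices}
    for u, v in edges:
        if u != v:  # self-loops are dropped, as in a simple graph
            adj[u].add(v)
            adj[v].add(u)
    simplices = [(v,) for v in sorted(adj)]
    frontier = list(simplices)
    while frontier:
        grown = []
        for s in frontier:
            # vertices adjacent to every vertex of s
            common = set.intersection(*(adj[v] for v in s))
            grown.extend(s + (w,) for w in sorted(common) if w > s[-1])
        simplices.extend(grown)
        frontier = grown
    return simplices


# Hypothetical example: a 4-cycle 0-1-2-3 plus a triangle 2-3-4 sharing the edge 2-3.
K = clique_complex(range(5), [(0, 1), (1, 2), (2, 3), (3, 0), (2, 4), (3, 4)])
print(Counter(len(s) - 1 for s in K))  # number of k-simplices for each dimension k
```

For this example, the complex has five 0-simplices, six 1-simplices and one 2-simplex. Each $(k+1)$-simplex is produced from one of its $k$-dimensional faces, which mirrors the closure property stated above. For large networks, a dedicated clique-enumeration routine would be preferable.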

Homology of a simplicial complex

The ordering of the vertex set of a $k$-simplex $\sigma$ determines its orientation. Two orderings of the vertex set of $\sigma$ are considered equivalent if and only if they differ by an even permutation. If the dimension $k$ of $\sigma$ is at least 1, then all possible orderings of its vertex set fall into two equivalence classes, and each class is assigned an orientation Munkres (2018). The exception is a 0-simplex, which has a single vertex and hence exactly one equivalence class and one orientation. An oriented $k$-simplex specifies the ordering of its vertices $v_0, v_1, \ldots, v_k$ and is represented by $[v_0, v_1, \ldots, v_k]$ Munkres (2018). For instance, for an edge of figure 1 with endpoints $v_0$ and $v_1$, the oriented 1-simplices $[v_0, v_1]$ and $[v_1, v_0]$ have opposite orientations, i.e., $[v_0, v_1] = -[v_1, v_0]$.

We next describe an algebraic group that provides the machinery to represent paths in a simplicial complex. The $k$-chain group $C_k(K)$ of a simplicial complex $K$ is the Abelian group generated by the oriented $k$-simplices in $K$ with coefficients in a field $\mathbb{F}$. Elements of $C_k(K)$ are referred to as $k$-chains and have the form $c = \sum_i \alpha_i \sigma_i$. Here the same notation $\sigma_i$ represents both the oriented $k$-simplex and its corresponding generator in $C_k(K)$, and the $\alpha_i$ are scalars from the field $\mathbb{F}$. Note that the identity element in $C_k(K)$ is the unique $k$-chain for which all the coefficients are zero in $\mathbb{F}$. Note also that if two $k$-simplices, $\sigma$ and $\tau$, have the same vertex set but opposite orientations, then the generators of $C_k(K)$ corresponding to the two simplices are inverses of each other (i.e., $\sigma = -\tau$). In figure 1, the cycle of length 4 formed by four edges, or 1-simplices, can be represented as an element of the 1-chain group $C_1(K)$. If the cycle visits the vertices $v_0, v_1, v_2, v_3$ in order, this element is $[v_0, v_1] + [v_1, v_2] + [v_2, v_3] + [v_3, v_0]$.

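The boundary operator $\partial_k$ on the generator corresponding to an oriented $k$-simplex $[v_0, v_1, \ldots, v_k]$ is defined as follows Munkres (2018):

$$\partial_k [v_0, v_1, \ldots, v_k] = \sum_{i=0}^{k} (-1)^i \, [v_0, \ldots, \hat{v}_i, \ldots, v_k], \tag{1}$$

where $\hat{v}_i$ indicates that the vertex $v_i$ is absent from the $k$-simplex. By linear extension, the boundary operator on any element $c = \sum_i \alpha_i \sigma_i$ of the $k$-chain group $C_k(K)$ gives:

$$\partial_k c = \sum_i \alpha_i \, \partial_k \sigma_i. \tag{2}$$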

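Note that the boundary operator $\partial_k$ in the above equation maps a $k$-chain to a $(k-1)$-chain, i.e., $\partial_k : C_k(K) \to C_{k-1}(K)$. In figure 1, the boundary operator applied to the 1-chain representing an edge $[v_0, v_1]$ gives $v_1 - v_0$. Applied to the 1-chain representing a cycle of length 4, it gives 0, since every vertex of the cycle appears once with a plus sign and once with a minus sign.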

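This motivates the definition of $k$-cycles $Z_k$ and $k$-boundaries $B_k$. The $k$-cycles are the elements of the $k$-chain group $C_k(K)$ that are mapped to 0 by the boundary operator $\partial_k$, and thus:

$$Z_k = \ker \partial_k = \{ c \in C_k(K) : \partial_k c = 0 \}. \tag{3}$$

The $k$-boundaries are defined as follows:

$$B_k = \operatorname{im} \partial_{k+1} = \{ c \in C_k(K) : c = \partial_{k+1} c' \text{ for some } c' \in C_{k+1}(K) \}. \tag{4}$$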

Thus, $k$-boundaries are the elements of the $k$-chain group $C_k(K)$ that are also the boundary of an element of the $(k+1)$-chain group $C_{k+1}(K)$. Note that both $Z_k$ and $B_k$ are subgroups of $C_k(K)$. It can be shown that the composition $\partial_k \circ \partial_{k+1} = 0$, which implies that $B_k$ is a subgroup of $Z_k$ Munkres (2018). Simply stated, the boundary operator $\partial_k$ applied to a $k$-boundary gives 0. Hence, every $k$-boundary is a $k$-cycle, but not necessarily vice versa.
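The following minimal Python sketch makes the sign convention of equations (1) and (2) concrete. It uses integer coefficients so that orientations stay visible, whereas this work computes over $\mathbb{F}_2$, where signs play no role. The function names `canonical` and `boundary` are hypothetical. The example checks that the boundary of a cycle of length 4 vanishes and that the composition of boundary operators vanishes on a triangle.

```python
from collections import defaultdict


def canonical(simplex):
    """Sort an oriented simplex and return (sorted_tuple, sign).

    Every transposition of two vertices reverses the orientation, so the sign is
    +1 for an even permutation of the input ordering and -1 for an odd one.
    """
    s, sign = list(simplex), 1
    for i in range(len(s)):
        for j in range(len(s) - 1 - i):  # bubble sort, flipping the sign at each swap
            if s[j] > s[j + 1]:
                s[j], s[j + 1] = s[j + 1], s[j]
                sign = -sign
    return tuple(s), sign


def boundary(chain):
    """Apply equations (1) and (2) to a chain {oriented simplex: integer coefficient}."""
    out = defaultdict(int)
    for simplex, coeff in chain.items():
        if len(simplex) < 2:
            continue  # the boundary of a 0-simplex is zero
        for i in range(len(simplex)):
            # drop v_i; the face enters with sign (-1)^i, stored on its sorted representative
            face, sign = canonical(simplex[:i] + simplex[i + 1:])
            out[face] += (-1) ** i * sign * coeff
    return {face: c for face, c in out.items() if c != 0}


square = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (3, 0): 1}  # a cycle of length 4
print(boundary({(0, 1): 1}))               # v1 - v0
print(boundary(square))                    # empty: a cycle has zero boundary
print(boundary(boundary({(2, 3, 4): 1})))  # empty: the composition of boundaries vanishes
```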

The $k$-homology group $H_k$ is defined as Munkres (2018):

$$H_k = Z_k / B_k, \tag{5}$$

where $Z_k / B_k$ is the quotient group Dummit and Foote (2003) of $Z_k$ over $B_k$. We informally refer to the elements of the $k$-homology group $H_k$ as $k$-holes. For intuition, a $k$-hole can naturally be described as a $k$-cycle that is not a $k$-boundary. In figure 1, a 1-cycle that is not the boundary of any 2-chain is a 1-hole. By contrast, the 1-cycle formed by the three edges of one of the two triangles is not a 1-hole, as it is the boundary of that 2-simplex. Thus, the concept of quotient groups provides the mathematical machinery to characterize such $k$-holes in simplicial complexes.

The $k$-Betti number $\beta_k$ is defined as the dimension of the homology group $H_k$ viewed as a vector space over the field $\mathbb{F}$ Munkres (2018). Informally, the $k$-Betti number represents the number of $k$-holes of the simplicial complex. We remark that the Euler characteristic $\chi(K)$ of the clique complex $K$ with dimension $d$ corresponding to a graph $G$ is given by the alternating sum of Betti numbers Munkres (2018), namely,

$$\chi(K) = \sum_{k=0}^{d} (-1)^k \beta_k. \tag{6}$$

In this work, we use the finite field $\mathbb{F}_2 = \mathbb{Z}/2\mathbb{Z}$, i.e., the field with two elements.
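Over $\mathbb{F}_2$, the Betti numbers follow from the ranks of the boundary matrices via $\beta_k = \dim Z_k - \dim B_k = (n_k - \operatorname{rank} \partial_k) - \operatorname{rank} \partial_{k+1}$, where $n_k$ is the number of $k$-simplices. The Python sketch below uses hypothetical helper names. It stores each matrix row as an integer bitmask, so row addition is a bitwise XOR. The example is a hypothetical 4-cycle with an attached triangle, and the sketch also checks equation (6).

```python
def rank_gf2(rows):
    """Rank over F2 of a 0/1 matrix whose rows are stored as integer bitmasks."""
    pivots = {}  # leading bit -> stored row with that leading bit
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = row
                break
            row ^= pivots[lead]  # adding two rows is XOR over F2
    return len(pivots)


def betti_numbers(simplices):
    """Betti numbers over F2 of a finite simplicial complex (closed under faces)."""
    by_dim = {}
    for s in simplices:
        by_dim.setdefault(len(s) - 1, []).append(tuple(sorted(s)))
    top = max(by_dim)
    index = {k: {s: i for i, s in enumerate(ss)} for k, ss in by_dim.items()}
    rank = {0: 0, top + 1: 0}  # the maps into and out of the extreme dimensions are zero
    for k in range(1, top + 1):
        rows = []
        for s in by_dim[k]:
            mask = 0
            for i in range(len(s)):
                mask |= 1 << index[k - 1][s[:i] + s[i + 1:]]  # signs vanish over F2
            rows.append(mask)
        rank[k] = rank_gf2(rows)
    # beta_k = (n_k - rank of boundary_k) - rank of boundary_{k+1}
    return [len(by_dim[k]) - rank[k] - rank[k + 1] for k in range(top + 1)]


K = [(0,), (1,), (2,), (3,), (4,), (0, 1), (1, 2), (2, 3), (0, 3), (2, 4), (3, 4), (2, 3, 4)]
betti = betti_numbers(K)
euler = sum((-1) ** (len(s) - 1) for s in K)  # alternating count of simplices
print(betti, euler)  # [1, 1, 0] 0
```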

Persistent homology of a simplicial complex

A subset $L$ of a simplicial complex $K$ is called a subcomplex of $K$ if $L$ is itself an abstract simplicial complex. A filtration of a simplicial complex $K$ is then defined as a nested sequence of subcomplexes $K^i$ of $K$ where:

$$K^1 \subseteq K^2 \subseteq \cdots \subseteq K^n = K. \tag{7}$$

Note that each subcomplex $K^i$ has an associated index $i$ in the filtration. Moreover, each subcomplex $K^i$ in the filtration has corresponding $k$-chain groups $C_k^i$, boundary operators $\partial_k^i$, $k$-boundaries $B_k^i$ and $k$-cycles $Z_k^i$.

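The $p$-persistent $k$-homology group of $K^i$, denoted as $H_k^{i,p}$, is defined as:

$$H_k^{i,p} = Z_k^i / \left( B_k^{i+p} \cap Z_k^i \right). \tag{8}$$

In the above equation, $B_k^{i+p}$ is the subgroup of $C_k^{i+p}$ that constitutes the $k$-boundaries of the subcomplex $K^{i+p}$. The $p$-persistent $k$-Betti number of $K^i$, denoted as $\beta_k^{i,p}$, is defined as:

$$\beta_k^{i,p} = \dim H_k^{i,p}. \tag{9}$$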

An intuitive explanation of the above definitions of the $p$-persistent $k$-homology group and the corresponding Betti number is as follows. A $k$-hole of the subcomplex $K^i$ can potentially become the boundary of a $(k+1)$-chain of a later subcomplex $K^j$ with $j > i$, and thus no longer constitute a $k$-hole of $K^j$. The $p$-persistent $k$-Betti number of $K^i$ represents the number of $k$-holes at filtration index $i$ that persist at filtration index $i+p$. Therefore, each $k$-hole that appears across the filtration has unique indices corresponding to its birth and death, and its persistence can be characterized by these two indices. Studying persistent homology allows us to quantify the longevity of such $k$-holes during the filtration, and thus to measure the importance of these topological features, which appear and disappear across the filtration.
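To make the birth and death indices concrete, the following Python sketch computes persistence pairs over $\mathbb{F}_2$ with the standard column-reduction algorithm on the boundary matrix of a filtration. It is a textbook method shown for illustration, not the software used in this work. The function name and the filtration values in the example are hypothetical.

```python
def persistence_pairs(filtration):
    """Persistence pairs over F2 by the standard column-reduction algorithm.

    `filtration` lists (simplex, weight) pairs ordered by non-decreasing weight,
    with every face listed before its cofaces. Returns (k, birth, death) triples,
    one per k-hole; death is float("inf") for holes that never die. Pairs with
    zero persistence (birth == death) are omitted.
    """
    order = [tuple(sorted(s)) for s, _ in filtration]
    weight = [w for _, w in filtration]
    index = {s: i for i, s in enumerate(order)}
    # column j holds the indices of the codimension-1 faces of simplex j
    columns = [
        {index[s[:i] + s[i + 1:]] for i in range(len(s))} if len(s) > 1 else set()
        for s in order
    ]
    owner = {}  # lowest (largest) row index -> column that owns it
    paired = set()
    pairs = []
    for j, col in enumerate(columns):
        while col and max(col) in owner:
            col ^= columns[owner[max(col)]]  # add an earlier reduced column (mod 2)
        if col:
            i = max(col)  # simplex i created the hole that simplex j kills
            owner[i] = j
            paired.update((i, j))
            if weight[i] < weight[j]:
                pairs.append((len(order[i]) - 1, weight[i], weight[j]))
    for i, s in enumerate(order):
        if i not in paired:  # a hole that survives to the end of the filtration
            pairs.append((len(s) - 1, weight[i], float("inf")))
    return pairs


# Hypothetical filtration of a 4-cycle with an attached triangle.
filtration = [((v,), 0) for v in range(5)]
filtration += [((0, 1), 1), ((1, 2), 1), ((2, 3), 1), ((2, 4), 1), ((3, 4), 1)]
filtration += [((0, 3), 2), ((2, 3, 4), 3)]
for k, birth, death in sorted(persistence_pairs(filtration)):
    print(k, birth, death)
```

In this example, the loop through vertices 2, 3 and 4 is born at weight 1 and dies at weight 3 when the triangle is added. The 4-cycle born at weight 2 never dies. All 0-holes except one die when edges join components at weight 1.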

Discrete Morse Theory

Recall from the preceding section that the primary requirement for studying the persistent homology of the clique complex $K$ of a simple graph $G$ is a filtration of $K$. Previous works Petri et al. (2013); Sizemore et al. (2016) that investigate the persistent homology of complex networks limit their study to weighted and undirected networks. The filtration of the clique complex corresponding to a weighted graph $G$ is constructed by forming a nested sequence of clique complexes. These correspond to the threshold graphs of $G$ obtained via a decreasing sequence of the edge weights Petri et al. (2013); Sizemore et al. (2016). In the context of unweighted and undirected networks, the absence of edge weights prohibits such a filtration scheme based on threshold graphs Petri et al. (2013); Sizemore et al. (2016). One possible resolution is to use network properties, such as edge betweenness centrality Freeman (1977); Girvan and Newman (2002) or discrete edge curvature Sreejith et al. (2016); Samal et al. (2018), to transform an unweighted network into a weighted network, and then employ the filtration scheme based on threshold graphs. Taking an alternative route, Horak et al. (2009) devised a filtration scheme based on the dimension of simplices, wherein all $k$-simplices of the clique complex are added at the $k$-th filtration step.

In this work, we present a systematic method to study the persistent homology of unweighted and undirected networks by utilizing a refined filtration of the clique complex $K$. Our proposed scheme is based on the discrete Morse theory developed by Forman Forman (1995, 2002). It assigns weights to 0-simplices (vertices), 1-simplices (edges), 2-simplices (triangles) and higher-dimensional simplices appearing in the clique complex corresponding to an unweighted and undirected network. Assigning weights to higher-dimensional simplices captures important higher-order correlations beyond those encoded by edges or 1-simplices. Moreover, our scheme leverages the following important features of the framework of discrete Morse theory. Firstly, the framework enables an assignment of weights to simplices of each dimension that is concordant with the weights of their faces and cofaces. Secondly, it captures the topology of a simplicial complex via the concept of critical simplices described below. Thirdly, it provides a natural way to build a filtration scheme for studying persistent homology from the weights of these critical simplices, as described below.

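We next provide the fundamental definitions in discrete Morse theory Forman (2002). We remark that a $k$-dimensional simplex of a simplicial complex $K$ is denoted by $\sigma^{(k)}$ in the sequel. Given a function $f : K \to \mathbb{R}$, for each simplex $\sigma^{(k)} \in K$, two sets $U_f(\sigma^{(k)})$ and $V_f(\sigma^{(k)})$ are defined as follows:

$$U_f(\sigma^{(k)}) = \{ \tau^{(k+1)} > \sigma^{(k)} : f(\tau^{(k+1)}) \le f(\sigma^{(k)}) \}, \tag{10}$$
$$V_f(\sigma^{(k)}) = \{ \nu^{(k-1)} < \sigma^{(k)} : f(\nu^{(k-1)}) \ge f(\sigma^{(k)}) \}. \tag{11}$$

Simply stated, the set $U_f(\sigma^{(k)})$ contains any $(k+1)$-simplex $\tau^{(k+1)}$ of which $\sigma^{(k)}$ is a face and whose function value is less than or equal to the function value on $\sigma^{(k)}$. The set $V_f(\sigma^{(k)})$ contains any $(k-1)$-simplex $\nu^{(k-1)}$ that is a face of $\sigma^{(k)}$ and for which the function value on $\sigma^{(k)}$ is less than or equal to the function value on $\nu^{(k-1)}$. A function $f$ is a discrete Morse function Forman (2002) if and only if for each simplex $\sigma^{(k)} \in K$:

$$|U_f(\sigma^{(k)})| \le 1 \quad \text{and} \quad |V_f(\sigma^{(k)})| \le 1. \tag{12}$$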

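Given a discrete Morse function $f$ on the simplicial complex $K$, a simplex $\sigma^{(k)} \in K$ is critical Forman (2002) if and only if:

$$|U_f(\sigma^{(k)})| = 0 \quad \text{and} \quad |V_f(\sigma^{(k)})| = 0. \tag{13}$$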

We remark that once a discrete Morse function $f$ on a simplicial complex $K$ is fixed, the sets $U_f(\sigma)$ and $V_f(\sigma)$ are denoted by $U(\sigma)$ and $V(\sigma)$, respectively, to simplify the notation. In the Results section, we present our new scheme and algorithm 1 to assign a discrete Morse function to the clique complex $K$ of an unweighted graph $G$.
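The definitions in equations (10) to (13) can be checked mechanically. In the Python sketch below, the function names and the example weights are hypothetical, and every simplex is a sorted vertex tuple. A single inequality $f(\tau) \le f(\sigma)$, for a face $\sigma$ of $\tau$, places $\tau$ in $U(\sigma)$ and $\sigma$ in $V(\tau)$ at the same time.

```python
def morse_sets(f):
    """Sizes of U(s) and V(s) from equations (10) and (11) for every simplex s.

    `f` maps each simplex (a sorted tuple of vertices) to its weight; its keys
    must form a simplicial complex.
    """
    U = {s: 0 for s in f}
    V = {s: 0 for s in f}
    for tau in f:
        for i in range(len(tau)):
            sigma = tau[:i] + tau[i + 1:]  # a codimension-1 face of tau
            if sigma and f[tau] <= f[sigma]:
                U[sigma] += 1  # tau belongs to U(sigma)
                V[tau] += 1    # sigma belongs to V(tau)
    return U, V


def is_discrete_morse(f):
    U, V = morse_sets(f)
    return all(U[s] <= 1 and V[s] <= 1 for s in f)  # equation (12)


def critical_simplices(f):
    U, V = morse_sets(f)
    crit = [s for s in f if U[s] == 0 and V[s] == 0]  # equation (13)
    return sorted(crit, key=lambda s: (f[s], s))


# Hypothetical weights on a hollow triangle (vertices 0, 1, 2 and its three edges).
f = {(0,): 0, (1,): 1, (2,): 2, (0, 1): 1, (1, 2): 2, (0, 2): 3}
print(is_discrete_morse(f), critical_simplices(f))  # True [(0,), (0, 2)]
print(is_discrete_morse({(0,): 1, (1,): 1, (0, 1): 0.5}))  # False
```

For these weights on a hollow triangle, only the vertex $(0)$ and the edge $(0,2)$ are critical. Since $\beta_0 = \beta_1 = 1$ for a hollow triangle, this matches the lower bound of Forman's Theorem 2.11. The second example violates equation (12), because both vertices of the edge have weights greater than or equal to the weight of the edge.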

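We next describe the filtration of the clique simplicial complex $K$ based on the discrete Morse function $f$. Given a discrete Morse function $f$ on a simplicial complex $K$ and a real number $c$, a level subcomplex $K(c)$ is defined Forman (2002) as follows:

$$K(c) = \bigcup_{\sigma \in K,\; f(\sigma) \le c} \; \bigcup_{\tau \le \sigma} \tau. \tag{14}$$

Here $\tau \le \sigma$ means that $\tau$ is either $\sigma$ itself or a face of $\sigma$. Simply stated, $K(c)$ contains all simplices $\sigma$ in $K$ whose discrete Morse function value (or assigned weight) satisfies $f(\sigma) \le c$, along with every face $\tau$ of such $\sigma$. Note that a face $\tau$ of $\sigma$ is included in $K(c)$ even if the weight $f(\tau)$ assigned to it is greater than $c$.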
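A level subcomplex can be computed directly from equation (14). The Python sketch below uses a hypothetical function name and a hypothetical discrete Morse function on a single edge, in which the vertex $(1)$ carries a larger weight than the edge. The output shows that this vertex enters $K(c)$ together with the edge, as noted above.

```python
from itertools import combinations


def level_subcomplex(f, c):
    """K(c) from equation (14): all simplices with f(s) <= c, together with all their faces."""
    level = set()
    for sigma, value in f.items():
        if value <= c:
            for r in range(1, len(sigma) + 1):
                level.update(combinations(sigma, r))  # every non-empty face, including sigma
    return level


# Hypothetical discrete Morse function on a single edge.
f = {(0,): 0.0, (1,): 2.0, (0, 1): 1.0}
print(sorted(level_subcomplex(f, 0.5)))  # [(0,)]
print(sorted(level_subcomplex(f, 1.0)))  # [(0,), (0, 1), (1,)]
```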

Let $W = \{ f(\sigma) : \sigma \in K \}$ denote the entire set of values assigned to simplices in $K$ by the discrete Morse function $f$. Then, let $c_1 < c_2 < \cdots < c_n$ denote the increasing sequence of the unique values in $W$. This sequence is finite because the simplicial complex $K$ considered here is finite. We now have a sequence of inclusions of level subcomplexes corresponding to this increasing sequence as follows:

$$K(c_1) \subseteq K(c_2) \subseteq \cdots \subseteq K(c_n) = K. \tag{15}$$

This nested sequence gives a filtration of the simplicial complex $K$ that enables the study of persistent homology in the context of unweighted networks.

According to Lemma 2.6 by Forman Forman (2002), if there are no critical simplices $\sigma$ with $f(\sigma) \in (a, b]$, then $K(b)$ is homotopy equivalent to $K(a)$.

The implications of this Lemma are as follows. Let $W_{\mathrm{crit}}$ denote the set of values assigned to critical simplices in $K$ by the discrete Morse function $f$. Let $\tilde{c}_1 < \tilde{c}_2 < \cdots < \tilde{c}_m$ denote the increasing sequence of the unique values in $W_{\mathrm{crit}}$. We refer to the function values assigned to the critical simplices in $K$ as critical weights. The set $W_{\mathrm{crit}}$, defined for critical simplices, is a subset of the set $W$ defined for all simplices in $K$. Accordingly, the increasing sequence $\tilde{c}_1 < \cdots < \tilde{c}_m$ is a subsequence of $c_1 < \cdots < c_n$ with $m \le n$. The above definition implies that there are no critical simplices $\sigma$ with $f(\sigma) \in (\tilde{c}_j, \tilde{c}_{j+1})$. Homology is invariant under homotopy equivalence. Therefore, Forman's Lemma 2.6 implies that for any $a$ and $b$ in the real interval $[\tilde{c}_j, \tilde{c}_{j+1})$, the homology groups of $K(a)$ and $K(b)$ are isomorphic. Thus, to observe the changes in homology as the filtration proceeds, it suffices to study the persistent homology of the filtration corresponding to the subsequence $\tilde{c}_1 < \cdots < \tilde{c}_m$ of $c_1 < \cdots < c_n$. Since $m \le n$, this potentially reduces the number of filtration steps required. The new filtration sequence can be represented as:

$$K(\tilde{c}_1) \subseteq K(\tilde{c}_2) \subseteq \cdots \subseteq K(\tilde{c}_m). \tag{16}$$

Note that each simplex $\sigma$ in the clique complex $K$ is first introduced as part of a certain level subcomplex in the above nested filtration sequence. Therefore, each simplex $\sigma$ in $K$ can be associated with a unique weight, namely the critical weight $\tilde{c}_j$ of the level subcomplex $K(\tilde{c}_j)$ in which $\sigma$ first appears. We refer to this weight as the filtration weight of $\sigma$. In the Results section, we present algorithm 2 and algorithm 3, which give the procedure to compute the filtration weights of simplices in the clique complex $K$ of a graph $G$. Using the example network in figure 2, we also show that the two filtrations yield equivalent persistent homology. One filtration uses the entire sequence of weights of the discrete Morse function, and the other uses only the subsequence of critical weights.
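Under stated assumptions, the critical weights and filtration weights can be sketched in Python as follows. This sketch is not algorithm 2 or algorithm 3 of the Results section, which are not reproduced here, and the function names are hypothetical. A simplex $\sigma$ lies in $K(c)$ exactly when $c \ge \min_{\tau \supseteq \sigma} f(\tau)$. Its filtration weight is therefore taken as the first critical weight at or above that minimum. As an additional assumption, any simplex that only enters after the last critical weight is assigned that last weight. By Forman's Lemma 2.6, this leaves the homology at the final step unchanged.

```python
import bisect


def critical_weights(f):
    """Sorted distinct weights of the critical simplices of f (equation 13)."""
    in_pair = set()
    for tau in f:
        for i in range(len(tau)):
            sigma = tau[:i] + tau[i + 1:]
            if sigma and f[tau] <= f[sigma]:
                in_pair.update((sigma, tau))  # tau is in U(sigma) and sigma is in V(tau)
    return sorted({f[s] for s in f if s not in in_pair})


def filtration_weights(f):
    """Filtration weight of each simplex: the first critical weight c with the simplex in K(c)."""
    crit = critical_weights(f)
    # a simplex enters K(c) once c >= min f(tau) over tau containing it; propagate minima down
    entry = dict(f)
    for tau in sorted(f, key=len, reverse=True):
        for i in range(len(tau)):
            sigma = tau[:i] + tau[i + 1:]
            if sigma:
                entry[sigma] = min(entry[sigma], entry[tau])
    weights = {}
    for s, e in entry.items():
        j = bisect.bisect_left(crit, e)  # index of the first critical weight >= e
        # Assumption: simplices entering after the last critical weight go to that last step.
        weights[s] = crit[min(j, len(crit) - 1)]
    return crit, weights


# Hypothetical discrete Morse function on a hollow triangle.
f = {(0,): 0, (1,): 1, (2,): 2, (0, 1): 1, (1, 2): 2, (0, 2): 3}
crit, weights = filtration_weights(f)
print(crit)  # [0, 3]
print(weights)
```

For these hypothetical weights, the reduced filtration has two steps: the critical vertex $(0)$ at weight 0, and then the entire complex at weight 3. The four non-critical values 1 and 2 are skipped without changing the homology observed at the two steps.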

Theorem 2.11 by Forman Forman (2002) can be stated as follows. Let $m_k$ represent the number of critical simplices of dimension $k$ in a simplicial complex $K$, and let $\beta_k$ be the $k$-Betti number of $K$. Then, for each $k$, $m_k \ge \beta_k$.

In other words, for each dimension $k$, the above theorem gives the $k$-Betti number $\beta_k$ as a lower bound on the number of critical $k$-simplices. In the Results section, we present our algorithm 1, which assigns weights satisfying the definition of a discrete Morse function to simplices in the clique complex $K$ of a graph $G$. Our choice of the function in algorithm 1 tries to minimize the number of critical simplices, which is bounded below by Forman's Theorem 2.11 Forman (2002). It thus reduces the number of filtration steps required to compute the persistent homology without loss of information. In the Results section, we will show that our algorithm achieves a near-optimal number of critical simplices in clique complexes corresponding to many model and real networks analyzed here.
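As a small worked check of Theorem 2.11, consider the example of Figures 1 and 2. The check also uses the fact, due to Forman and mentioned in the Introduction, that the alternating count of critical simplices equals the Euler characteristic. From the captions, the complex has 9 vertices, 11 edges and 2 triangles, together with 4 critical simplices. Two of these are critical edges, added at stages 3 and 4. The other two are critical vertices: the one added at stage 2, and, by our inference, the stage-1 simplex at which the persistent 0-hole is born, since a 0-hole can only be born with a vertex. The holes surviving the filtration give $\beta_0 = 1$, $\beta_1 = 1$ and $\beta_2 = 0$.

```latex
\begin{aligned}
\chi(K) &= 9 - 11 + 2 = 0, \\
m_0 - m_1 + m_2 &= 2 - 2 + 0 = 0 = \chi(K), \\
\beta_0 - \beta_1 + \beta_2 &= 1 - 1 + 0 = 0 = \chi(K), \\
(m_0, m_1, m_2) &= (2, 2, 0) \ \ge\ (\beta_0, \beta_1, \beta_2) = (1, 1, 0) \quad \text{componentwise}.
\end{aligned}
```

The lower bound $\beta_0 + \beta_1 + \beta_2 = 2$ is exceeded by one critical vertex and one critical edge. These are the pair whose 0-hole is born at stage 2 and dies at stage 3.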

Comparing Persistence diagrams

Consider a filtration of the clique complex $K$ of a graph $G$ (see equation 16). Each $k$-hole has a critical weight $c_b$ corresponding to its birth index and a critical weight $c_d$ corresponding to its death index, with $c_b < c_d$. The $k$-th persistence diagram $\mathrm{Dgm}_k(K)$ of a simplicial complex $K$ is the collection of points $(c_b, c_d) \in \mathbb{R}^2$, one for each $k$-hole, whose first and second coordinates are its birth weight and death weight, respectively Cohen-Steiner et al. (2007). The persistence of a $k$-hole with birth weight $c_b$ and death weight $c_d$ is defined as $c_d - c_b$. Thus, the persistence diagram for a clique complex corresponding to a graph is a compact representation of the persistent homology of a network.

Given two persistence diagrams $X$ and $Y$ (which may correspond to two different networks), the $\infty$-Wasserstein distance between $X$ and $Y$, also known as the bottleneck distance, is defined as follows Kerber et al. (2017):

$$W_\infty(X, Y) = \inf_{\eta : X \to Y} \; \sup_{x \in X} \| x - \eta(x) \|_\infty. \tag{17}$$

Similarly, given two persistence diagrams $X$ and $Y$, the $q$-Wasserstein distance between $X$ and $Y$ is defined as follows Kerber et al. (2017):

$$W_q(X, Y) = \left[ \inf_{\eta : X \to Y} \sum_{x \in X} \| x - \eta(x) \|_\infty^q \right]^{1/q}. \tag{18}$$

In the above equations, $\eta$ ranges over all bijective maps from $X$ to $Y$. Given $x = (x_1, x_2) \in \mathbb{R}^2$, $\|x\|_\infty = \max(|x_1|, |x_2|)$ is the $L_\infty$ norm. Note that two persistence diagrams $X$ and $Y$ do not in general have the same number of off-diagonal points, i.e., features with non-zero persistence. We refer the readers to Kerber et al. Kerber et al. (2017) for details on circumventing this issue and on how the computation of the Wasserstein distance reduces to a bipartite graph matching problem. In this work, we use the Dionysus 2 package (http://www.mrzv.org/software/dionysus2/) to compute the Wasserstein distance between two persistence diagrams corresponding to two different model networks (see Results section). We remark that the bottleneck distance between two persistence diagrams that are subsets of the unit square lies in the range 0 to 1.
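For very small diagrams, equations (17) and (18) can be evaluated by brute force, which makes the role of the diagonal explicit. The Python sketch below is for illustration only: the function names are hypothetical and the running time grows factorially. It follows the standard reduction, in which each diagram is augmented with the diagonal projections of the other diagram's points. It also assumes all points have finite death weights. Real diagrams call for a dedicated implementation such as Dionysus 2.

```python
from itertools import permutations


def _matching_costs(X, Y):
    """Augmented cost matrix for matching diagram X to diagram Y.

    Points are (birth, death) pairs with finite death. Row i < len(X) is a point of X,
    and the remaining rows are diagonal slots for points of Y (columns are symmetric).
    A point may be matched to its own diagonal slot at L_inf distance (death - birth) / 2;
    diagonal-to-diagonal matches are free.
    """
    n, m = len(X), len(Y)
    inf = float("inf")
    cost = [[0.0] * (n + m) for _ in range(n + m)]
    for i in range(n + m):
        for j in range(n + m):
            if i < n and j < m:
                cost[i][j] = max(abs(X[i][0] - Y[j][0]), abs(X[i][1] - Y[j][1]))
            elif i < n:
                cost[i][j] = (X[i][1] - X[i][0]) / 2 if j - m == i else inf
            elif j < m:
                cost[i][j] = (Y[j][1] - Y[j][0]) / 2 if i - n == j else inf
    return cost


def bottleneck_distance(X, Y):
    """Equation (17), by brute force over all bijections of the augmented diagrams."""
    cost = _matching_costs(X, Y)
    size = len(cost)
    return min(max((cost[i][p[i]] for i in range(size)), default=0.0)
               for p in permutations(range(size)))


def wasserstein_distance(X, Y, q=2):
    """Equation (18) with the L_inf ground distance, by brute force."""
    cost = _matching_costs(X, Y)
    size = len(cost)
    best = min(sum(cost[i][p[i]] ** q for i in range(size))
               for p in permutations(range(size)))
    return best ** (1.0 / q)


X = [(0.0, 1.0)]
Y = [(0.0, 0.8)]
print(bottleneck_distance(X, Y), wasserstein_distance(X, Y, q=2))
print(bottleneck_distance(X, []))  # 0.5: the point is matched to the diagonal
```

Matching the two points directly costs about 0.2. Sending both points to the diagonal would cost 0.5 and 0.4 instead, so both distances are approximately 0.2 for this pair of one-point diagrams.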

Network datasets

Model networks. We have investigated the following models of unweighted and undirected networks: the Erdös-Rényi (ER) Erdös and Rényi (1961), the Watts-Strogatz (WS) Watts and Strogatz (1998), the Barabási-Albert (BA) Barabási and Albert (1999) and the Hyperbolic Graph Generator (HGG) Krioukov et al. (2010) models. In the ER model Erdös and Rényi (1961), each possible edge between two of the $N$ vertices in the graph exists with a constant probability $p$. Edges in the ER model exist independently of each other, so the model produces random graphs with an expected average vertex degree of $p(N-1)$. The WS model Watts and Strogatz (1998) produces small-world graphs as follows. It starts with an initial regular graph with $N$ vertices, where each vertex is connected to its $k$ nearest neighbours. Next, the endpoint of each edge in the initial regular graph is chosen for rewiring with a fixed rewiring probability. The chosen endpoint is rewired to another vertex of the graph selected with uniform probability. The BA model Barabási and Albert (1999) produces scale-free graphs, whose degree distribution follows a power-law decay. It uses a preferential-attachment scheme. The BA model generates an initial graph of $m_0$ vertices. At each successive iteration, a new vertex is added with $m$ edges to existing vertices, each chosen with probability proportional to its degree at that iteration. The iterations in the BA model cease when the graph has attained the requisite number of vertices $N$. The HGG model Krioukov et al. (2010); Aldecoa et al. (2015) produces a random graph of $N$ vertices by initially fixing the vertices to points on a hyperbolic disk.
In the HGG model, the probability that an edge exists between two vertices depends on the hyperbolic distance between the corresponding points on the disk, with closer pairs more likely to be connected. By tuning an input parameter, the HGG model can produce either a hyperbolic or a spherical random graph Krioukov et al. (2010); Aldecoa et al. (2015).
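The ER and WS constructions described above can be sketched in a few lines of Python. The function names and parameter values are hypothetical. The rewiring step keeps one endpoint fixed and avoids self-loops and multi-edges, which is one common convention. The BA and HGG models are not covered here, although libraries such as NetworkX also provide generators for these models.

```python
import random


def erdos_renyi(n, p, rng):
    """ER model: each of the n(n-1)/2 possible edges is present independently with probability p."""
    return {(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p}


def watts_strogatz(n, k, p_rewire, rng):
    """WS model sketch: a ring where each vertex links to its k nearest neighbours (k even).

    With probability p_rewire, each edge keeps endpoint u and has its other endpoint moved
    to a uniformly chosen vertex, avoiding self-loops and multi-edges.
    """
    edges = {tuple(sorted((u, (u + j) % n))) for u in range(n) for j in range(1, k // 2 + 1)}
    for u, v in sorted(edges):  # iterate over a snapshot of the initial ring lattice
        if rng.random() < p_rewire:
            options = [w for w in range(n) if w != u and tuple(sorted((u, w))) not in edges]
            if options:
                edges.remove((u, v))
                edges.add(tuple(sorted((u, rng.choice(options)))))
    return edges


rng = random.Random(42)
n, p = 200, 0.05
er = erdos_renyi(n, p, rng)
ws = watts_strogatz(n, 10, 0.1, rng)
print(2 * len(er) / n, p * (n - 1))  # empirical vs expected average degree
print(2 * len(ws) / n)               # rewiring keeps the average degree at k = 10
```

For the ER graph, the empirical average degree fluctuates around $p(N-1)$. The WS rewiring preserves the number of edges, and therefore keeps the average degree at $k$.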

Real networks. We have also studied seven real-world networks, represented as unweighted and undirected graphs. We have considered two biological networks: the Yeast protein interaction network Jeong et al. (2001) with 1870 vertices and 2277 edges, and the Human protein interaction network Rual et al. (2005) with 3133 vertices and 6726 edges. In both biological networks, each vertex represents a protein and each edge represents an interaction between two proteins. We have considered two infrastructure networks: the US Power Grid network Leskovec et al. (2007) and the Euro road network Šubelj and Bajec (2011). In the US Power Grid network, the 4941 vertices represent the generators, transformers and substations in the Western states of the USA, and the 6594 edges represent power links between them. The 1174 vertices of the Euro road network correspond to cities in Europe, and the 1417 edges correspond to roads linking the cities. We have also studied the Email network Guimera et al. (2003) of the University of Rovira i Virgili, with 1133 vertices representing users and 5451 edges. Each edge represents the existence of at least one Email communication between the two users it connects. We have also studied the Route views network Leskovec et al. (2007), which has 6474 autonomous systems as vertices and 13895 edges representing communication between these systems. Finally, we have considered a social network, the Hamsterster friendship network Kunegis (2013), containing 1858 vertices representing users and 12534 edges representing friendships between users. Note that we omit self-loops while constructing the clique complex corresponding to the undirected graph of a real-world network.

Results and Discussion

Algorithm to construct discrete Morse function on a simplicial complex

From an unweighted and undirected graph $G$ with vertex set $V$ and edge set $E$, it is straightforward to construct a clique simplicial complex $K$ of dimension $d$ (See Theory section). Figure 1 shows the construction of a clique complex starting from an example network. Given a simplicial complex $K$, its dimension $d$ and a non-negative real-valued function $g$ on the 0-simplices of $K$, algorithm 1 assigns weights to every simplex in $K$, producing a discrete Morse function $f$ as defined in equation 12. In the pseudocode of algorithm 1, lines 2-6 initialize a variable Flag with the value 0 for every simplex in $K$. The variable Flag associated with a simplex $\alpha$ in $K$ serves as a counter for the size of the set defined in equation 10. Lines 7-9 assign weights to every 0-simplex in $K$ based on the input non-negative function $g$. Lines 10-24 assign weights to 1- and higher-dimensional simplices in $K$ in a manner consistent with the definition of a discrete Morse function in equation 12. In summary, algorithm 1 outputs a discrete Morse function on $K$, and we now present a rigorous proof of the following theorem using Lemmas 1 to 5 described later.
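The clique complex construction can be sketched in Python as follows. This is a minimal illustration, assuming that vertex labels are sortable and that simplices are stored as sorted vertex tuples; the function name `clique_complex` and its `max_dim` cap (the maximum dimension used later in algorithm 3) are our own choices. Self-loops are skipped, as described for the real networks.

```python
def clique_complex(edges, max_dim, vertices=()):
    """All simplices of the clique complex of a graph, up to dimension max_dim.

    Simplices are returned as sorted vertex tuples, in order of increasing
    dimension. Self-loops are ignored; isolated vertices can be passed
    through `vertices`.
    """
    adj = {v: set() for v in vertices}
    for u, v in edges:
        if u == v:
            continue                                   # omit self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    layer = [(v,) for v in sorted(adj)]                # 0-simplices
    simplices = list(layer)
    for _ in range(max_dim):
        next_layer = []
        for clique in layer:
            # vertices adjacent to every member of the current clique
            common = set.intersection(*(adj[v] for v in clique))
            # extend only by larger labels, so each clique is produced once
            for w in sorted(x for x in common if x > clique[-1]):
                next_layer.append(clique + (w,))
        if not next_layer:
            break
        simplices.extend(next_layer)
        layer = next_layer
    return simplices
```

For a triangle with a pendant vertex, `clique_complex([(1, 2), (2, 3), (1, 3), (3, 4)], 2)` yields four 0-simplices, four 1-simplices and the single 2-simplex `(1, 2, 3)`.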

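1:function DiscreteMorseFunction($K$, $d$, $g$)
2:     for $p = 0$ to $d$ do ▷ Initialize Flag variable associated with each simplex in $K$
3:         for each $p$-simplex $\alpha \in K$ do
4:              Flag[$\alpha$] $\leftarrow 0$
5:         end for
6:     end for
7:     for each 0-simplex $v \in K$ do ▷ Assign weights to 0-simplices in $K$
8:         $f(v) \leftarrow g(v)$
9:     end for
10:     for $p = 1$ to $d$ do
11:         for each $p$-simplex $\alpha \in K$ do ▷ Assign weights to $p$-simplices in $K$ with $p \geq 1$
12:              Let Faces[0..$p$] be an array of all $(p-1)$-dimensional faces of $\alpha$
13:              Sort Faces[ ] such that $f(\mathrm{Faces}[i]) \leq f(\mathrm{Faces}[i+1])$ for each $i$
14:              Let $\nu \leftarrow \mathrm{Faces}[p]$ ▷ face with the largest value of $f$
15:              Let $\nu' \leftarrow \mathrm{Faces}[p-1]$ ▷ face with the second-largest value of $f$
16:              if Flag[$\nu$] $= 0$ and $f(\nu) > f(\nu')$ then
17:                  $f(\alpha) \leftarrow f(\nu)$
18:                  Flag[$\nu$] $\leftarrow 1$
19:              else
20:                  $\delta \leftarrow$ random number with $\delta > 0$
21:                  $f(\alpha) \leftarrow f(\nu) + \delta$
22:              end if
23:         end for
24:     end for
25:     return $f$
26:end function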
Algorithm 1 Algorithm to construct a discrete Morse function on a $d$-dimensional simplicial complex
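A minimal Python sketch of algorithm 1 is given below. It assumes the complex is a list of sorted vertex tuples that is closed under taking faces (for example, the output of a clique enumeration), and it draws the positive noise $\delta$ from a hypothetical range `noise`, since the range used in the paper is not restated here. The function name is ours; comments point to the lines of algorithm 1 cited in the text.

```python
import random
from itertools import combinations

def discrete_morse_function(simplices, g, seed=0, noise=(1e-6, 1e-3)):
    """Sketch of algorithm 1.

    simplices: sorted vertex tuples, closed under taking faces.
    g: dict mapping each vertex to a non-negative real value.
    noise: hypothetical (low, high) range for the positive random number delta.
    """
    rng = random.Random(seed)
    flag = {s: 0 for s in simplices}                      # lines 2-6: Flag <- 0
    f = {s: g[s[0]] for s in simplices if len(s) == 1}    # lines 7-9: f(v) <- g(v)
    top = max(len(s) for s in simplices) - 1              # dimension d of the complex
    for p in range(1, top + 1):                           # lines 10-24
        for alpha in (s for s in simplices if len(s) == p + 1):
            # all (p-1)-faces of alpha, sorted by increasing value of f
            faces = sorted(combinations(alpha, p), key=lambda t: f[t])
            nu, nu_second = faces[-1], faces[-2]          # largest and second-largest
            if flag[nu] == 0 and f[nu] > f[nu_second]:    # line 16
                f[alpha] = f[nu]                          # pair alpha with nu
                flag[nu] = 1                              # line 18
            else:
                delta = rng.uniform(*noise)               # delta > 0
                f[alpha] = f[nu] + delta                  # line 21
    return f
```

On a filled triangle with vertex values 0, 1 and 2 for vertices 1, 2 and 3, the edges {1,2} and {1,3} are paired with vertices 2 and 3, the edge {2,3} receives a noisy value and is then paired with the triangle, so vertex 1 is the only critical simplex.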

Theorem. Algorithm 1 produces a discrete Morse function on any simplicial complex of finite dimension $d$.

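Proof. Let $K$ denote a simplicial complex of dimension $d$. Recall that for the function $f$ on $K$ constructed by algorithm 1, for each simplex $\alpha^{(p)} \in K$, the two sets $U(\alpha)$ and $D(\alpha)$ were defined in equations 10 and 11 as follows:

$$U(\alpha) = \{\beta^{(p+1)} > \alpha^{(p)} : f(\beta) \leq f(\alpha)\}, \qquad (10)$$
$$D(\alpha) = \{\nu^{(p-1)} < \alpha^{(p)} : f(\nu) \geq f(\alpha)\}. \qquad (11)$$

To prove that $f$ is a discrete Morse function, we need to show that for each simplex $\alpha \in K$, both $\#U(\alpha) \leq 1$ and $\#D(\alpha) \leq 1$ (See equation 12).

Firstly, we show that for each simplex $\alpha \in K$, $\#D(\alpha) \leq 1$. Consider a 0-simplex $\alpha^{(0)}$. Since the dimension of a simplex cannot be less than 0, the set $D(\alpha^{(0)})$ is empty for each 0-simplex; in other words, $\#D(\alpha^{(0)}) = 0$. Also, for each $p$-simplex $\alpha^{(p)}$ with $p \geq 1$, Lemma 5 below shows that $\#D(\alpha^{(p)}) \leq 1$. Thus, for every simplex $\alpha \in K$, we have shown that $\#D(\alpha) \leq 1$.

Secondly, we show that for each simplex $\alpha \in K$, $\#U(\alpha) \leq 1$. For each $p$-simplex $\alpha^{(p)}$ with $p < d$, we prove in Lemma 4 below that $\#U(\alpha^{(p)}) \leq 1$. Now consider a $d$-simplex $\alpha^{(d)}$. Since $K$ is by assumption a $d$-dimensional simplicial complex, there are no $(d+1)$-simplices in $K$, and thus the set $U(\alpha^{(d)})$ is empty for each $d$-simplex; in other words, $\#U(\alpha^{(d)}) = 0$. Thus, for every simplex $\alpha \in K$, we have shown that $\#U(\alpha) \leq 1$.

In summary, we have shown that for each simplex $\alpha \in K$, $\#U(\alpha) \leq 1$ and $\#D(\alpha) \leq 1$. Thus, $f$ satisfies the definition of a discrete Morse function on the simplicial complex $K$ (See equation 12).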
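The two conditions of equation 12 can also be checked mechanically, which is a useful sanity test for any implementation of algorithm 1. The sketch below, with a hypothetical function name, counts for each simplex $\alpha$ the cofaces $\beta$ with $f(\beta) \leq f(\alpha)$ (the set of equation 10) and the faces $\nu$ with $f(\nu) \geq f(\alpha)$ (the set of equation 11). It assumes simplices are sorted vertex tuples closed under taking faces and that `f` is a dictionary over all of them.

```python
from itertools import combinations

def is_discrete_morse(simplices, f):
    """Check both conditions of a discrete Morse function for every simplex."""
    cofaces = {s: [] for s in simplices}
    for s in simplices:
        if len(s) > 1:
            for face in combinations(s, len(s) - 1):
                cofaces[face].append(s)
    for a in simplices:
        # cofaces beta > a with f(beta) <= f(a)
        up = sum(1 for b in cofaces[a] if f[b] <= f[a])
        # faces nu < a with f(nu) >= f(a); a vertex has no faces
        down = 0
        if len(a) > 1:
            down = sum(1 for nu in combinations(a, len(a) - 1) if f[nu] >= f[a])
        if up > 1 or down > 1:
            return False
    return True
```

A vertex whose value exceeds the values of two of its edges already violates the first condition, so such an assignment is rejected.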

We next prove the lemmas used in the proof of the theorem above, in the sequence of Lemmas 1 to 5. We assume that $K$ is a $d$-dimensional simplicial complex and $f$ is the output function on $K$ obtained from algorithm 1. A $p$-dimensional simplex of a simplicial complex is denoted by $\alpha^{(p)}$. Also, if a $(p-1)$-simplex $\nu$ is a face of a $p$-simplex $\alpha$, this is written as $\nu < \alpha$ (equivalently, $\alpha > \nu$) in the sequel.

Lemma 1. For each $\alpha^{(p)} \in K$ where $p \geq 1$, if $\nu^{(p-1)} \in K$ is such that $\nu < \alpha$, then $f(\nu) \leq f(\alpha)$.

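Proof. Let $\nu_0, \nu_1, \ldots, \nu_p$ denote the $(p-1)$-dimensional faces of $\alpha$ such that $f(\nu_0) \leq f(\nu_1) \leq \cdots \leq f(\nu_p)$. Note that $\nu$ is one such $\nu_i$ since, by assumption, $\nu$ is a $(p-1)$-dimensional face of $\alpha$. Based on lines 11-23 in algorithm 1, we have that:

$$f(\alpha) = \begin{cases} f(\nu_p) & \text{if } \mathrm{Flag}[\nu_p] = 0 \text{ and } f(\nu_p) > f(\nu_{p-1}), \\ f(\nu_p) + \delta & \text{otherwise,} \end{cases} \qquad (19)$$

where $\delta > 0$ is the random number drawn in algorithm 1, and Flag[$\nu_p$] is evaluated at the time $\alpha$ is processed.
Case $f(\alpha) = f(\nu_p)$ implies $f(\nu_i) \leq f(\nu_p) = f(\alpha)$ for each $i \in \{0, \ldots, p\}$.
Case $f(\alpha) = f(\nu_p) + \delta$ implies $f(\nu_i) \leq f(\nu_p) < f(\alpha)$ for each $i \in \{0, \ldots, p\}$.
Thus, in both cases we have $f(\nu_i) \leq f(\alpha)$ for each $i$. Since $\nu = \nu_i$ for some $i$, we have $f(\nu) \leq f(\alpha)$.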

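Lemma 2. For each $\alpha^{(p)} \in K$ where $p \geq 1$, if $\nu^{(p-1)} \in K$ is such that $\nu < \alpha$, then $f(\nu) \geq f(\alpha)$ if and only if Flag[$\nu$] changes value from 0 to 1 while the function value of $\alpha$ is assigned.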

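Proof. Given $\alpha^{(p)}$ and $\nu^{(p-1)}$ such that $\nu < \alpha$, we first assume $f(\nu) \geq f(\alpha)$. Let $\nu_0, \ldots, \nu_p$ denote the $(p-1)$-dimensional faces of $\alpha$ such that $f(\nu_0) \leq \cdots \leq f(\nu_p)$. Then, the value $f(\alpha)$ is given by equation 19.
Case $f(\alpha) = f(\nu_p)$ in equation 19 implies $f(\nu_i) \leq f(\nu_{p-1}) < f(\nu_p) = f(\alpha)$ for each $i < p$.
Case $f(\alpha) = f(\nu_p) + \delta$ in equation 19 implies $f(\nu_i) < f(\alpha)$ for each $i \in \{0, \ldots, p\}$.

Since, by assumption, $\nu$ is a face of $\alpha$ and $f(\nu) \geq f(\alpha)$, the first case applies, and $\nu$ equals $\nu_p$. Thus, based on line 18 in algorithm 1, Flag[$\nu$] changes value from 0 to 1 while the function value of $\alpha$ is assigned.

Now, given $\alpha^{(p)}$ and $\nu^{(p-1)}$ such that $\nu < \alpha$, we assume that Flag[$\nu$] changes value from 0 to 1 while the function value of $\alpha$ is assigned. Let $\nu_0, \ldots, \nu_p$ be as above. Based on lines 11-23 in algorithm 1, a Flag changes value only in the first case of equation 19, and only for the face $\nu_p$; hence $\nu$ equals $\nu_p$ and $f(\alpha) = f(\nu_p)$. Thus, we have $f(\nu) \geq f(\alpha)$.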

Lemma 3. Let $\nu$ be a simplex of $K$. Then, the number of times Flag[$\nu$] changes value from 0 to 1 is at most 1.

Proof. Flag[$\nu$] is initially set to 0 in algorithm 1. Once Flag[$\nu$] transitions to 1, its value never changes: no step of algorithm 1 changes a Flag from 1 to 0. Thus, Flag[$\nu$] either remains 0 throughout or changes value from 0 to 1 exactly once in algorithm 1.

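Lemma 4. For each $\nu^{(p)} \in K$ where $0 \leq p < d$, the set defined in equation 10 has at most one element, that is, $\#\{\beta^{(p+1)} > \nu^{(p)} : f(\beta) \leq f(\nu)\} \leq 1$.

Proof. Applying Lemma 2 to each $(p+1)$-simplex $\beta > \nu$, the number of $(p+1)$-simplices $\beta$ such that $\nu < \beta$ and $f(\nu) \geq f(\beta)$ is equal to the number of times Flag[$\nu$] changes value from 0 to 1. Applying Lemma 3, we get that for each $\nu^{(p)}$,

$$\#\{\beta^{(p+1)} > \nu^{(p)} : f(\beta) \leq f(\nu)\} \leq 1.$$

Furthermore, Lemma 1 tells us that

$$f(\nu) \leq f(\beta) \quad \text{for every } \beta^{(p+1)} > \nu^{(p)},$$

so every element $\beta$ of this set satisfies $f(\beta) = f(\nu)$. Thus, for each $\nu^{(p)}$ with $p < d$, we have $\#\{\beta^{(p+1)} > \nu : f(\beta) \leq f(\nu)\} = \#\{\beta^{(p+1)} > \nu : f(\beta) = f(\nu)\} \leq 1$.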

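Lemma 5. For each $\alpha^{(p)} \in K$ where $p \geq 1$, the set defined in equation 11 has at most one element, that is, $\#\{\nu^{(p-1)} < \alpha^{(p)} : f(\nu) \geq f(\alpha)\} \leq 1$.

Proof. Let $\nu_0, \ldots, \nu_p$ denote the $(p-1)$-dimensional faces of $\alpha$ such that $f(\nu_0) \leq \cdots \leq f(\nu_p)$. Based on lines 11-23 in algorithm 1, we have that:

$$f(\alpha) = \begin{cases} f(\nu_p) & \text{if } \mathrm{Flag}[\nu_p] = 0 \text{ and } f(\nu_p) > f(\nu_{p-1}), \\ f(\nu_p) + \delta & \text{otherwise,} \end{cases}$$

where $\delta > 0$.
Case $f(\alpha) = f(\nu_p)$ implies $f(\nu_i) \leq f(\nu_{p-1}) < f(\alpha)$ for each $i < p$, and thus $\nu_p$ is the only face $\nu$ of $\alpha$ with $f(\nu) \geq f(\alpha)$.
Case $f(\alpha) = f(\nu_p) + \delta$ implies $f(\nu_i) < f(\alpha)$ for each $i$, and thus no face $\nu$ of $\alpha$ satisfies $f(\nu) \geq f(\alpha)$.
Hence, for each $\alpha^{(p)}$ with $p \geq 1$, we have $\#\{\nu^{(p-1)} < \alpha : f(\nu) \geq f(\alpha)\} \leq 1$.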

Filtration of the clique complex based on weights of critical simplices

Given a simplicial complex $K$, its dimension $d$ and a discrete Morse function $f$ on $K$, algorithm 2 determines the weights of critical simplices in $K$ (See equation 13). In the pseudocode of algorithm 2, lines 2-6 initialize a variable IsCritical, associated with every simplex in $K$, to True. Lines 7-20 determine the critical simplices in $K$ by checking the condition in equation 13 that defines a critical simplex. Lines 21-31 determine the weights of critical simplices, or critical weights, in $K$. Finally, algorithm 2 outputs an array containing the increasing sequence of critical weights in $K$, which is subsequently used for the filtration of the clique complex $K$.

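1:function GetCriticalWeights($K$, $d$, $f$)
2:     for $p = 0$ to $d$ do ▷ Initialize IsCritical variable associated with each simplex in $K$
3:         for each $p$-simplex $\alpha \in K$ do
4:              IsCritical[$\alpha$] $\leftarrow$ True
5:         end for
6:     end for
7:     for $p = 1$ to $d$ do ▷ Determine the critical simplices in $K$
8:         for each $p$-simplex $\alpha \in K$ do
9:              Let Faces[ ] be an array of all $(p-1)$-dimensional faces of $\alpha$
10:              Sort Faces[ ] such that $f(\mathrm{Faces}[i]) \leq f(\mathrm{Faces}[i+1])$ for each $i$
11:              Let $n \leftarrow$ number of entries in Faces[ ] ▷ $n = p + 1$
12:              Let $\nu \leftarrow \mathrm{Faces}[n-1]$ ▷ face with the largest value of $f$
13:              if $f(\nu) \geq f(\alpha)$ then
14:                  IsCritical[$\alpha$] $\leftarrow$ False
15:                  if IsCritical[$\nu$] = True then
16:                       IsCritical[$\nu$] $\leftarrow$ False
17:                  end if
18:              end if
19:         end for
20:     end for
21:     Initialize $i \leftarrow 0$
22:     Declare empty array $c$[ ]
23:     for $p = 0$ to $d$ do ▷ Determine the weights of critical simplices in $K$
24:         for each $p$-simplex $\alpha \in K$ do
25:              if IsCritical[$\alpha$] = True then
26:                  $c[i] \leftarrow f(\alpha)$
27:                  $i \leftarrow i + 1$
28:              end if
29:         end for
30:     end for
31:     Sort array $c$[ ] in increasing order and remove duplicates
32:     return $c$[ ]
33:end function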
Algorithm 2 Algorithm to compute the weights of critical simplices in $K$
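Algorithm 2 can be sketched in Python as below, assuming sorted vertex tuples closed under taking faces and a discrete Morse function `f` given as a dictionary. For a discrete Morse function, a simplex $\alpha$ of positive dimension has a face $\nu$ with $f(\nu) \geq f(\alpha)$ exactly when its face of largest value satisfies this, so inspecting that single face suffices; both simplices of every such pair are marked non-critical. The function name is hypothetical.

```python
from itertools import combinations

def critical_weights(simplices, f):
    """Sketch of algorithm 2: sorted, de-duplicated values of f on critical simplices."""
    is_critical = {s: True for s in simplices}                 # lines 2-6
    for alpha in simplices:                                    # lines 7-20
        if len(alpha) == 1:
            continue                                           # a vertex has no faces
        faces = combinations(alpha, len(alpha) - 1)
        nu = max(faces, key=lambda t: f[t])                    # face with the largest f
        if f[nu] >= f[alpha]:                                  # (nu, alpha) form a pair
            is_critical[alpha] = False
            is_critical[nu] = False
    return sorted({f[s] for s in simplices if is_critical[s]})  # lines 21-31
```

On a filled triangle where only vertex 1 is critical, the function returns `[0]`; removing the 2-simplex leaves the edge {2,3} unpaired, and its value joins the list.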

Given an unweighted and undirected graph $G$, we restrict the construction of the clique complex $K$ to simplices up to a maximum dimension $d$. Algorithm 3 then creates the filtration of $K$ based on the weights of critical simplices, as described in the Theory section. In the pseudocode of algorithm 3, lines 2-6 assign a non-negative function $g$ to the 0-simplices in $K$. Line 7 calls algorithm 1 to assign weights that form a discrete Morse function to every simplex in $K$. Line 8 calls algorithm 2 to obtain an increasing sequence of unique weights corresponding to critical simplices in $K$. Lines 9-11 initialize a variable IsAdded, associated with every simplex in $K$, which tracks whether the simplex has been added to the filtration. Lines 12-25 compute the filtration weight of each simplex in $K$ as described in the Theory section.

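1:Create clique complex $K$ of graph $G$ by restricting to a maximum dimension $d$
2:$\Delta \leftarrow$ Maximum degree of a vertex in graph $G$
3:for each 0-simplex $v \in K$ do ▷ Assign non-negative function $g$ to 0-simplices in $K$
4:     $\delta \leftarrow$ random ▷ noise term of equation 20
5:     $g(v) \leftarrow \Delta - \mathrm{degree}(v) + \delta$
6:end for
7:$f \leftarrow$ DiscreteMorseFunction($K$, $d$, $g$) ▷ Call Algorithm 1
8:$[c_1, \ldots, c_n] \leftarrow$ GetCriticalWeights($K$, $d$, $f$) ▷ Call Algorithm 2
9:for each simplex $\sigma \in K$ do ▷ Initialize IsAdded variable associated with each simplex in $K$
10:     IsAdded[$\sigma$] $\leftarrow$ False
11:end for
12:for $i = 1$ to $n$ do ▷ Calculate filtration weight for each simplex in $K$
13:     for each simplex $\sigma \in K$ do
14:         if $f(\sigma) \leq c_i$ AND IsAdded[$\sigma$] = False then
15:              FiltrationWeight[$\sigma$] $\leftarrow c_i$
16:              IsAdded[$\sigma$] $\leftarrow$ True
17:              for each face $\tau$ of $\sigma$ do
18:                  if IsAdded[$\tau$] = False then
19:                       FiltrationWeight[$\tau$] $\leftarrow c_i$
20:                       IsAdded[$\tau$] $\leftarrow$ True
21:                  end if
22:              end for
23:         end if
24:     end for
25:end for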
Algorithm 3 Algorithm to create the filtration of a clique complex
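The filtration loop of algorithm 3 (lines 9-25) can be sketched as follows. We read the filtration condition as $f(\sigma) \leq c_i$ (an assumption on our part, in the spirit of Forman's level subcomplexes), so each simplex enters at the first critical weight that is at least its own value, together with any of its faces not yet added. Membership in the dictionary `weight` plays the role of IsAdded, and the function names are hypothetical. As in the loop described, a simplex whose value exceeds the largest critical weight receives no weight.

```python
from itertools import combinations

def all_faces(sigma):
    """Proper non-empty faces of a simplex given as a sorted vertex tuple."""
    return [t for k in range(1, len(sigma)) for t in combinations(sigma, k)]

def filtration_weights(simplices, f, crit):
    """Sketch of lines 9-25 of algorithm 3.

    crit holds the increasing critical weights c_1 < ... < c_n.
    """
    weight = {}                                   # simplex -> filtration weight
    for c in crit:
        for sigma in simplices:
            if f[sigma] <= c and sigma not in weight:
                weight[sigma] = c
                for tau in all_faces(sigma):      # faces enter no later than sigma
                    if tau not in weight:
                        weight[tau] = c
    return weight
```

For a hollow triangle with critical weights 0 and 2.5, the critical vertex enters at 0 and every other simplex at 2.5, so each face carries a weight no larger than that of its cofaces, as a filtration requires.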

Rationale for the choice of function on vertices

In order to construct a discrete Morse function on the clique complex $K$ corresponding to a graph $G$ using algorithm 1, a real-valued function $g$ has to be fixed on the 0-simplices of $K$ (See lines 7-9 in algorithm 1). Let $\Delta$ denote the maximum degree of a vertex in the graph $G$. Our choice for the function value on the vertices or 0-simplices, $g(v)$, is as follows:

$$g(v) = \Delta - \mathrm{degree}(v) + \delta \qquad (20)$$

where $\mathrm{degree}(v)$ is the degree of the vertex $v$ and $\delta$ is a random number (noise) chosen uniformly in the range . See lines 2-6 in algorithm 3 and lines 7-9 in algorithm 1.
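Equation 20 translates directly into code. In the sketch below the graph is an adjacency mapping from each vertex to the set of its neighbors, and the noise is drawn uniformly from $[0, w]$ for a hypothetical width `noise_width`, because the range used in the paper is not restated here. Any non-negative noise keeps $g$ non-negative, since $\Delta - \mathrm{degree}(v) \geq 0$. The function name is ours.

```python
import random

def vertex_values(adj, seed=0, noise_width=1e-3):
    """Sketch of equation 20: g(v) = Delta - degree(v) + delta."""
    rng = random.Random(seed)
    max_degree = max(len(nbrs) for nbrs in adj.values())      # Delta
    return {v: max_degree - len(nbrs) + rng.uniform(0.0, noise_width)
            for v, nbrs in adj.items()}
```

For a star with three leaves, the center receives a value close to 0 and each leaf a value close to 2, so high-degree vertices get small values.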

In the Theory section, we highlighted Theorem 2.11 of Forman Forman (2002), which states that the number of critical $p$-simplices, $m_p$, in a simplicial complex $K$ is bounded below by the $p$-th Betti number $b_p$, i.e., $m_p \geq b_p$. The choice of the real-valued function $g$ in algorithm 1 plays a key role in determining whether the number of critical $p$-simplices in $K$ is close to this theoretical minimum. In the Theory section, we have also shown that the number of critical simplices determines the effective number of filtration weights used to study the persistent homology of a clique complex (See equation 16). This motivated our choice of the real-valued function $g$ (equation 20), which determines the weights of 0-simplices; the rationale for this choice is as follows.
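Comparing a discrete Morse function with the bound $m_p \geq b_p$ requires the Betti numbers of the complex. The sketch below computes them with coefficients in GF(2), which is our choice: the weak Morse inequalities hold for Betti numbers over any field, and GF(2) reduces the rank computation to XOR operations on bitmasks. Function names are hypothetical, and simplices are assumed to be sorted vertex tuples closed under taking faces.

```python
from itertools import combinations

def rank_gf2(rows):
    """Rank over GF(2) of a 0/1 matrix whose rows are encoded as integer bitmasks."""
    pivots = {}                            # leading bit -> row with that leading bit
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = row
                break
            row ^= pivots[lead]            # eliminate the leading bit
    return len(pivots)

def betti_numbers_gf2(simplices):
    """b_p = n_p - rank(boundary_p) - rank(boundary_{p+1}) over GF(2)."""
    by_dim = {}
    for s in simplices:
        by_dim.setdefault(len(s) - 1, []).append(s)
    top = max(by_dim)
    index = {p: {s: i for i, s in enumerate(by_dim[p])} for p in by_dim}
    rank = {0: 0, top + 1: 0}              # boundary maps from dimensions 0 and top + 1 vanish
    for p in range(1, top + 1):
        rows = []
        for s in by_dim[p]:
            mask = 0
            for face in combinations(s, p):    # the p + 1 faces of dimension p - 1
                mask |= 1 << index[p - 1][face]
            rows.append(mask)
        rank[p] = rank_gf2(rows)
    return [len(by_dim[p]) - rank[p] - rank[p + 1] for p in range(top + 1)]
```

A hollow triangle gives `[1, 1]` and a filled triangle `[1, 0, 0]`; counting the critical $p$-simplices of a discrete Morse function on the same complex then shows directly how close $m_p$ is to $b_p$.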

Ignoring the noise term in equation 20, the reader can discern our intuition for the choice of $g(v)$ for any vertex $v$ in $K$ from the following example. Consider the clique complex $K$ corresponding to the graph $G$ in figure 1. Here, we would like to obtain a discrete Morse function on $K$ whose number of critical simplices is close to the theoretical minimum. This requirement applies to simplices of any dimension in $K$; in the context of this example, we would like the number of critical 1-simplices (edges) to be as close as possible to the 1-Betti number $b_1$ of $K$. Note that for the example clique complex in figure 1.

Let us now examine lines 11-23 in algorithm 1. Consider any edge $e = \{u, v\}$ such that $f(u) > f(v)$. While the function value of the edge $e$ is assigned in algorithm 1, the edge $e$ and the vertex $u$ are guaranteed to be non-critical provided that the if condition in line 16 is satisfied. This is a consequence of the definition of a critical simplex (See equation 13). Thus, we would like this if condition to be True for as many edges as possible. Moreover, once the function value of the 1-simplex $e$ is set, the variable Flag[$u$] is set to 1 in line 18, and this forces the if condition in line 16 to fail for all other edges in the graph that contain $u$ as their vertex with the larger function value.

Let us now examine the edge in figure 1 that is anchored by a vertex of degree 5 and a vertex of degree 1; call these vertices $u$ and $v$, respectively, and the edge $e = \{u, v\}$. As the degree of a vertex gives the number of edges that contain it, $u$ is part of 4 other edges while $v$ is part of only the edge $e$. Suppose $e$ is the first edge chosen for the function assignment in line 11 of algorithm 1 and both Flag[$u$] and Flag[$v$] of the anchoring vertices are 0. As $u$ is part of 4 other edges while $v$ is part of only $e$, we would then prefer that the if condition in line 16 is satisfied for $e$ and, furthermore, that Flag[$v$] rather than Flag[$u$] is set to 1; in other words, we need the function value $f(v) > f(u)$. We emphasize that this choice of $v$ over $u$ prevents the forced failure (described in the previous paragraph) of the if condition for the 4 other edges that contain $u$.

The above example suggests the need for a function on the vertices that has an inverse relationship with their degree. For instance, if the function on vertices were chosen to be $g(v) = \mathrm{degree}(v)$, then every edge containing the degree-5 vertex except one would become critical, since that vertex has the maximum degree in the example network. Hence, our choice of $g$ in equation 20, which decreases with the degree, provides a simple and effective solution to the above requirement.
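The effect of the choice of $g$ can be reproduced on a small hypothetical graph: a star with one center of degree 5 and five leaves of degree 1 (not the network of figure 1). The sketch runs only the edge step of algorithm 1, which suffices because a star has no triangles, ignores the noise in $g$ as in the text, and counts critical vertices and edges. The function name and the noise range for unpaired edges are our own assumptions.

```python
import random

def count_critical(edges, g, seed=0):
    """Edge step of algorithm 1 (p = 1) on a triangle-free graph, followed by a
    count of critical vertices and critical edges."""
    rng = random.Random(seed)
    f = dict(g)                         # vertex values f(v) = g(v)
    flag = {v: 0 for v in g}
    paired = set()                      # simplices belonging to a non-critical pair
    for u, v in edges:
        hi, lo = (u, v) if f[u] > f[v] else (v, u)
        if flag[hi] == 0 and f[hi] > f[lo]:      # condition of line 16
            f[(u, v)] = f[hi]                    # pair the edge with vertex hi
            flag[hi] = 1                         # line 18
            paired.update({hi, (u, v)})
        else:                                    # hypothetical positive noise (line 21)
            f[(u, v)] = f[hi] + rng.uniform(1e-6, 1e-3)
    critical_vertices = sum(1 for v in g if v not in paired)
    critical_edges = sum(1 for e in edges if e not in paired)
    return critical_vertices, critical_edges

edges = [(0, leaf) for leaf in range(1, 6)]      # star: center 0, leaves 1..5
degree = {v: 0 for v in range(6)}
for u, v in edges:
    degree[u] += 1
    degree[v] += 1
max_degree = max(degree.values())

g_inverse = {v: max_degree - deg for v, deg in degree.items()}   # equation 20 without noise
g_direct = dict(degree)                                          # g(v) = degree(v)
print(count_critical(edges, g_inverse))  # (1, 0)
print(count_critical(edges, g_direct))   # (5, 4)
```

With the inverse choice only the center is critical, matching $b_0 = 1$ and $b_1 = 0$ of the star. With $g(v) = \mathrm{degree}(v)$ the center is paired with its first edge, and the remaining four edges, together with all five leaves, become critical.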

We now provide a rationale for the addition of random noise in equation 20. As reasoned above, we would like the if condition in line 16 of algorithm 1 to be True for as many edges as possible. Consider any edge $e = \{u, v\}$ such that $f(u) = f(v)$. Then, irrespective of the state of Flag[$u$] and Flag[$v$], the if condition fails, because it requires the larger vertex value to exceed the other strictly. Hence, we would like $f(u) \neq f(v)$, while also retaining the inverse relationship of the function with the degree. Adding a small random noise in the range  provides a simple resolution (See equation 20). We remark that the above argument can be generalized to higher-dimensional simplices, and thus provides the intuition for the addition of noise in line 21 of algorithm 1.
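The tie problem and its resolution can be seen with the if condition of line 16 in isolation. The sketch below uses a hypothetical helper and a hypothetical noise width below 1: because $\Delta - \mathrm{degree}(v)$ is an integer, noise smaller than 1 breaks ties between vertices of equal degree without reversing the order between vertices whose degrees differ.

```python
import random

def edge_is_paired(f_u, f_v, flag_of_larger=0):
    """If-condition of line 16 of algorithm 1 for an edge {u, v}: the edge is
    paired with its larger-valued vertex only if that vertex is unflagged and
    its value is strictly larger than the other vertex's value."""
    return flag_of_larger == 0 and max(f_u, f_v) > min(f_u, f_v)

# Two adjacent vertices of equal degree receive the same value without noise.
print(edge_is_paired(3.0, 3.0))          # False: the tie makes the condition fail

rng = random.Random(1)
noise_width = 1e-3   # hypothetical width below 1; equation 20 fixes the actual range
g_u = 3.0 + rng.uniform(0.0, noise_width)
g_v = 3.0 + rng.uniform(0.0, noise_width)
print(edge_is_paired(g_u, g_v))          # True unless the two draws coincide
```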

We remind the reader that our initial motivation was not to develop a scheme to construct an optimal discrete Morse function on the clique complex of a graph. Rather, our main goal is to develop a systematic filtration scheme to study persistent homology in unweighted and undirected networks. In fact, constructing an optimal discrete Morse function in the general case has been shown to be MAX-SNP hard Lewiner et al. (2003). The primary utility of our scheme is to create a filtration by assigning weights to the simplices in the clique complex of a graph. Nevertheless, the empirical results from our exploration of model and real-world networks, reported next, show that although our scheme is not optimal in the sense of minimizing the number of critical simplices, in practice it achieves near-optimal results in several model and real-world networks (Table 1). Hence, our scheme based on discrete Morse theory reduces the number of filtration steps and increases the applicability of persistent homology to the study of complex networks.

Figure 3: Barcode diagrams for and in model networks. (a) ER model with and . (b) WS model with , and . (c) BA model with and . (d) Spherical random graphs produced from HGG model with and . (e) Hyperbolic random graphs produced from HGG model with and .
Figure 4: Barcode diagrams for and in real networks. (a) US Power Grid. (b) Email communication. (c) Route views. (d) Yeast protein interaction. (e) Hamsterster friendship.
Figure 5: Barcode diagrams for in model and real networks. (a) Spherical random graphs produced from HGG model with and . (b) Hyperbolic random graphs produced from HGG model with and . (c) US Power Grid. (d) Email communication. (e) Route views. (f) Yeast protein interaction. (g) Hamsterster friendship.
Figure 6: Bottleneck distance between persistence diagrams of model networks, namely, ER model with and , WS model with , and , BA model with and , Spherical random graphs produced from HGG model with and , and Hyperbolic random graphs produced from HGG model with and . For each of the five model networks, 10 random samples are generated by fixing the number of vertices and other parameters of the model. We report the distance between two different models as the average of the distances over all 10 × 10 = 100 pairs formed by one sample network from each model, along with the standard error.
Network