An Axiomatic and an Average-Case Analysis of Algorithms and Heuristics for Metric Properties of Graphs
In recent years, researchers have proposed several algorithms that compute metric quantities of real-world complex networks and that are very efficient in practice, although they offer no worst-case guarantee.
In this work, we propose an axiomatic framework to analyze the performance of these algorithms, by proving that they are efficient on the class of graphs satisfying certain properties. Furthermore, we prove that these properties hold asymptotically almost surely in several probabilistic models that generate power law random graphs, such as the Configuration Model, the Chung-Lu model, and the Norros-Reittu model. Thus, our results imply average-case analyses in these models.
For example, in our framework, existing algorithms can compute the diameter and the radius of a graph in subquadratic time, and sometimes even in time $n^{1+o(1)}$. Moreover, in some regimes, it is possible to compute the most central vertices according to closeness centrality in subquadratic time, and to design a distance oracle with sublinear query time and subquadratic space occupancy.
In the worst case, it is impossible to obtain comparable results for any of these problems, unless widely believed conjectures are false.
We study problems motivated by network analysis, such as computing the diameter of a graph, the radius, the closeness centrality, and so on. All these problems admit polynomial-time algorithms, based on computing the distance between all pairs of vertices. These algorithms, however, do not terminate in reasonable time if the input is a real-world graph with millions of nodes and edges. Such worst-case inefficiency is probably due to complexity-theoretic bottlenecks: indeed, a faster algorithm for any of these problems would falsify widely believed conjectures [?].
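The textbook approach mentioned above can be sketched as follows. This is a minimal illustration, not one of the algorithms analyzed in this work: it assumes an undirected, connected, unweighted graph stored as an adjacency dictionary, and the function names `bfs_distances` and `diameter_and_radius` are chosen here for illustration only.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return a dict mapping each reachable vertex to its hop distance from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter_and_radius(adj):
    """Run one BFS per vertex: O(n * m) time for n vertices and m edges.

    Assumes the graph is undirected and connected.
    """
    ecc = {s: max(bfs_distances(adj, s).values()) for s in adj}
    return max(ecc.values()), min(ecc.values())


# Path graph 0 - 1 - 2 - 3 - 4 (a toy input chosen for illustration)
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
diameter, radius = diameter_and_radius(path)
print(diameter, radius)
```

Since every vertex triggers a full traversal, the running time grows with the product of the number of vertices and the number of edges, which is what makes this approach impractical on graphs with millions of nodes and edges.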
In practice, these problems are solved via heuristics and algorithms that do not offer any performance guarantee, apart from empirical evidence. These algorithms are widely deployed, and they are implemented in major graph libraries, like Sagemath [?], Webgraph [?], NetworKit [?], and SNAP [?].
In this work, we develop a theoretical framework in which these algorithms can be evaluated and compared. Our framework is axiomatic in the sense that we define some properties, we experimentally show that these properties hold in most real-world graphs, and we perform a worst-case analysis on the class of graphs satisfying these properties. The purpose of this analysis is threefold: we validate the efficiency of the algorithms considered, we highlight the properties of the input graphs that are exploited, and we perform a comparison that does not depend on the specific dataset used for the evaluation. A further confirmation of the validity of this approach comes from the results obtained, which are very similar to existing empirical results.
Furthermore, we show that the properties are verified on some models of random graphs asymptotically almost surely (a.a.s.), that is, with probability that tends to $1$ as the number of nodes goes to infinity. As a consequence, all results can be turned into average-case analyses on these models, with no modification. This modular approach to average-case complexity analysis has two advantages. First, since our properties are verified by different models, we can prove results in all these models with a single worst-case analysis. Second, we clearly highlight which properties of random graphs we are using: this way, we can experimentally validate the choice of the probabilistic model, by showing that these properties are reflected by real-world graphs.
In the past, most average-case analyses were performed on the Erdős-Rényi model, which is defined by fixing the number of nodes, and connecting each pair of nodes with probability $p$ [?]. However, many algorithms that work well in practice have poor average-case running time on this model.
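For concreteness, the Erdős-Rényi construction described above can be sketched in a few lines. The function name `erdos_renyi`, the adjacency-dictionary output and the `seed` parameter are assumptions made for this illustration.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, seed=None):
    """Sample an undirected graph on vertices 0..n-1.

    Each of the n*(n-1)/2 vertex pairs becomes an edge independently
    with probability p.
    """
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


graph = erdos_renyi(200, 0.05, seed=0)
degrees = sorted(len(neighbors) for neighbors in graph.values())
print(degrees[0], degrees[-1])
```

In this model every vertex degree follows the same binomial distribution, so the sampled graphs have no heavy-tailed degree sequence, unlike the power law graphs considered in this work.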
Our approach is based on four properties: one simply says that the degree distribution is power law, and the other three study the behavior of $\tau_s(n^x)$, which is defined as the smallest integer $\ell$ such that the number of vertices at distance $\ell$ from $s$ is at least $n^x$ (here $n$ is the number of vertices). The first of these properties describes the typical and extremal behavior of $\tau_s(n^x)$, where $s$ ranges over all vertices in the graph. The next two properties link the distance between two vertices $s$ and $t$ with $\tau_s(n^x)+\tau_t(n^y)$: informally, $\operatorname{dist}(s,t)$ is close to $\tau_s(n^x)+\tau_t(n^{1-x})$. We prove that these properties are verified in the aforementioned graph models.
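The neighborhood-growth quantity defined above can be computed with a single level-by-level BFS. The following is a minimal sketch: the function name `tau`, the adjacency-dictionary representation, and the generic threshold parameter `k` are assumptions made for illustration, and "at distance" is read as "at distance exactly", as in the definition.

```python
def tau(adj, s, k):
    """Smallest integer l such that at least k vertices lie at distance exactly l from s.

    Returns None if no BFS level of s contains k vertices.
    """
    visited = {s}
    frontier = [s]  # vertices at distance exactly `level` from s
    level = 0
    while frontier:
        if len(frontier) >= k:
            return level
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if v not in visited:
                    visited.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
        level += 1
    return None


# Complete binary tree of depth 2, rooted at 0 (a toy input chosen for illustration)
tree = {0: [1, 2], 1: [0, 3, 4], 2: [0, 5, 6], 3: [1], 4: [1], 5: [2], 6: [2]}
print(tau(tree, 0, 4), tau(tree, 3, 2))
```

From the root, the levels have sizes 1, 2 and 4, so a threshold of 4 is first reached at distance 2. From the leaf 3, the levels have sizes 1, 1, 2, 1 and 2, so a threshold of 2 is also first reached at distance 2.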
The definition of these properties is one of the main technical contributions of this work: they not only validate our approach, but also provide a very simple way of proving other metric properties of random graphs, and their use naturally extends to other applications. Indeed, once these properties are assumed, the proof of our probabilistic analysis is very simple. On the other hand, the proof of the properties is very technical, and it uses different techniques in the regimes $1<\beta<2$ and $\beta>2$ (where $\beta$ is the power law exponent). In the regime $\beta>2$, the main technical tool is branching processes: it is well known that the size of the neighborhoods of a given vertex in a random graph resembles a branching process [?], but this tool was almost always either used only as an intuition [?] (with different techniques used in the actual proof), or applied only to specific models, such as the Norros-Reittu model [?]. Conversely, in this work, we provide a quantitative result, from which we deduce the proof of the properties. In the regime $1<\beta<2$, the branching process approximation no longer holds (indeed, the distribution of the branching process is not even defined). For this reason, very few results were obtained in this case in the past [?]. In this work, we overcome this difficulty through a different technique: we prove that the graph contains a very dense “core” made of the nodes with highest degree, and that the distance between two nodes $s$ and $t$ is almost always the length of a shortest path from $s$ to the core plus the length of a shortest path from the core to $t$. This technique lets us compute the exact value of the diameter, and it lets us prove that the asymptotics found in [?] for the Configuration Model also hold in other models.
Assuming the four properties, we can easily prove consequences on the main metric properties of the graphs under consideration: we start by estimating the eccentricity $\operatorname{ecc}(s)$ of a given vertex $s$, which is defined as $\max_{t \in V} \operatorname{dist}(s,t)$. From this result, we can estimate the diameter $D=\max_{s \in V} \operatorname{ecc}(s)$. Similarly, we can estimate the farness $f(s)$ of $s$, that is, $\sum_{t \in V} \operatorname{dist}(s,t)$, the closeness centrality of $s$, which is defined as $\frac{1}{f(s)}$, and the average distance between two nodes. By specializing these results to the random graph models considered, we retrieve known asymptotics for these quantities, and we prove some new asymptotics in the regime $1<\beta<2$.
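The per-vertex quantities above all follow from a single BFS. The sketch below is illustrative only: it assumes an undirected, connected, unweighted graph with at least two vertices, takes farness as the unnormalized sum of distances (normalization conventions vary in the literature), and uses the hypothetical helper names `bfs_distances`, `vertex_metrics` and `average_distance`.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return a dict mapping each reachable vertex to its hop distance from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def vertex_metrics(adj, s):
    """Return (eccentricity, farness, closeness) of s from one BFS.

    Farness is the plain sum of distances; closeness is its reciprocal.
    """
    dist = bfs_distances(adj, s)
    ecc = max(dist.values())
    farness = sum(dist.values())
    return ecc, farness, 1.0 / farness


def average_distance(adj):
    """Average distance over ordered pairs of distinct vertices."""
    n = len(adj)
    total = sum(vertex_metrics(adj, s)[1] for s in adj)
    return total / (n * (n - 1))


# Path graph 0 - 1 - 2 - 3 - 4 (a toy input chosen for illustration)
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(vertex_metrics(path, 2), average_distance(path))
```

On this path, the middle vertex has eccentricity 2 and farness 6, while the endpoints have eccentricity 4 and farness 10, so the middle vertex is the most central one according to closeness.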
After proving these results, we turn our attention to the analysis of many heuristics and algorithms, by proving all the results in [?] (a plot of the results is available in [?]).
- The poor performance of some of these algorithms in the Erdős-Rényi model was shown empirically in [?], and it can be proved with a simple adaptation of the analysis in this paper.
- Some of the results contain a value $O(\epsilon)$: this value comes from the four properties, which depend on a parameter $\epsilon$. In random graphs, this notation is formally correct: indeed, we can let $\epsilon$ tend to $0$, since the properties are satisfied a.a.s. for each $\epsilon$. In real-world graphs, we experimentally show that these properties are verified for small values of $\epsilon$, and with abuse of notation we write $O(\epsilon)$ to denote a function bounded by $c\epsilon$, for some constant $c$.