Multivariate Analysis of Orthogonal Range Searching and Graph Distances Parameterized by Treewidth

Karl Bringmann    Thore Husfeldt    Måns Magnusson
Abstract

We show that the eccentricities, diameter, radius, and Wiener index of an undirected $n$-vertex graph with nonnegative edge lengths can be computed in time $O\bigl(n \cdot \binom{k+\lceil \log n\rceil}{k} \cdot 2^k k^2 \log n\bigr)$, where $k$ is the treewidth of the graph. For every $\epsilon > 0$, this bound is $n^{1+\epsilon} \exp O(k)$, which matches a hardness result of Abboud, Vassilevska Williams, and Wang (SODA 2015) and closes an open problem in the multivariate analysis of polynomial-time computation. To this end, we show that the analysis of an algorithm of Cabello and Knauer (Comp. Geom., 2009) in the regime of non-constant treewidth can be improved by revisiting the analysis of orthogonal range searching, improving bounds of the form $\log^d n$ to $\binom{d+\lceil \log n\rceil}{d}$, as originally observed by Monier (J. Alg. 1980).

We also investigate the parameterization by vertex cover number.

Keywords: Diameter, radius, Wiener index, orthogonal range searching, treewidth, vertex cover number.

Karl Bringmann: Max-Planck-Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany. kbringma@mpi-inf.mpg.de

Thore Husfeldt: BARC, IT University of Copenhagen, Denmark, and Lund University, Sweden. thore@itu.dk, https://orcid.org/0000-0001-9078-4512. Funding: Swedish Research Council grant VR-2016-03855 and Villum Foundation grant 16582.

Måns Magnusson: Department of Computer Science, Lund University, Sweden. mans.magnusson.888@student.lu.se

Copyright: Karl Bringmann, Thore Husfeldt, and Måns Magnusson.

Subject classification: Theory of computation: Shortest paths, Parameterized complexity and exact algorithms, Computational geometry. Mathematics of computing: Paths and connectivity problems.

Acknowledgements.
We thank Amir Abboud and Rasmus Pagh for useful discussions.

1 Introduction

Pairwise distances in an undirected, unweighted graph can be computed by performing a graph exploration, such as breadth-first search, from every vertex. This straightforward procedure determines the diameter of a given graph with $n$ vertices and $m$ edges in time $O(nm)$. It is surprisingly difficult to improve upon this idea in general. In fact, Roditty and Vassilevska Williams [RV] have shown that an algorithm that can distinguish between diameter 2 and 3 in an undirected sparse graph in subquadratic time refutes the Orthogonal Vectors conjecture.
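As a minimal sketch of this baseline (not code from the paper), the following Python functions run a breadth-first search from every vertex of a connected, unweighted graph. The adjacency-dictionary input format and the names `bfs_distances` and `diameter_by_bfs` are assumptions made for illustration.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source in an unweighted graph (adjacency dict)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def diameter_by_bfs(adj):
    """Largest distance over all pairs; assumes the graph is connected."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)

# A path on four vertices: 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(diameter_by_bfs(path))
```

Each search touches every edge a constant number of times, so the whole procedure is one linear-time exploration per vertex.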

However, for very sparse graphs, the running time becomes linear. In particular, the diameter of a tree can be computed in linear time by a folklore result that traverses the graph twice. In fact, an algorithm by Cabello and Knauer shows that for constant treewidth , the diameter (and other distance parameters) can be computed in time , where the Landau symbol absorbs the dependency on as well as the time required for computing a tree decomposition. The question raised in [AVW] is how the complexity of this problem grows with the treewidth of the graph. We show the following result:

Theorem.

The eccentricities, diameter, radius, and Wiener index of a given undirected $n$-vertex graph of treewidth $k$ and nonnegative edge lengths can be computed in time linear in

\[ n \cdot \binom{k+\lceil \log n\rceil}{k} \cdot 2^k k^2 \log n \tag{1} \]

where .

For every $\epsilon > 0$, the bound (1) is $n^{1+\epsilon} \exp O(k)$. This improves the dependency on the treewidth over the running time of Abboud, Vassilevska Williams, and Wang [AVW]. Our improvement is tight in the following sense. Abboud et al. [AVW] also showed that under the Strong Exponential Time Hypothesis of Impagliazzo, Paturi, and Zane [IPZ], there can be no algorithm that computes the diameter with running time

\[ n^{2-\delta} \exp o(\operatorname{tw}(G)) \quad \text{for any } \delta > 0. \tag{2} \]

In fact, this holds under the potentially weaker Orthogonal Vectors conjecture; see [VassW15] for an introduction to these arguments. Thus, under this assumption, the dependency on $k$ in Theorem 1 cannot be significantly improved, even if the dependency on $n$ is relaxed from just above linear to just below quadratic. Our analysis encompasses the Wiener index, an important structural graph parameter left unexplored by [AVW].

Perhaps surprisingly, the main insight needed to establish Theorem 1 has nothing to do with graph distances or treewidth. Instead, we make (or rediscover) the following observation about the running time of $d$-dimensional range trees:

Lemma.

[[Monier79]] A -dimensional range tree over points supporting orthogonal range queries for the aggregate value over a commutative monoid has query time and can be built in time , where

\[ B(n,d) = \binom{d+\lceil \log n\rceil}{d}. \]

This is a more careful statement than the standard textbook analysis, which gives the query time as $O(\log^d n)$ and the construction time as $O(n \log^{d-1} n)$. For many values of $d$, the asymptotic complexities of these bounds agree; in particular, this is true for constant $d$ and for very large $n$, which are the main regimes of interest to computational geometers. But crucially, $B(n,d)$ is always $n^{\epsilon} \exp O(d)$ for any $\epsilon > 0$, while $\log^d n$ is not.
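To get a feeling for how far the two quantities drift apart once the dimension grows with the input, the following Python sketch tabulates the binomial $B(n,d)$ against the textbook quantity $\log^d n$. It assumes base-2 logarithms; the helper names `ceil_log2`, `B`, and `textbook` are illustrative and not from the paper.

```python
from math import comb, log2

def ceil_log2(n):
    """Smallest h with 2**h >= n, for an integer n >= 1."""
    return (n - 1).bit_length()

def B(n, d):
    """The binomial bound from the lemma: C(d + ceil(log n), d)."""
    return comb(d + ceil_log2(n), d)

def textbook(n, d):
    """The textbook quantity log^d n (base-2 logarithm assumed)."""
    return log2(n) ** d

n = 2 ** 20
for d in (2, 5, 10, 20):
    print(d, B(n, d), round(textbook(n, d)))
```

For $n = 2^{20}$ and $d = 20$ the binomial is $\binom{40}{20}$, roughly $1.4 \cdot 10^{11}$, whereas $\log^d n = 20^{20}$ exceeds $10^{26}$.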

After Lemma 1 is realised, Theorem 1 follows via divide-and-conquer in decomposable graphs, closely following the idea of Cabello and Knauer [CK] and augmented with known arguments [AVW, BDDFLP]. We choose to give a careful presentation of the entire construction, as some of the analysis is quite fragile.

Using known reductions, this implies that the following multivariate lower bound on orthogonal range searching is tight:

Theorem.

[Implicit in [AVW]] A data structure for the orthogonal range query problem for the monoid with construction time and query time , where

\[ q'(n,d) = n^{1-\epsilon} \exp o(d) \]

for some , refutes the Strong Exponential Time hypothesis.

We also investigate the same problems parameterized by vertex cover number:

Theorem.

The eccentricities, diameter, and radius of a given undirected, unweighted -vertex graph with vertex cover number can be computed in time . The Wiener index can be computed in time .

Both of these bounds are . It follows from [AVW] that a lower bound of the form (2) holds for this parameter as well.

1.1 Related work

Abboud et al. [AVW] show that given a graph and an optimal tree decomposition, various graph distances can be computed in time , where . This bound is for any . This subsumes the running time for finding an approximate tree decomposition with from the input graph [BDDFLP], which is .

If the diameter in the input graph is constant, the diameter can be computed in time [H17]. This is tight in both parameters in the sense that [AVW] rules out the running time (2) even for distinguishing diameter 2 from 3, and every algorithm needs to inspect vertices even for treewidth 1. For non-constant diameter , the bound from [H17] deteriorates as . However, the construction cannot be used to compute the Wiener index.

The literature on algorithms for graph distance parameters such as diameter or Wiener index is very rich, and we refer to the introduction of [AVW] for an overview of results directly relating to the present work. A recent paper by Bentert and Nichterlein [BN] gives a comprehensive overview of many other parameterisations.

Orthogonal range searching using a multidimensional range tree was first described by Bentley [Bentley80], Lueker [Lueker78], Willard [Willard], and Lee and Wong [LeeW80], who showed that this data structure supports query time and construction time . Several papers have improved this in various ways by factors logarithmic in ; for instance, Chazelle’s construction [Chazelle90a] achieves query time .

1.2 Discussion

In hindsight, the present result is a somewhat undramatic resolution of an open problem that has been viewed as potentially fruitful by many people [AVW], including the second author [H17]. In particular, the resolution has led neither to an exciting new technique for showing conditional lower bounds of the form , nor to a clever new algorithm for graph diameter. Instead, our solution follows the ideas of Cabello and Knauer [CK] for constant treewidth, much like in [AVW]. All that was needed was a better understanding of the asymptotics of bivariate functions, rediscovering a 40-year-old analysis of spatial data structures [Monier79] (see the discussion in Sec. 3.3), and using a recent algorithm for approximate tree decompositions [BDDFLP].

Of course, we can derive some satisfaction from the presentation of asymptotically tight bounds for fundamental graph parameters under a well-studied parameterization. In particular, the surprisingly elegant reductions in [AVW] cannot be improved. However, as we show in the appendix, when we parameterize by vertex cover number instead of treewidth, we can establish even cleaner and tight bounds without much effort.

Instead, the conceptual value of the present work may be in applying the multivariate perspective on high-dimensional computational geometry, reviving an overlooked analysis for non-constant dimension. To see the difference in perspective, Chazelle’s improvement [Chazelle90a] of -dimensional range queries from to makes a lot of sense for small , but from the multivariate point of view, both bounds are . The range of relationships between and where the multivariate perspective on range trees gives some new insight is when is asymptotically just shy of , see Sec. 2.1.

It remains open to find an algorithm for diameter with running time , or an argument that such an algorithm is unlikely to exist under standard hypotheses. This requires better understanding of the regime .

2 Preliminaries

2.1 Asymptotics

We summarise the asymptotic relationships between various functions appearing in the present paper:

Lemma.

\[ B(n,d) = O(\log^d n). \tag{3} \]

For any ,

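\begin{align}
B(n,d) &= n^{\epsilon} \exp O(d), \tag{4} \\
\log^d n &= n^{\epsilon} \exp \Omega(d \log d), \tag{5} \\
\log^d n &= n^{\epsilon} \exp O(d \log d). \tag{6}
\end{align}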

The first expression shows that is always at least as informative as . The next two expressions show that from the perspective of parameterised complexity, the two bounds differ asymptotically: depends single-exponentially on (no matter how small is chosen), while does not (no matter how large is chosen). Expression (6) just shows that (5) is maximally pessimistic.

Proof.

Write . To see (3), consider first the case where . Using we see that

\[ \binom{d+h}{d} \le \binom{2h}{d} \le \frac{(2h)^d}{d!} = \frac{2^d}{d!} h^d = O(\log^d n). \tag{7} \]

Next, if then

\[ \binom{d+h}{d} = \binom{d+h}{h} \le \binom{2d}{h} \le \frac{2^h}{h!} d^h \le d^h, \]

provided . It remains to observe that . Indeed, since the function is increasing for , we have , which implies as needed.

For (4), there are two cases. First assume for all . From Stirling’s formula we know , so

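\[ \binom{d+h}{d} < \binom{(1+\epsilon)h}{\epsilon h} < \left(\frac{e(1+\epsilon)h}{\epsilon h}\right)^{\epsilon h} < \left(\frac{e(1+\epsilon)}{\epsilon}\right)^{2\epsilon \log n} = n^{2\epsilon \log\left(e(1+\epsilon)\epsilon^{-1}\right)} = n^{o(1)}, \]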

where the last expression uses that is a monotone increasing function in the interval .

On the other hand, if for some constant , we have

\[ \binom{d+h}{d} \le \binom{(1+1/c)d}{d} < 2^{(1+1/c)d} = \exp O(d). \]

We turn to (5). Assume that there is a function such that

\[ \log^d n = n^{c} g(d). \]

Then choose and consider such that Then

\[ g(d) \ge \frac{\log^d n}{n^c} = 2^{d \log\log n - c \log n} = 2^{d \log(bd) - cbd} = \exp \Omega(d \log d). \]

Finally for (6), we repeat the argument from [AVW]. If then In particular, if then . Moreover, for we have and thus

These calculations also show the regimes in which these considerations are at all interesting. For then both functions are bounded by , and the multivariate perspective gives no insight. For , both bounds exceed , and we are better off running BFSs for computing diameters, or passing through the entire point set for range searching.

2.2 Model of computation

We operate in the word RAM, assuming constant-time arithmetic operations on coordinates and edge lengths, as well as constant-time operations in the monoid supported by our range queries. For ease of presentation, edge lengths are assumed to be nonnegative integers; we could work with abstract nonnegative weights instead [CK].

3 Orthogonal Range Queries

3.1 Preliminaries

Let be a set of -dimensional points. We will view as a vector .

A commutative monoid is a set with an associative and commutative binary operator with identity. The reader is invited to think of as the integers with as identity and .

Let be a function and define for each subset

 f(Q)=⨁{f(q):q∈Q}

with the understanding that is the identity in .

3.2 Range Trees

Consider dimension and enumerate the points in as such that , for instance by ordering after the th coordinate and breaking ties lexicographically. Define to be the median point , and similarly the and . Set

\[ Q_L = \{q^{(1)}, \ldots, q^{(\lceil r/2\rceil)}\}, \qquad Q_R = \{q^{(1+\lceil r/2\rceil)}, \ldots, q^{(r)}\}. \tag{8} \]

For , the range tree for is a node with the following attributes:

• , a reference to the range tree , often called the left child of .

• , a reference to the range tree , often called the right child of .

• , a reference to the range tree , often called the secondary, associate, or higher-dimensional structure. This attribute only exists for .

• .

• .

• . This attribute only exists for .

Construction

Constructing a range tree for is a straightforward recursive procedure:

Algorithm C (Construction). Given integer and a list of points, this algorithm constructs the range tree with root .

C1

[Base case .] Recursively construct if , otherwise set . Set . Return .

C2

[Find median.] Determine , , .

C3

[Split .] Let and as given by (8), note that both are nonempty.

C4

[Recurse.] Recursively construct from . Recursively construct from . If then recursively construct . If then set .

The data structure can be viewed as a collection of binary trees whose nodes represent various subsets of the original point set . In the interest of analysis, we now introduce a scheme for naming the individual nodes , and thereby also the subsets . Each node is identified by a string of letters from as follows. Associate with a set of points, often called the canonical subset of , as follows. For the empty string we set . In general, if then , and . The strings over can be understood as uniquely describing a path through in the data structure; for instance, L means ‘go left, i.e., to the left subtree, the one stored at ’ and D means ‘go to the next dimension, i.e., to the subtree stored at .’ The name of a node now describes the unique path that reaches it.

Lemma.

Let . Algorithm C computes the -dimensional range tree for in time linear in .

Proof.

We run Algorithm C on input and .

Disregarding the recursive calls, the running time of algorithm C on input and is dominated by Steps C2 and C3, i.e., splitting into two sets of equal size. It is known that this task can be performed in time linear in [Blum]. Thus, the running time for constructing is linear in plus the time spent in recursive calls.

This means that we can bound the running time for constructing by bounding sizes of the sets associated with every node in the data structure. If for a moment denotes the set of all these nodes then we want to bound

\[ \sum_{x \in X} |P_x| = \sum_{x \in X} |\{p \in P : p \in P_x\}| = \sum_{p \in P} |\{x \in X : p \in P_x\}|. \]

Thus, we need to determine, for given , the number of subsets in which appears. By construction, there are fewer than occurrences of D in . Moreover, if contains more than occurrences of either L or R then is empty. Thus, has at most letters. For two different strings and that agree on the positions of D, the sets and are disjoint, so appears in at most one of them. We conclude that the number of sets such that is bounded by the number of ways to arrange fewer than many Ds and at most non-Ds. Using the identity repeatedly, we compute

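\[ \sum_{i=0}^{d-1} \sum_{j=0}^{h} \binom{i+j}{j} = \sum_{i=0}^{d-1} \binom{i+h+1}{h} = \sum_{i=0}^{d-1} \binom{i+h+1}{i+1} = (-1) + \sum_{i=0}^{d} \binom{i+h}{i} = \binom{h+d+1}{d} - 1 = \frac{h+d+1}{h+1} \binom{h+d}{d} - 1 \le d \binom{d+h}{d}. \]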

The bound follows from aggregating this contribution over all . ∎
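The chain of identities in this proof is easy to check numerically. The following Python sketch evaluates the double sum directly and compares it with the closed form and with the final upper bound for small values of $d$ and $h$; the function names are illustrative only.

```python
from math import comb

def canonical_subset_count(d, h):
    """Double sum over i < d letters D and j <= h letters from {L, R}."""
    return sum(comb(i + j, j) for i in range(d) for j in range(h + 1))

def closed_form(d, h):
    """The closed form C(h + d + 1, d) - 1 reached in the derivation."""
    return comb(h + d + 1, d) - 1

for d in range(1, 8):
    for h in range(0, 12):
        assert canonical_subset_count(d, h) == closed_form(d, h)
        assert closed_form(d, h) <= d * comb(d + h, d)
```

The checks pass silently; for example, $d = 2$ and $h = 1$ give the value 5 on both sides of the identity.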

Search.

In this section, we fix two sequences of integers and describing the query box given by

\[ B = [l_1, r_1] \times \cdots \times [l_d, r_d]. \]

Algorithm Q (Query). Given integer , a query box as above and a range tree with root for a set of points such that every point satisfies for . This algorithm returns .

Q1

[Empty?] If the data structure is empty, or , or , then return the identity in the underlying monoid .

Q2

[Done?] If and and then return .

Q3

[Next dimension?] If and and then query the range tree at for dimension . Return the resulting value.

Q4

[Split.] Query the range tree for dimension ; the result is a value . Query the range tree for dimension ; the result is a value . Return .∎
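Algorithms C and Q can be sketched together in a few lines of Python. This is an illustration under stated assumptions rather than the paper's implementation: dimensions are 0-based, points are tuples of integers paired with a monoid value, the attribute names `lo`, `hi`, `left`, `right`, `down`, and `agg` are hypothetical, and the median split uses sorting instead of linear-time selection, which costs an extra logarithmic factor during construction.

```python
from functools import reduce

class Node:
    """One range-tree node with the attributes listed in the text."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi        # min and max coordinate here
        self.left = self.right = None    # children L and R
        self.down = None                 # structure D for the next dimension
        self.agg = None                  # f(P), stored in the last dimension

def build(items, i, d, op):
    """Range tree for dimensions i..d-1 over items = [(point, value), ...]."""
    if not items:
        return None
    # Simplification: sort (ties broken lexicographically) instead of selecting.
    ordered = sorted(items, key=lambda it: (it[0][i], it[0]))
    return _build_sorted(ordered, i, d, op)

def _build_sorted(items, i, d, op):
    node = Node(items[0][0][i], items[-1][0][i])
    if i + 1 < d:
        node.down = build(items, i + 1, d, op)
    else:
        node.agg = reduce(op, (value for _, value in items))
    if len(items) > 1:
        mid = (len(items) + 1) // 2      # left part receives ceil(r/2) points
        node.left = _build_sorted(items[:mid], i, d, op)
        node.right = _build_sorted(items[mid:], i, d, op)
    return node

def query(node, box, i, d, op, identity):
    """Aggregate the values of all points inside box = [(l_0, r_0), ...]."""
    if node is None:
        return identity
    lo, hi = box[i]
    if node.hi < lo or hi < node.lo:               # Q1: disjoint
        return identity
    if lo <= node.lo and node.hi <= hi:            # fully contained
        if i + 1 == d:
            return node.agg                        # Q2: last dimension
        return query(node.down, box, i + 1, d, op, identity)   # Q3
    return op(query(node.left, box, i, d, op, identity),       # Q4
              query(node.right, box, i, d, op, identity))

points = [((1, 5), 10), ((2, 3), 7), ((4, 4), 12), ((6, 1), 3)]
root = build(points, 0, 2, max)
print(query(root, [(1, 4), (3, 5)], 0, 2, max, float("-inf")))
```

The example uses the integers with `max` and identity negative infinity; any associative, commutative operation with an identity can be passed as `op` instead.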

To prove correctness, we show that this algorithm is correct for each point set .

Lemma.

Let , where is the number of Ds in . Assume that is such that for all for each . Then the query algorithm on input and returns .

Proof.

Backwards induction in .

If then is the empty set, in which case the algorithm correctly returns the identity in .

If the algorithm executes Step Q2 then is satisfied for all , in which case the algorithm correctly returns .

If the algorithm executes Step Q3 then satisfies the condition in the lemma for , and the number of Ds in is , and store the th range tree for . Thus, by induction the algorithm returns , which equals because .

Otherwise, by induction, and . Since , we have . ∎

Lemma.

If is the root of the range tree for then on input , , and , the query algorithm returns in time linear in .

Proof.

Correctness follows from the previous lemma.

For the running time, we first observe that the query algorithm does constant work in each visited node. Thus it suffices to bound the number of visited nodes as

\[ 2^d \binom{h+d}{d} \qquad (d \ge 1,\ h \ge 0). \tag{9} \]

We will show by induction in that (9) holds for every call to a -dimensional range tree for a point set , where . The two easy cases are Q1 and Q2, which incur no additional nodes to be visited, so the number of visited nodes is , which is bounded by (9). Step Q3 leads to a recursive call for a -dimensional range tree over the same point set , and we verify

\[ 1 + 2^{d-1} \binom{h+d-1}{d-1} \le 2^d \binom{h+d}{d}. \]

The interesting case is Step Q4. We need to follow two paths from to the leaves of the binary tree of . Consider the leaves and in the subtree rooted at associated with the points and as defined in Sec. 3.2. We describe the situation of the path from to ; the other case is symmetrical. At each internal node , the algorithm chooses Step Q4 (because ). There are two cases for what happens at and . If then satisfies , so the call to will choose Step Q3. By induction, this incurs visits, where is the height of . In the other case, the call to will choose Step Q1, which incurs no extra visits. Thus, the number of nodes visited on the left path is at most

\[ h + \sum_{i=0}^{h-1} 2^{d-1} \binom{d-1+i}{d-1}, \]

and the total number of nodes visited is at most twice that:

\[ 2h + 2^d \sum_{i=0}^{h-1} \binom{d-1+i}{d-1} \le 2^d \sum_{i=0}^{h} \binom{d-1+i}{d-1} = 2^d \binom{d+h}{d}. \]

3.3 Discussion

The textbook analysis of range trees and similar $d$-dimensional spatial algorithms and data structures sets up a recurrence relation like

 r(n,d)=2r(n/2,d)+r(n,d−1),

for the construction and

 r(n,d)=max{r(n/2,d),r(n,d−1)},

for the query time. One then observes that and are the solutions to these recurrences. This analysis goes back to Bentley’s original paper [Bentley80].

Along the lines of the previous section, one can show that the functions and solve these recurrences as well. A detailed derivation can be found in [Monier79], which also contains combinatorial arguments for how to interpret the binomial coefficients in the context of spatial data structures. A later paper of Chan [Chan08] also takes the recurrences as a starting point, and observes an asymptotically improved solution for the related question of dominance queries.

4 Graph Distances

We present the algorithm for computing the diameter. The construction closely follows Cabello and Knauer [CK], but uses the range tree bounds from Section 3. The analysis is extended to superconstant dimension as in Abboud et al. [AVW]. Using the approximate treewidth construction of Bodlaender et al. [BDDFLP], we can pay more attention to the parameters of the recursive decomposition into small-size separators.

4.1 Preliminaries

We consider an undirected graph with vertices and edges with nonnegative integer weights. The set of vertices is . For a vertex subset we write for the induced subgraph.

A path from to is called a -path and denoted . For we use the notation for the subpath starting in . The length of a path, denoted , is the sum of its edge lengths.

The distance from vertex to vertex , denoted , is the length of a shortest -path. The Wiener index of , denoted , is . The eccentricity of a vertex , denoted , is given by . The diameter of , denoted , is . The radius of , denoted , is .
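As a concrete reading of these definitions, the following Python sketch computes all four parameters by brute force with one run of Dijkstra's algorithm per vertex. It assumes a connected graph with nonnegative integer edge lengths given as an adjacency dictionary, and it takes the Wiener index to be the sum of distances over unordered vertex pairs, the usual convention, which is an assumption here. The function names are illustrative.

```python
import heapq

def dijkstra(adj, source):
    """Shortest-path lengths from source; adj[u] is a list of (v, length)."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        du, u = heapq.heappop(heap)
        if du > dist[u]:
            continue  # stale heap entry
        for w, length in adj[u]:
            if w not in dist or du + length < dist[w]:
                dist[w] = du + length
                heapq.heappush(heap, (dist[w], w))
    return dist

def distance_parameters(adj):
    """Return (eccentricities, diameter, radius, Wiener index)."""
    dist = {u: dijkstra(adj, u) for u in adj}
    ecc = {u: max(dist[u].values()) for u in adj}
    # Every unordered pair is counted twice in the double sum.
    wiener = sum(sum(row.values()) for row in dist.values()) // 2
    return ecc, max(ecc.values()), min(ecc.values()), wiener

# Triangle in which the direct edge a-c is longer than the detour via b.
triangle = {
    "a": [("b", 1), ("c", 5)],
    "b": [("a", 1), ("c", 2)],
    "c": [("a", 5), ("b", 2)],
}
print(distance_parameters(triangle))
```

This quadratic-time baseline is what the algorithm of Section 4.3 is designed to beat on graphs of small treewidth.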

4.2 Separation

A skew -separator tree of is a binary tree such that each node of is associated with a vertex set such that

• ,

• If denote the vertices of associated with the left and right subtrees of , respectively, then separates and and

\[ \frac{n}{k+1} \le |L_t \cup Z_t| \le \frac{nk}{k+1}, \tag{10} \]
• remains a skew -separator even if edges between vertices of are added.

It is known that such a tree can be found from a tree decomposition, and an approximate tree decomposition can be found in single-exponential time. We summarise these results in the following lemma:

Lemma.

[[CK, Lemma 3] with [BDDFLP, Theorem 1]] For a given -vertex input graph , a skew -separator tree can be computed in time .

4.3 Algorithm

Given graph , let denote the set of shortest paths. Let denote the distance from to any vertex in . Formally,

 e(x;W)=max{l(xPw):xPw∈S,w∈W}.

The central idea of the algorithm, following [CK], is the computation for , of -visiting eccentricities defined as follows. Enumerate . Then define, for , the value as the maximum distance from to over all such that some shortest -path contains but no shortest -path contains any of . Formally,

\[ e(x, z_i; Y) = \max\, l(zPy) \quad \text{such that} \quad y \in Y,\ xPy \in S,\ Z \cap V(xPy) \ni z_i,\ \{z_1, \ldots, z_{i-1}\} \cap V(xQy) = \emptyset \text{ for all } zQy \in S. \]

See Figure 3 for a small example.

This definition ensures that in situations where and are connected by two shortest paths of the form and with , then exactly one of them contributes to and . This is important for avoiding over-counting in Section 4.5.

Lemma.

For , .

The proof is in Appendix B. The connection to orthogonal range queries is the following. Enumerate . A shortest path attaining the distance maximises over all , where is such that for all ,

 d(x,zi)+d(zi,y)

equivalently,

 d(x,zi)−d(x,zj)

We are ready for the algorithm, which closely follows [CK]:

Algorithm E (Eccentricities). Given a graph and a skew -separator tree with root , this algorithm computes the eccentricity of every vertex . We write , , and .

E1

[Base case.] If then find all distances using Dijkstra’s algorithm and terminate.

E2

[Distances from separator.] Compute for every using applications of Dijkstra’s algorithm.

E3

[Add shortcuts.] For each pair , add the edge to , weighted by . Remove duplicate edges, retaining the shortest.

E4.1

[Start iterating over .] Let .

E4.2

[Build range tree for .] Construct a -dimensional range tree of points given by where

\[ p_j = d(z_i, y) - d(z_j, y), \qquad j \in \{1, \ldots, k\} \]

and using the monoid .

E4.3

[Query range tree.] For each , query with and

 rj={d(x,zi)−d(x,zj)−1,(j

The result is .

E4.4

[Next .] If then increase and go to E4.1.

E5

[Recurse on .] Recursively compute the distances in using the left subtree of as a skew -separator tree. The results are eccentricities for each . For each , set , then set . Set for .

E6

[Flip.] Repeat Steps E4–5 with the roles of and exchanged.

4.4 Running Time

Lemma.

The running time of Algorithm E is .

The proof is in Appendix C. We can now establish Theorem 1 for diameter and radius.

Proof of Thm. 1, distances.

To compute all eccentricities for a given graph we find a -skew separator for using Lemma 4.2 in time . We then run Algorithm E, using Lemma 4.4 to bound the running time. From the eccentricities, the radius and diameter can be computed in linear time using their definition. ∎

4.5 Wiener Index

Algorithm E can be modified to compute the Wiener index, as described in [CK, Sec. 4], completing the proof of Theorem 1. The main observation is that the sum of distances between all pairs can be written as pairwise distances within , within , and between and , carefully subtracting contributions from these sums that were included twice.

The orthogonal range queries for vertex now need to report the sum of distances to every , rather than just the value of the maximum distance . To this end, we use the monoid of positive integer tuples with the operation

 (d,r)⊕(d′,r′)=(d+d′,r+r′)

with identity element . The value associated with vertex in Step E4.2 is .
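The reason for carrying a pair through the range query can be illustrated in a few lines of Python. The sketch below assumes that each vertex on the far side contributes the pair (its distance to the separator vertex, 1), so that the aggregate holds a distance sum and a vertex count; this choice of stored value, the identity $(0, 0)$, and the function names are assumptions made for illustration.

```python
from functools import reduce

def combine(a, b):
    """The monoid operation (d, r) + (d', r') = (d + d', r + r')."""
    return (a[0] + b[0], a[1] + b[1])

IDENTITY = (0, 0)  # assumed identity: no vertices, zero total distance

def sum_of_distances_via(dist_x_to_z, dists_z_to_y):
    """Sum over y of dist(x, z) + dist(z, y), using only the aggregate pair."""
    total, count = reduce(combine, ((d, 1) for d in dists_z_to_y), IDENTITY)
    return count * dist_x_to_z + total

# x is at distance 3 from z; three vertices lie at distances 1, 2, 4 from z.
print(sum_of_distances_via(3, [1, 2, 4]))
```

The count is what allows the fixed term, the distance from the query vertex to the separator vertex, to be added once per reported vertex without enumerating the vertices.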

We also observe the matching lower bound:

Theorem.

An algorithm for computing the Wiener index in time for any refutes the Orthogonal Vectors conjecture.

Proof.

The diameter of is if and only if . Thus, an algorithm for Wiener index is able to distinguish input graphs of diameter 2 and 3. This problem was shown hard in [AVW]. ∎

Appendix A Parameterization by Vertex Cover Number

We show Theorem 1.

A vertex cover is a vertex subset of such that every edge in has at least one endpoint in . The smallest for which a vertex cover of size exists is the vertex cover number of a graph, denoted . The number of edges in a graph is at most .

a.1 Eccentricities and Wiener Index

A graph with vertex cover number is a star, and its pairwise distances can be determined from the input size. It follows from [AVW] that the complexity of computing the diameter must depend exponentially on , in the same way as for . We observe here that algorithms that match this lower bound are quite immediate. The idea is that each has its entire neighbourhood , defined as

 N(v)={u∈V(G):uv∈E(G)},

contained in . Thus, all paths from have their second vertex in . In particular, two vertices and with have the same distances to the rest of the graph. Since , it suffices to consider all many subsets of . The details are given in Algorithm V.

Algorithm V (Eccentricities Parameterized by Vertex Cover). Given a connected, unweighted, undirected graph and a vertex cover , this algorithm computes the eccentricity of each vertex and the Wiener index.

V1

[Initialise.] Set . Insert each into a dictionary indexed by .

V2

[Distances from .] For each , perform a breadth-first search from in , computing for all . Let and increase by .

V3

[Distances from ] Choose any . Perform a breadth-first search from in , computing for all . For each with (including itself), let , increase by , and remove from . Repeat step V3 until is empty.
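The idea behind Algorithm V can be sketched in Python as follows. Some details of the steps are missing in this copy of the text, so the sketch is reconstructed from the surrounding explanation: one breadth-first search per cover vertex, then one per distinct neighbourhood among the remaining vertices. It assumes a connected, unweighted graph given as an adjacency dictionary and a cover given as a set, and it reports the Wiener index as a sum over unordered pairs; the names are illustrative.

```python
from collections import deque

def bfs(adj, source):
    """Hop distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def ecc_and_wiener_by_vertex_cover(adj, cover):
    """Eccentricities and Wiener index with one BFS per neighbourhood class."""
    ecc = {}
    total = 0  # sum of distances over ordered pairs
    # One search from every vertex of the cover.
    for v in cover:
        dist = bfs(adj, v)
        ecc[v] = max(dist.values())
        total += sum(dist.values())
    # Dictionary of the remaining vertices, indexed by neighbourhood.
    classes = {}
    for v in adj:
        if v not in cover:
            classes.setdefault(frozenset(adj[v]), []).append(v)
    # One search per distinct neighbourhood, shared by the whole class.
    for members in classes.values():
        dist = bfs(adj, members[0])
        e, s = max(dist.values()), sum(dist.values())
        for w in members:
            ecc[w] = e
            total += s
    return ecc, total // 2

# Star with centre 0 and four leaves; {0} is a vertex cover.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
print(ecc_and_wiener_by_vertex_cover(star, {0}))
```

Sharing one search across a class is sound because two vertices outside the cover with the same neighbourhood are non-adjacent, so exchanging them maps the graph to itself; their eccentricities and distance sums therefore coincide.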

Theorem.

The eccentricities and Wiener index of an unweighted, undirected, connected graph with edges and vertex cover number can be computed in time . Any algorithm with running time would refute the Strong Exponential Time Hypothesis.

Proof.

It is well known that a minimum vertex cover can be