Fast Approximation in Subspaces by Doubling Metric Decomposition

This work was partially supported by the Polish Ministry of Science grant N206 355636. E-mail addresses: {cygan,kowalik,mucha,malcin,sank}@mimuw.edu.pl.

Marek Cygan, Lukasz Kowalik, Marcin Mucha, Marcin Pilipczuk
Institute of Informatics, University of Warsaw, Poland

Piotr Sankowski
Institute of Informatics, University of Warsaw, Poland, and Dipartimento di Informatica e Sistemistica, Sapienza - University of Rome, Italy

Abstract

In this paper we propose and study a new complexity model for approximation algorithms. The main motivation is practical problems over large data sets that need to be solved many times for different scenarios, e.g., many multicast trees that need to be constructed for different groups of users. In our model we allow a preprocessing phase, in which some information about the input graph is stored in a data structure of limited size. Afterwards, the data structure enables processing queries of the form “solve problem A for an input ”. We consider problems like Steiner Forest, Facility Location, k-Median, k-Center and TSP in the case when the graph induces a doubling metric. Our main results are data structures of near-linear size that are able to answer queries in time close to linear in . This improves over the typical worst-case running time of approximation algorithms in the classical setting, which is independently of the query size. In most cases, our approximation guarantees are arbitrarily close to those in the classical setting. Additionally, we present the first fully dynamic algorithm for the Steiner tree problem.

1 Introduction

Motivation

The complexity and size of existing communication networks have grown enormously in recent times. It is now hard to imagine that a group of users willing to communicate sets up a minimum-cost communication network or a multicast tree according to an approximate solution to the Steiner Tree problem. Instead, we are forced to use heuristics that are computationally more efficient but may deliver suboptimal results [27, 20]. It is easy to imagine other problems that in principle can be solved with constant approximation factors using state-of-the-art algorithms, but for which, due to the immense size of the data, this is impossible in a timely manner. However, in many applications the network is fixed and we need to solve the problem many times for different groups of users.

Here, we propose a completely new approach that exploits this fact to overcome the obstacles stemming from huge data sizes. It is able to efficiently deliver results that have good approximation guarantee thanks to the following two assumptions. We assume that the network can be preprocessed beforehand and that the group of users that communicates is substantially smaller than the size of the network. The preprocessing step is independent of the group of users and hence afterwards we can, for example, efficiently compute a Steiner tree for any set of users.

More formally, in the Steiner Tree problem the algorithm is given a weighted graph on vertices and is allowed some preprocessing. The results of the preprocessing step need to be stored in limited memory. Afterwards, the set of terminals is defined and the algorithm should generate as fast as possible a Steiner tree for , i.e., a tree in of low weight which contains all vertices in . Given the query set of vertices we should compute the Steiner tree in time depending only (or, mostly) on .

The trivial approach to this problem is to compute the metric closure of and then answer each query by solving the Steiner Tree problem on . This approach delivers results with a constant approximation ratio, but requires space of the data structure and query time. Hence it is far from practical. In this work we aim at solutions that substantially improve both of these bounds; more formally the data structure space should be close to , while the query time should be close to . In a typical situation probably , so even a query time is not considered fast enough, as then . Note that the bound on the structure size is very restrictive: in a way, this bound is sublinear in the sense that we are allowed neither to store the whole distance matrix, nor (if is dense) all the edges of . This models a situation in which vast resources (e.g., a huge cluster of servers) are available during preprocessing, but they are not granted forever, and the space available when the system processes queries is much smaller.
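The trivial baseline can be sketched as follows. This is only an illustration under stated assumptions: the metric closure is computed with Floyd-Warshall, and each query is answered by a minimum spanning tree on the terminals, which is the classical 2-approximation for Steiner Tree in a metric. The paragraph above does not fix which constant-factor algorithm is used, and the function names `metric_closure` and `mst_on_terminals` are hypothetical. The quadratic-size table `d` is exactly what the model forbids keeping at query time.

```python
def metric_closure(n, edges):
    """All-pairs shortest paths (Floyd-Warshall) for vertices 0..n-1.

    `edges` is a list of (u, v, weight) triples of an undirected graph.
    The result is an n x n table, i.e. quadratic space.
    """
    inf = float("inf")
    d = [[0 if i == j else inf for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        d[u][v] = d[v][u] = min(d[u][v], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def mst_on_terminals(d, terminals):
    """Prim's algorithm on the complete graph over `terminals`.

    Edge weights are read from the closure `d`; the query touches a
    quadratic number of terminal pairs.
    """
    terminals = list(terminals)
    in_tree = {terminals[0]}
    tree, total = [], 0
    while len(in_tree) < len(terminals):
        w, u, v = min((d[u][v], u, v)
                      for u in in_tree
                      for v in terminals if v not in in_tree)
        in_tree.add(v)
        tree.append((u, v, w))
        total += w
    return tree, total
```

Each tree edge stands for a shortest path in the original graph, so a real implementation would also have to store or recompute those paths.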

New Model

In our model, computations are divided into two stages: the preprocessing stage and the query stage. In the preprocessing stage, the input is a weighted graph and we should compute our data structure in polynomial time and space. Apart from the graph some additional, problem-specific information may be also provided. In the query stage the algorithm is given the data structure computed in the preprocessing stage, but not itself, and a set of points of (the query — possibly a set of pairs of points from , or a weighted set of points from , etc.) and computes a solution for the set . The definition of “the solution for the set ” depends on the specific problem. In this work we consider so-called metric problems, so corresponds to a metric space where can be represented as the full distance matrix . One should keep in mind that the function cannot be quickly computed (e.g. in constant time) without the size matrix . In particular, we assume that there is no distance oracle available in the query stage.

Hence, there are three key parameters of an algorithm within our model: the size of the data structure, the query time and the approximation ratio. Less important, but not irrelevant is the preprocessing time. Let us note that though our model is inspired by large datasets, in this work we ignore streaming effects, external memory issues etc.

Above we have formulated the Steiner Tree problem in our model; we now describe the remaining problems. In the Steiner Forest problem the algorithm is allowed to preprocess a weighted graph , whereas the query is composed of the set of pairs. The algorithm should generate the Steiner forest for , i.e., a subgraph of of small weight such that each pair in is connected in . In the Facility Location problem the algorithm is given in the preprocessing phase a weighted graph with facility opening costs in the nodes. We consider two variants of this problem in our model. In the variant with unrestricted facilities, the query is a set of clients (cities) for which we should open facilities. The goal is to open a subset of facilities and connect each city to an open facility so that the sum of the total opening and connection costs is minimized. In the other variant, the one with restricted facilities, the facilities that can be opened are given as part of the query (together with their opening costs).
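To make the Facility Location objective concrete, the following sketch evaluates the cost of a candidate solution and, for tiny instances only, finds the optimum by exhaustive search. The names `facility_location_cost` and `best_solution` and the explicit distance table `dist` are assumptions made for illustration; the algorithms of this paper never enumerate subsets and do not have a distance table at query time.

```python
import itertools


def facility_location_cost(dist, opening_cost, open_facilities, clients):
    """Opening costs plus the distance from each client to its nearest open facility."""
    if not open_facilities:
        raise ValueError("at least one facility must be open")
    opening = sum(opening_cost[f] for f in open_facilities)
    connection = sum(min(dist[c][f] for f in open_facilities) for c in clients)
    return opening + connection


def best_solution(dist, opening_cost, facilities, clients):
    """Exhaustive search over nonempty facility subsets (exponential time)."""
    best = None
    for r in range(1, len(facilities) + 1):
        for subset in itertools.combinations(facilities, r):
            cost = facility_location_cost(dist, opening_cost, subset, clients)
            if best is None or cost < best[0]:
                best = (cost, subset)
    return best
```

In the unrestricted variant `facilities` would range over all points of the preprocessed graph, while in the restricted variant it is the facility set supplied with the query.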

Our Results

In this paper we restrict our attention to doubling metric spaces, which include growth-restricted metric spaces and constant-dimensional Euclidean spaces. In other words, we assume that the graph induces a doubling metric and that the algorithms are given the distance matrix as an input or compute it at the beginning of the preprocessing phase. This restriction is often assumed in the routing setting [12, 7], and hence it is natural to ask how it affects the multicast problems. Under this assumption we show that solutions with nearly optimal bounds are possible. The main result of the paper is a data structure that requires memory and can find a constant ratio approximate Steiner tree over a given set of size in time. Moreover, we show data structures with essentially the same complexities for solving Steiner Forest, both versions of Facility Location, k-Median and TSP. The query bound is optimal, up to and factors, as no algorithm can answer queries in time less than linear in as it needs to read the input. For the exact approximation ratios of our algorithms refer to Section 3.2 and Appendix E.

All of these results are based on a new hierarchical data structure for representing a doubling metric that approximates original distances with -multiplicative factor. The concept of a hierarchical data structure for representing a doubling metric is not novel: it originates from the work of Clarkson [8] and was then used in a number of papers; in particular, our data structure is based on the one due to Jia et al. [16]. Our main technical contribution here is adapting and extending this data structure so that for any subset a substructure corresponding to can be retrieved in using only the information in the data structure, without a distance oracle. The substructure is then transformed into a pseudo-spanner (see Section 3.1). Note that our complexity bounds do not depend on the stretch of the metrics, unlike in many previous works (e.g. [17]). Another original concept in our work is an application of spanners (or, more precisely, pseudo-spanners) to improve the running time of approximation algorithms for metric problems. As a result, the query times for the metric problems we consider are .

Interestingly, our hierarchical data structure can also be used to obtain dynamic algorithms for the Steiner tree problem. This problem has attracted considerable attention [3, 5, 11, 4] in recent years. However, due to the hardness of the problem, none of these papers has given any improvement in the running time over the static algorithms. Here, we give the first fully dynamic algorithm for the problem in the case of doubling metrics. Our algorithm is given a static graph and then maintains information about the Steiner tree built on a given set of nodes. It supports insertion of vertices in time, and deletion in time, where .

Related Work

The problems considered in this paper are related to several algorithmic topics studied extensively in recent years. Many researchers have tried to answer the question of whether problems in huge networks can be solved more efficiently than by processing the whole input. Nevertheless, the model proposed in this paper has never been considered before. Moreover, we believe that within the proposed framework it is possible to achieve complexities that are close to being practical. We present such results only in the case of doubling metrics, but hope that further study will extend these results to a more general setting. Our results are related to the following concepts:

  • Universal Algorithms: this model does not allow any processing at query time; we allow it and get much better approximation ratios.

  • Spanners and Approximate Distance Oracles: although a spanner of a subspace of a doubling metric can be constructed in -time, the construction algorithm requires a distance oracle (i.e. the full -size distance matrix).

  • Sublinear Approximation Algorithms: here the data cannot be preprocessed; by allowing preprocessing we can get much better approximation ratios.

  • Dynamic Spanning Trees: most existing results are only applicable to dynamic MST and not to dynamic Steiner tree, and the ones concerning the latter work in different models than ours.

Due to the space limitations of this extended abstract, an extensive discussion of the related work is given in Appendix A and will be included in the full version of the paper.

2 Space partition tree

In this section we extend the techniques developed by Jia et al. [16]. Several statements as well as the overall construction are similar to those given by Jia et al. However, our approach is tuned to better suit our needs, in particular to allow for a fast subtree extraction and a spanner construction – techniques introduced in Sections 2 and 3 that are crucial for efficient approximation algorithms.

Let be a finite doubling metric space with and a doubling constant , i.e., for every , every ball of radius can be covered with at most balls of radius . By we denote the stretch of the metric , that is, the largest distance in divided by the smallest distance. We use space partition schemes for doubling metrics to create a partition tree. In the next two subsections, we show that this tree can be stored in space, and that a subtree induced by any subset can be extracted efficiently.

Let us first briefly introduce the notion of a space partition tree, that is used in the remainder of this paper. Precise definitions and proofs (in particular a proof of existence of such a partition tree) can be found in Appendix B.

The basic idea is to construct a sequence of partitions of . We require that , and , and in general the diameters of the sets in are growing exponentially in . We also maintain the neighbourhood structure for each , i.e., we know which sets in are close to each other (this is explained in more detail later on). Notice that the partitions together with the neighbourhood structure are enough to approximate the distance between any two points — one only needs to find the smallest , such that the sets in containing and are close to each other (or are the same set).
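The idea of reading a distance estimate off the smallest level at which two sets are close can be illustrated on a toy instance. The sketch below is not the construction of this paper: it assumes points are nonnegative integers on a line, uses nested dyadic intervals of width 2^k as the level-k partition, and treats equal or adjacent intervals as "close". All names (`cell`, `know`, `meeting_level`, `distance_bounds`) are hypothetical.

```python
def cell(x, k):
    """Index of the level-k dyadic interval [j * 2**k, (j + 1) * 2**k) containing x."""
    return x >> k


def know(x, y, k):
    """Toy neighbourhood relation: same or adjacent level-k intervals."""
    return abs(cell(x, k) - cell(y, k)) <= 1


def meeting_level(x, y):
    """Smallest level at which the intervals of x and y are close."""
    k = 0
    while not know(x, y, k):
        k += 1
    return k


def distance_bounds(x, y):
    """Open interval (lower, upper) that must contain |x - y| for distinct x, y.

    Close at level k means |x - y| < 2**(k + 1); not close at level k - 1
    means a whole interval of width 2**(k - 1) lies strictly between them.
    """
    k = meeting_level(x, y)
    lower = 0 if k == 0 else 1 << (k - 1)
    upper = 1 << (k + 1)
    return lower, upper
```

In this toy the estimate is only accurate up to a factor of four; the two parameters discussed next are what the actual construction uses to trade such accuracy against space.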

There are two natural parameters in this sort of scheme. One of them is how fast the diameters of the sets grow; this is controlled by in our constructions. The faster the set diameters grow, the smaller the number of partitions. The second parameter is how distant the sets in a partition can be while still being considered neighbours; this is controlled by a nonnegative integer in our constructions. The smaller this parameter, the smaller the number of neighbours. Manipulating these parameters allows us to decrease the space required to store the partitions, and consequently also the running time of our algorithms. However, this comes at the price of a lower-quality approximation.

In what follows, each is a subpartition of for . That is, the elements of these partitions form a tree, denoted by , with being the set of leaves and being the root. We say that is a child of in if .

Let be smaller than the minimal distance between points in and let . We show (in Appendix B) that -s and satisfying the following properties can be constructed in polynomial time:

  1. Exponential growth: Every is contained in a ball of radius .

  2. Small neighbourhoods: For every , the union crosses at most sets from the partition — we say that knows these . We also extend this notation and say that if knows , then every knows .

  3. Small degrees: For every all children of know each other and, consequently, there are at most children of .

  4. Distance approximation: If are different points such that , and , and knows but does not know , then

    For any , the and constants can be adjusted so that the last condition becomes (see Remark 6).

Remark 1

We note that not all values of and make sense for our construction. We omit these additional constraints here.

2.1 The compressed tree and additional information at nodes

Let us now show how to efficiently compute and store the tree . Recall that the leaves of are one point sets and, while going up in the tree, these sets join into bigger sets.

Note that if is an inner node of and it has only one child then both nodes and represent the same set. Nodes and can differ only by their sets of acquaintances, i.e. the sets of nodes known to them. If these sets are equal, there is some sort of redundancy in . To reduce the space usage we store only a compressed version of the tree .

Let us introduce some useful notation. For a node of let denote the set corresponding to and let denote the level of , where leaves are at level zero. Let , be a pair of sets that know each other at level and do not know each other at level . Then the triple is called a meeting of and at level .

Definition 1 (Compressed tree)

The compressed version of , denoted , is obtained from by replacing all maximal paths such that all inner nodes have exactly one child by a single edge. For each node of we store (the lowest level of in ) and a list of all meetings of , sorted by level.

Obviously has at most nodes since it has exactly leaves and each inner node has at least two children but we also have to ensure that the total number of meetings is reasonable.
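The compression step of Definition 1 can be sketched as below. This is a simplified illustration: the `Node` class and the functions `compress` and `count` are hypothetical, meetings are not stored, and a chain hanging directly below the root is collapsed as well, so that every inner node of the result has at least two children. Each collapsed chain is represented by its lowest node, which matches the rule that a compressed node stores its lowest level.

```python
class Node:
    """A node of the partition tree; leaves carry a single point."""

    def __init__(self, level, children=(), point=None):
        self.level = level
        self.children = list(children)
        self.point = point


def compress(node):
    """Return a compressed copy of the subtree rooted at `node`.

    Nodes on a single-child chain all represent the same point set, so the
    chain is replaced by its bottom node (the one with the lowest level).
    """
    while len(node.children) == 1:
        node = node.children[0]
    return Node(node.level, [compress(c) for c in node.children], node.point)


def count(node):
    """Number of nodes in the subtree rooted at `node`."""
    return 1 + sum(count(c) for c in node.children)
```

With n leaves and every inner node branching, the compressed copy has at most 2n - 1 nodes regardless of how long the original chains were.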

Note that the sets at nodes of are pairwise distinct. To simplify the presentation we will identify nodes and the corresponding sets. Consider a meeting . Let (resp. ) denote the parent of (resp. ) in . We say that is responsible for the meeting when (when , both and are responsible for the meeting ). Note that if is responsible for a meeting , then knows at level . From this and Property 2 of the partition tree we get the following.

Lemma 1

Each set in is responsible for at most meetings.

Corollary 1

There are meetings stored in the compressed tree , i.e.  takes space.

Lemma 2

One can augment the tree with additional information of size , so that for any pair of nodes of one can decide if and know each other, and if that is the case the level of the meeting is returned. The query takes time.

Proof

For each node in we store all the meetings it is responsible for, using a dictionary — the searches take time. To process the query it suffices to check if there is an appropriate meeting in or in .∎
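A minimal sketch of this lookup, assuming the responsible node of every meeting has already been determined during preprocessing. The class `TreeNode` and the functions `add_meeting` and `meeting_level` are hypothetical names, and a Python hash-based `dict` stands in for whichever dictionary structure the proof has in mind.

```python
class TreeNode:
    """A node of the compressed tree with the meetings it is responsible for."""

    def __init__(self, name):
        self.name = name
        self.meetings = {}  # other node -> level of the meeting


def add_meeting(responsible, other, level):
    """Record a meeting at the node that is responsible for it."""
    responsible.meetings[other] = level


def meeting_level(u, v):
    """Level at which u and v meet, or None if they do not know each other.

    A meeting is stored at (at least) one of its two endpoints, so it is
    enough to probe both dictionaries.
    """
    if v in u.meetings:
        return u.meetings[v]
    if u in v.meetings:
        return v.meetings[u]
    return None
```

Storing each meeting only at its responsible node is what keeps the total size within the bound of Corollary 1 while still allowing symmetric queries.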

In order to give a fast subtree extraction algorithm, we need to define the following operation . Let be two given nodes. Let denote the node in on the path from to the root at level , similarly define . The value of is the lowest level, such that and know each other. Such a level always exists, because eventually all nodes merge into the root, and nodes know each other one level before they are merged (see Property 3 of the partition tree). A technical proof of the following lemma is moved to Appendix C due to space limitations.

Lemma 3

The tree can be augmented so that the operation can be performed in time. The augmented tree can be stored in space and computed in polynomial time.

2.2 Fast subtree extraction

For any subset we are going to define an -subtree of , denoted . Intuitively, this is the subtree of induced by the leaves corresponding to . Additionally we store all the meetings in between the nodes corresponding to the nodes of .

More precisely, the set of nodes of is defined as and is a node of . A node of is an ancestor of a node of iff . This defines the edges of . Moreover, for two nodes , of such that both and intersect , if knows at level , we say that knows in at level . A triple , where is a minimal level such that knows at level , is called a meeting. The level of a node of is the lowest level of a node of such that . Together with each node of we store its level and a list of all its meetings . A node is responsible for a meeting when .

Remark 2

The subtree is not necessarily equal to any compressed tree for the metric space .

In this subsection we describe how to extract from efficiently. The extraction runs in two phases. In the first phase we find the nodes and edges of and in the second phase we find the meetings.

2.2.1 Finding the nodes and edges of

We construct the extracted tree in a bottom-up fashion. Note that we cannot simply go up the tree from the leaves corresponding to because we could visit a lot of nodes of which are not the nodes of . The key observation is that if and are nodes of , such that and are nodes of and is the lowest common ancestor of and , then is a node of and it has level .

  1. Sort the leaves of corresponding to the elements of according to their inorder value in , i.e., from left to right.

  2. For all pairs of neighboring nodes in the sorted order, insert into a dictionary a key-value pair where the key is the pair and the value is the pair . The dictionary may contain multiple elements with the same key.

  3. Insert all nodes from to a second dictionary , where nodes are sorted according to their inorder value from the tree .

  4. while contains more than one element

    1. Let be the smallest key in .

    2. Extract from all key-value pairs with the key , denote those values as .

    3. Set .

    4. Create a new node , make the nodes erased from the children of . Store as the level of .

    5. Insert into . Set .

    6. If is not the smallest element in (according to the inorder value) let be the largest element in smaller than and add a key-value pair to where the key is equal to and the value is .

    7. If is not the largest element in let be the smallest element in larger than and add a key-value pair to where the key is given by the pair and the value is the pair .

Note that in the above procedure, for each node of we compute the corresponding node in , namely . Observe that is the lowest common ancestor of the leaves corresponding to elements of , and .
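The procedure above can be sketched in Python as follows. Several simplifications are assumptions of this sketch and not part of the paper: a binary heap with lazy deletion of stale pairs replaces the first dictionary, a doubly linked list replaces the second, the lowest common ancestor is found by naively walking parent pointers instead of the constant-time structure of [2], and every extracted node simply copies the level of its original node. The names `TNode`, `XNode`, `prepare`, `lca` and `extract` are hypothetical.

```python
import heapq
from itertools import count


class TNode:
    """Node of the full tree; `level` strictly increases towards the root."""
    _ids = count()

    def __init__(self, name, level, children=()):
        self.name, self.level, self.children = name, level, list(children)
        self.parent, self.depth, self.id = None, 0, next(TNode._ids)
        for c in self.children:
            c.parent = self


def prepare(root):
    """Compute depths and number the leaves from left to right."""
    order, stack = 0, [root]
    while stack:
        v = stack.pop()
        if not v.children:
            v.order, order = order, order + 1
        for c in reversed(v.children):
            c.depth = v.depth + 1
            stack.append(c)


def lca(a, b):
    """Naive lowest common ancestor via parent pointers."""
    while a is not b:
        if a.depth >= b.depth:
            a = a.parent
        else:
            b = b.parent
    return a


class XNode:
    """Node of the extracted tree; `orig` is its node in the full tree."""

    def __init__(self, orig, children=()):
        self.orig, self.level, self.children = orig, orig.level, list(children)
        self.prev = self.next = None
        self.alive = True


def extract(leaves):
    """Return the root of the subtree induced by a nonempty list of leaves."""
    front = [XNode(v) for v in sorted(leaves, key=lambda v: v.order)]
    heap, seq = [], count()

    def push(a, b):  # key: (level of the LCA, id of the LCA)
        x = lca(a.orig, b.orig)
        heapq.heappush(heap, (x.level, x.id, next(seq), x, a, b))

    for a, b in zip(front, front[1:]):
        a.next, b.prev = b, a
        push(a, b)
    last = front[0]
    while heap:
        _, _, _, x, a, b = heapq.heappop(heap)
        if not (a.alive and b.alive and a.next is b):
            continue  # stale pair: one endpoint was already merged
        run = [a, b]  # maximal run of frontier nodes whose neighbours meet at x
        while run[0].prev and lca(run[0].prev.orig, run[0].orig) is x:
            run.insert(0, run[0].prev)
        while run[-1].next and lca(run[-1].orig, run[-1].next.orig) is x:
            run.append(run[-1].next)
        new = XNode(x, run)
        new.prev, new.next = run[0].prev, run[-1].next
        for u in run:
            u.alive = False
        if new.prev:
            new.prev.next = new
            push(new.prev, new)
        if new.next:
            new.next.prev = new
            push(new, new.next)
        last = new
    return last
```

Because the pair with the lowest-level common ancestor is always processed first, every neighbouring pair inside the merged run has that same ancestor, which is why extending the run to the left and right is safe. Nodes of the full tree that have only one child containing query points never appear in the result.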

Lemma 4

The tree can be augmented so that the above procedure runs in time and when it ends the only key in is the root of the extracted tree.

Proof

All dictionary operations can be easily implemented in time whereas the lowest common ancestor can be found in time after an -time preprocessing (see [2]). This preprocessing requires space and has to be performed when is constructed. Since we perform of such operations is the complexity of our algorithm.∎

2.2.2 Finding the meetings in

We generate meetings in a top-down fashion. We consider the nodes of in groups. Each group corresponds to a single level. Now assume we consider a group of nodes at some level . Let be the set of children of all nodes in . For each node , we are going to find all the meetings it is responsible for. Any such meeting is of one of two types:

  1. , possibly , or

  2. , i.e. .

Figure 1: Extracting meetings. The figure contains a part of the tree . Nodes corresponding to the nodes of are surrounded by dashed circles. The currently processed group of nodes (, ) are filled with black. Nodes from the set are filled with gray. The nodes below the gray nodes are the nodes , i.e. the children of nodes in .

The meetings of the first kind are generated as follows. Consider the following set of nodes of (drawn as grey disks in Figure 1).

We mark all the nodes of . Next, we identify all pairs of nodes of that know each other. By Lemma 1 there are at most such pairs and these pairs can be easily found by scanning, for each , all the meetings is responsible for and such that the node meets is in . In this way we identify all pairs of children such that knows , namely if and knows in , then knows in . Then, if knows , the level of their meeting can be found in time using operation from Lemma 3. Hence, finding the meetings of the first type takes time for one group of nodes, and time in total.

Finding the meetings of the second type is easier. Consider any second type meeting . Let be the parent of . Then there is a meeting stored in . Hence it suffices to consider, for each all its meetings at level . For every such meeting , and for every child of we can apply from Lemma 3 to find the meeting of and . For the time complexity, note that by Property 2 of the partition tree, a node meets nodes at level . Since we can store the lists of meetings sorted by levels, we can extract all those meetings in time. For each meeting we iterate over the children of (Property 3 of the partition tree) and apply Lemma 3. This results in time per child, hence time in total.

After extracting all the meetings, we sort them by levels in time.

We can claim now the following theorem.

Theorem 2.1

For a given set () we can extract the -subtree of the compressed tree in time .

3 Pseudospanner construction and applications in approximation

In this section we use the subtree extraction procedure described in the previous section to construct, for any set , a graph that is essentially a small constant stretch spanner for . We then use it to give fast approximation algorithms for several problems.

3.1 Pseudospanner construction

Definition 2

Let be an undirected connected graph with a weight function . A graph , with a weight function is an -pseudospanner for if for every pair of vertices we have , where and are shortest path metrics induced by and . The number in this definition is called the stretch of the pseudospanner. A pseudospanner for a metric space is simply a pseudospanner for the complete weighted graph induced by the metric space.

Remark 3

Note the subtle difference between the above definition and the classical spanner definition. A pseudospanner is a subgraph of in terms of vertex sets and edge sets but it does not inherit the weight function . We cannot construct spanners in the usual sense without maintaining the entire distance matrix, which would require prohibitive quadratic space. However, pseudospanners constructed below become classical spanners when provided the original weight function.

Also note, that it immediately follows from the definition of a pseudospanner that for all we have .

In the remainder of this section we let be a metric space of size , where is doubling with doubling constant . We also use to denote the hierarchical tree data structure corresponding to , and and denote the parameters of . For any , we use to denote the subtree of corresponding to , as described in the previous section. Finally, we define a constant .

Theorem 3.1

Given and set , where , one can construct a -pseudospanner for in time . This spanner has size .

The proof is in the appendix.

Remark 4

Similarly to Property 4 of the partition tree, we can argue that the above theorem gives a -pseudospanner for any . Here, we need to take and .

Remark 5

It is of course possible to store the whole distance matrix of and construct a spanner for any given subspace using standard algorithms. However, this approach has a prohibitive space complexity.

3.2 Applications in Approximation

Results of the previous subsection immediately give several interesting approximation algorithms. In all the corollaries below we assume the tree is already constructed.

Corollary 2 (Steiner Forest)

Given a set of points , , together with a set of requirements consisting of pairs of elements of , a Steiner forest with total edge-length at most OPT=OPT, for any can be constructed in time .

Proof

We use the algorithm of Cole et al. [9] (where is the number of edges) on the pseudospanner guaranteed by Theorem 3.1. This algorithm can give a guarantee for an arbitrarily small .∎

Similarly by using the MST approximation for TSP we get

Corollary 3 (TSP)

Given a set of points , , a Hamiltonian cycle for of total length at most OPT=OPT for any can be constructed in time .
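The MST approximation for TSP mentioned above can be sketched as follows: build a minimum spanning tree, walk it in preorder, and shortcut repeated vertices, which in a metric yields a Hamiltonian cycle of length at most twice the tree weight. In this sketch the callable `dist` is an assumption that stands in for shortest-path distances in the pseudospanner, and Prim's algorithm is run on all pairs for brevity rather than on the sparse pseudospanner edges; `mst_tour` and `tour_length` are hypothetical names.

```python
def mst_tour(points, dist):
    """Preorder walk of a minimum spanning tree, returned as a vertex order."""
    points = list(points)
    root = points[0]
    children = {p: [] for p in points}
    # best[p] = (distance to the current tree, closest tree vertex)
    best = {p: (dist(root, p), root) for p in points[1:]}
    while best:
        v = min(best, key=lambda p: best[p][0])
        _, parent = best.pop(v)
        children[parent].append(v)
        for p in best:
            d = dist(v, p)
            if d < best[p][0]:
                best[p] = (d, v)
    tour, stack = [], [root]
    while stack:  # iterative preorder traversal
        v = stack.pop()
        tour.append(v)
        stack.extend(reversed(children[v]))
    return tour


def tour_length(tour, dist):
    """Length of the closed tour visiting `tour` in order."""
    return sum(dist(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour)))
```

For example, `mst_tour([0, 1, 2, 3], lambda a, b: abs(a - b))` visits the four points of a line in sorted order.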

Currently, the best approximation algorithm for the facility location problem is the -approximation of Mahdian, Ye and Zhang [18]. A fast implementation using Thorup’s ideas [22] runs in deterministic time, where , and if the input is given as a weighted graph of vertices and edges, in time, with high probability (i.e. with probability ). In an earlier work, Thorup [23] also considers the k-center and k-median problems in the graph model. When the input is given as a weighted graph of vertices and edges, his algorithms run in time, w.h.p. and have approximation guarantees of for the k-center problem and for the k-median problem. By using this latter algorithm with our fast spanner extraction we get the following corollary.

Corollary 4 (Facility Location with restricted facilities)

Given two sets of points (cities) and (facilities) together with opening cost for each facility , for any , a -approximate solution to the facility location problem can be constructed in time , w.h.p.

The application of our results to the variant of Facility Location with unrestricted facilities is not so immediate. We were able to obtain the following.

Theorem 3.2 (Facility Location with unrestricted facilities)

Assume that for each point of -point there is assigned an opening cost . Given a set of points , for any , a -approximate solution to the facility location problem with cities’ set and facilities’ set can be constructed in time , w.h.p.

The above result is described in Appendix E. Our approach there is a reduction to the variant with restricted facilities. The general, rough idea is the following: during the preprocessing phase, for every point we compute a small set of facilities that seem a good choice for , and when processing a query for a set of cities , we just apply Corollary 4 to cities’ set and facilities’ set .

Corollary 5 (k-center and k-median)

Given a set of points and a number , for any , one can construct:

  1. a -approximate solution to the k-center problem, or

  2. a -approximate solution to the k-median problem

in time , w.h.p.

4 Dynamic Minimum Spanning Tree and Steiner Tree

In this section we give one last application of our hierarchical data structure. It has a different flavour from the other applications presented in this paper since it is not based on constructing a spanner, but uses the data structure directly. We solve the Dynamic Minimum Spanning Tree / Steiner Tree (DMST/DST) problem, where we need to maintain a spanning/Steiner tree of a subspace throughout a sequence of vertex additions and removals to/from .

The quality of our algorithm is measured by the total cost of the tree produced relative to the optimum tree, and time required to add/delete vertices. Let , . Our goal is to give an algorithm that maintains a constant factor approximation of the optimum tree, while updates are polylogarithmic in , and do not depend (or depend only slightly) on . It is clear that it is enough to find such an algorithm for DMST. Due to space limitations, in this section we only formulate the results. Precise proofs are gathered in Appendix F.

Theorem 4.1

Given the compressed tree , we can maintain an -approximate Minimum Spanning Tree for a subset subject to insertions and deletions of vertices. The insert operation works in time and the delete operation works in time, . Both times are expected and amortized.

References

  • [1] S. Baswana and S. Sen. A simple linear time algorithm for computing sparse spanners in weighted graphs. In Proc. ICALP’03, pages 384–396, 2003.
  • [2] M.A. Bender and M. Farach-Colton. The LCA problem revisited. In LATIN ’00: Proc. 4th Latin American Symposium on Theoretical Informatics, LNCS 1776, pages 88–94, 2000.
  • [3] D. Bilò, H.-J. Böckenhauer, J. Hromkovič, R. Královič, T. Mömke, P. Widmayer, and A. Zych. Reoptimization of Steiner trees. In Proc. SWAT ’08, pages 258–269, 2008.
  • [4] H.-J. Böckenhauer, J. Hromkovič, R. Královič, T. Mömke, and P. Rossmanith. Reoptimization of Steiner trees: Changing the terminal set. Theor. Comput. Sci., 410(36):3428–3435, 2009.
  • [5] H.-J. Böckenhauer, J. Hromkovič, T. Mömke, and P. Widmayer. On the hardness of reoptimization. In Proc. SOFSEM’08, volume 4910 of LNCS, pages 50–65. Springer, 2008.
  • [6] M. Bădoiu, A. Czumaj, P. Indyk, and C. Sohler. Facility location in sublinear time. In Proc. ICALP’05, pages 866–877, 2005.
  • [7] H.T-H. Chan, A. Gupta, B.M. Maggs, and S. Zhou. On hierarchical routing in doubling metrics. In Proc. SODA’05, pages 762–771, 2005.
  • [8] K.L. Clarkson. Nearest neighbor queries in metric spaces. Discrete & Computational Geometry, 22(1):63–93, 1999.
  • [9] R. Cole, R. Hariharan, M. Lewenstein, and E. Porat. A faster implementation of the Goemans-Williamson clustering algorithm. In Proc. SODA’01, pages 17–25, 2001.
  • [10] C. Demetrescu and G.F. Italiano. A new approach to dynamic all pairs shortest paths. J. ACM, 51(6):968–992, 2004.
  • [11] B. Escoffier, M. Milanic, and V. Th. Paschos. Simple and fast reoptimizations for the Steiner tree problem. Algorithmic Operations Research, 4(2):86–94, 2009.
  • [12] S. Har-Peled and M. Mendel. Fast construction of nets in low dimensional metrics, and their applications. In Proc. SCG’05, pages 150–158, 2005.
  • [13] J. Holm, K. de Lichtenberg, and M. Thorup. Poly-logarithmic deterministic fully-dynamic algorithms for connectivity, minimum spanning tree, 2-edge, and biconnectivity. J. ACM, 48(4):723–760, 2001.
  • [14] M. Imase and B.M. Waxman. Dynamic Steiner tree problem. SIAM Journal on Discrete Mathematics, 4(3):369–384, 1991.
  • [15] P. Indyk. Sublinear time algorithms for metric space problems. In Proc. STOC ’99, pages 428–434, New York, NY, USA, 1999. ACM.
  • [16] L. Jia, G. Lin, G. Noubir, R. Rajaraman, and R. Sundaram. Universal approximations for TSP, Steiner Tree and Set Cover. In Proc. STOC’05, pages 1234–5415, 2005.
  • [17] R. Krauthgamer and J.R. Lee. Navigating nets: simple algorithms for proximity search. In Proc. SODA’04, pages 798–807, 2004.
  • [18] M. Mahdian, Y. Ye, and J. Zhang. Approximation algorithms for metric facility location problems. SIAM Journal on Computing, 36(2):411–432, 2006.
  • [19] L. Roditty. Fully dynamic geometric spanners. In Proc. SCG ’07, pages 373–380, 2007.
  • [20] H.F. Salama, D.S. Reeves, Y. Viniotis, and T-L. Sheu. Evaluation of multicast routing algorithms for real-time communication on high-speed networks. In Proceedings of the IFIP Sixth International Conference on High Performance Networking VI, pages 27–42, 1995.
  • [21] D.D. Sleator and R.E. Tarjan. A data structure for dynamic trees. In Proc. STOC’81, pages 114–122, 1981.
  • [22] M. Thorup. Quick and good facility location. In Proc. SODA’03, pages 178–185, 2003.
  • [23] M. Thorup. Quick k-median, k-center, and facility location for sparse graphs. SIAM Journal on Computing, 34(2):405–432, 2005.
  • [24] M. Thorup and U. Zwick. Approximate distance oracles. J. ACM, 52(1):1–24, 2005.
  • [25] D.E. Willard. New trie data structures which support very fast search operations. J. Comput. Syst. Sci., 28(3):379–394, 1984.
  • [26] D.E. Willard. Log-logarithmic selection resolution protocols in a multiple access channel. SIAM J. Comput., 15(2):468–477, 1986.
  • [27] P. Winter. Steiner problem in networks: A survey. Networks, 17(2):129–167, 1987.

Appendix A Related Work

In the next few paragraphs we review different approaches to this problem, state the differences and try to point out the advantage of the results presented here.

Universal Algorithms

In the case of Steiner Tree and TSP, results pointing in the direction studied here have already been obtained. In the so-called universal approximation algorithms introduced by Jia et al. [16], for each element of the request we need to fix a universal solution in advance. More precisely, in the case of the Steiner Tree problem for each we fix a path , and a solution to is given as . Using universal algorithms we need very small space to remember the precomputed solution and we are usually able to answer queries efficiently, but the corresponding approximation ratios are relatively weak, i.e., for Steiner Tree the approximation ratio is . Moreover, there is no direct way of answering queries in time, and in order to achieve this bound one needs to use techniques similar to those we use in Section 2.2. In our model we loosen the assumption that the solution itself has to be precomputed beforehand, but the data output of the preprocessing is of roughly the same size (up to polylogarithmic factors). Also, we allow the algorithm slightly more time for answering the queries and, as a result, are able to improve the approximation ratio substantially, from polylogarithmic to a constant.

Spanners and Distance Oracles

The question whether the graph can be approximately represented using less space than its size was previously captured by the notion of spanners and approximate distance oracles. Both of these data structures represent the distances in the graphs up to a given multiplicative factor . The difference is that the spanner needs to be a subgraph of the input graph, so distances between vertices have to be computed by ourselves, whereas the distance oracle can be an arbitrary data structure that can compute the distances when needed. However, both are limited in size. For general graphs -spanners (i.e., the approximation factor is ) are of size and can be constructed in randomized linear time as shown by Baswana and Sen [1]. On the other hand, Thorup and Zwick [24] have shown that the -approximate oracles of size , can be constructed in time, and are able to answer distance queries in time. It seems that there is no direct way to obtain, based on these results, an algorithm that could answer our type of queries faster than .

The construction of spanners can be improved in the case of doubling metrics. The papers [12, 7] give a construction of -spanners that have linear size in the case when and the doubling dimension of the metric are constant. Moreover, Har-Peled and Mendel [12] give time construction of such spanners. A hierarchical structure similar to that of [17] and the one we use in this paper was also used by Roditty [19] to maintain a dynamic spanner of a doubling metric, with a update time. However, all these approaches assume the existence of a distance oracle. When storing the whole distance matrix, these results, combined with known approximation algorithms in the classical setting [18, 22, 23, 9], imply a data structure that can answer Steiner Tree, Facility Location with restricted facilities and -Median queries in time. However, it does not seem to be easy to use this approach to solve the variant of Facility Location with unrestricted facilities. To sum up, spanners seem to be a good solution in our model in the case when a space is available for the data structure. The key advantage of our solution is the low space requirement. Storing only the spanner, on the other hand, requires nearly linear space, but then we need time to answer each query, because the distance matrix is unavailable and we need to process the whole spanner to respond to a query on a given set of vertices.

Sublinear Approximation Algorithms

Another way of looking at the problem is the attempt to devise sublinear algorithms that would be able to solve approximation problems for a given metric. This study was started by Indyk [15] who gave constant approximation ratio -time algorithms for: Furthest Pair, -Median (for constant ), Minimum Routing Cost Spanning Tree, Multiple Sequence Alignment, Maximum Traveling Salesman Problem, Maximum Spanning Tree and Average Distance. Later on Bădoiu et al. [6] gave an time algorithm for computing the cost of the uniform-cost metric Facility Location problem. These algorithms work much faster than the -size metric description. However, the paper contains many negative conclusions as well. The authors show that for the following problems -time constant approximation algorithms do not exist: general metric Facility Location, Minimum-Cost Matching and -Median for . In contrast, our results show that if we allow the algorithm to preprocess partial, usually fixed, data we can answer queries in sublinear time afterwards.

Dynamic Spanning Trees

The study of the online and dynamic Steiner tree problem was started by Imase and Waxman [14]. However, the model considered there did not take the computation time into account, but only minimized the number of edges changed in the Steiner tree. More recently the Steiner tree problem was studied in a setting more related to ours [3, 5, 4, 11]. The first three of these papers study the approximation ratio that can be achieved when the algorithm is given an optimal solution together with the change of the data. The efficiency issue is only raised in [11], but in the worst case the presented algorithm can take the same time as computing the solution from scratch. The problem most related to our results is the dynamic minimum spanning tree (MST) problem. The study of this problem was finished by showing a deterministic algorithm supporting edge updates in polylogarithmic time in [13]. The dynamic Steiner tree problem is a direct generalization of the dynamic MST problem, and we were able to show similar time bounds. However, there are important differences between the two problems that one needs to keep in mind. In the case of MST, by definition, the set of terminals remains unchanged, whereas in the dynamic Steiner tree we can change it. On the other hand, we cannot hope to get polylogarithmic update times if we allow the edge weights to change, because this would require maintaining dynamic distances in the graph. The dynamic distance problem seems to require polynomial time for updates [10].

Appendix B Partition tree — precise definitions and proofs

To start with, let us recall the definitions of a partition and a partition scheme.

Definition 3 (Jia et al. [16], Definition 1)

A -partition is a partition of into disjoint subsets such that for all and for all , the ball intersects at most sets in the partition.

A partition scheme is an algorithm that produces a -partition for arbitrary .

Lemma 5 (similar to Jia et al. [16], Lemma 2)

Let be a nonnegative integer. For being a doubling metric space with doubling constant , there exists a partition scheme that works in polynomial time. Moreover, for every the generated partition has the following property: for every there exists such that .

Proof

Take arbitrary . Start with . At step for take any and take . Set and proceed to the next step. Obviously, , so and we set .

Take any and consider all sets crossed by ball . Every such set is contained in , which can be covered by at most balls of radius . But for every , , so every leader of set crossed by must be in a different ball. Therefore there are at most sets crossed.∎
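The greedy construction used in this proof can be illustrated with a minimal sketch. Since the exact radius and constants are not recoverable from the text here, the code below assumes a closed ball of a caller-supplied `radius` around each leader; the function name `greedy_ball_partition` and the use of Euclidean points are hypothetical choices made only for illustration, not the authors' implementation.

```python
import math

def greedy_ball_partition(points, radius, dist=math.dist):
    """Greedily carve `points` into disjoint sets, each within `radius` of its leader.

    Assumption: closed balls and a caller-supplied radius; the constants used
    in the paper's partition scheme are not reproduced here.
    """
    remaining = list(points)
    parts = []  # list of (leader, members) pairs
    while remaining:
        leader = remaining[0]  # "take any" remaining point as the next leader
        members = [p for p in remaining if dist(leader, p) <= radius]
        remaining = [p for p in remaining if dist(leader, p) > radius]
        parts.append((leader, members))
    return parts

pts = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0), (3.2, 0.1), (10.0, 10.0)]
for leader, members in greedy_ball_partition(pts, 1.0):
    print(leader, members)
```

Because a later leader is always chosen among points not yet covered, any two leaders are more than `radius` apart, which is the separation property the packing argument in the proof relies on.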

Let us define the space partition tree .

Algorithm B.1

Assume we have a doubling metric space and the partition scheme from Lemma 5. Let us assume and let be a real constant satisfying:

  • , i.e., .

  • .

Then construct space partition tree as follows:

  1. Start with partition , and . For every let . Let .

  2. Let .

  3. While has more than one element do:

    1. Fix .

    2. Let be a partition of the set generated by given partition scheme for .

    3. Let .

    4. Set for any .

    5. .

Note that for every , is a partition of . We will denote by the leader of set that .
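As a rough illustration of the level-by-level construction in Algorithm B.1, the following sketch builds nested partitions by repeatedly partitioning the current set of leaders with a growing radius and merging the sets whose leaders fall into the same part. The starting radius `r0`, the growth factor `tau`, and the helper `carve` (a plain greedy ball carving standing in for the partition scheme of Lemma 5) are assumptions for illustration; the actual radii and the constraints on the constant are not recoverable from the text above. Points are assumed to be distinct, hashable tuples.

```python
import math

def carve(leaders, radius, dist):
    """Greedy ball carving over `leaders`; returns {new_leader: [old leaders]}."""
    remaining = list(leaders)
    groups = {}
    while remaining:
        head = remaining[0]
        groups[head] = [p for p in remaining if dist(head, p) <= radius]
        remaining = [p for p in remaining if dist(head, p) > radius]
    return groups

def build_partition_tree(points, r0, tau=2.0, dist=math.dist):
    """Return a list of levels; each level maps a leader to its set of points.

    Assumption: radii grow geometrically by the hypothetical factor `tau`.
    """
    level = {p: frozenset([p]) for p in points}  # level 0: singletons
    levels = [level]
    radius = r0
    while len(level) > 1:
        radius *= tau
        groups = carve(list(level), radius, dist)
        # Merge all sets whose leaders landed in the same group.
        level = {head: frozenset().union(*(level[l] for l in kids))
                 for head, kids in groups.items()}
        levels.append(level)
    return levels

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (8.0, 8.0), (9.0, 8.0), (20.0, 0.0)]
for k, lvl in enumerate(build_partition_tree(pts, r0=0.5)):
    print(k, sorted(len(s) for s in lvl.values()))
```

Each level is a partition of the whole point set, and every set at one level is contained in exactly one set at the next level, which is what allows the sets to be viewed as nodes of a tree.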

Definition 4

We will say that is a parent of if (equivalently ). This allows us to consider sets generated by Algorithm B.1 as nodes of a tree with root being the set .

Lemma 6

For every and for every the following holds:

Proof

Note that

We use bound from Lemma 5:

Lemma 7

For every , for every , the union of balls crosses at most sets from the partition .

Proof

For this is obvious, since is smaller than any for . Let us assume .

Let , , and . Then, using Lemma 6,

Since, by partition properties, crosses at most sets from and , this finishes the proof.∎

Definition 5

We say that a set knows a set if . We say that knows if and knows or .

Note that Lemma 7 implies the following:

Corollary 6

A set (and therefore a node too) at a fixed level has at most acquaintances.

Lemma 8

Let be a child of and let know . Then either or knows the parent of .

Proof

Assume that is not a child (subset) of and let be the parent of . Since knows , there exist , satisfying . But and and . ∎

Lemma 9

Set has at most children in the tree .

Proof

By construction of level , let be such a set that (in construction step we divided sets of leaders into partition ). Let be another child of . Then, by construction and assumption that :

However, by Lemma 7, crosses at most sets at level . That finishes the proof.∎

Lemma 10

Let be different points such that , and , and knows but does not know . Then