Sublinear Distance Labeling
A distance labeling scheme labels the $n$ nodes of a graph with binary strings such that, given the labels of any two nodes, one can determine the distance in the graph between the two nodes by looking only at the labels. A $D$-preserving distance labeling scheme only returns precise distances between pairs of nodes that are at distance at least $D$ from each other. In this paper we consider distance labeling schemes for the classical case of unweighted graphs with both directed and undirected edges.
We present an $O(\frac{n}{D}\log^2 D)$ bit $D$-preserving distance labeling scheme, improving the previous bound by Bollobás et al. [SIAM J. Discrete Math. 2005]. We also give an almost matching lower bound of $\Omega(\frac{n}{D})$. With our $D$-preserving distance labeling scheme as a building block, we additionally achieve the following results:
1. We present the first distance labeling scheme of size $o(n)$ for sparse graphs (and hence bounded degree graphs). This addresses an open problem by Gavoille et al. [J. Algo. 2004], thereby separating the complexity from distance labeling in general graphs, which requires $\Omega(n)$ bits (Moon [Proc. of Glasgow Math. Association 1965]). (Footnote 1: This result for sparse graphs was made available online in a preliminary version of this paper [7]. The label size was subsequently slightly improved by Gawrychowski et al. [26].)
2. For approximate $r$-additive labeling schemes, which return distances within an additive error of $r$, we show a scheme of size $O\left(\frac{n}{r} \cdot \frac{\operatorname{polylog}(r \log n)}{\log n}\right)$ for $r \ge 2$. This improves on the current best bound of $O\left(\frac{n}{r}\right)$ by Alstrup et al. [SODA 2016] for sub-polynomial $r$, and generalizes a result by Gawrychowski et al. [arXiv preprint 2015], who showed this for $r = 2$.
The concept of informative labeling schemes dates back to Breuer and Folkman [13, 14] and was formally introduced by Kannan et al. [31, 35]. A labeling scheme is a way to represent a graph in a distributed setting by assigning a bit string (called a label) to each node of the graph. In a distance labeling scheme we assign labels to a graph from a family of graphs (e.g. all forests, bounded-degree graphs, or planar graphs with $n$ nodes) such that, given only the labels of a pair of nodes, we can compute the distance between them without the need for a centralized data structure. When designing a labeling scheme the main goal is to minimize the maximum label size over all nodes of all graphs in the family. We call this the size of the labeling scheme. As a secondary goal some papers consider the encoding and decoding time of the labeling scheme in various computational models. In this paper we study the classical case of unweighted graphs.
The problem of exact distance labeling in general graphs is a classic problem that was studied thoroughly in the 1970s and 1980s. Graham and Pollak [27] and Winkler [40] showed that labels of size suffice in this case. Combining  and  gives a lower bound of bits (see also ). Recently, Alstrup et al. [8] improved the label size to bits.
Distance labeling schemes have also been investigated for various families of graphs, providing both upper and lower bounds. For trees, Peleg [36] showed that labels of size suffice with a matching lower bound by Gavoille et al. [25]. Gavoille et al. [25] also showed a lower bound for planar graphs and bound for bounded degree (and thus sparse) graphs. For weighted graphs this was recently improved by Abboud and Dahlgaard [1] to even for bounded planar graphs. Gavoille et al. [25] provided an labeling scheme for planar graphs (even weighted); however, nothing better than the scheme for general graphs is known for bounded-degree graphs. It has remained a major open problem in the field of labeling schemes whether a scheme of size or even exists for bounded-degree graphs as stated in e.g. .
For some applications, the requirement on the label size for several graph classes is prohibitive. Therefore a large body of work is dedicated to labeling schemes for approximating distances in various families of graphs [3, 15, 20, 25, 28, 29, 33, 36, 37, 38, 39]. Such labeling schemes often provide efficient implementations of other data structures like distance oracles  and dynamic graph algorithms .
In  a labeling scheme of size was presented for approximating distances up to a factor of . (Footnote 2: This does not break the Girth Conjecture, as the labeling scheme may under-estimate the distance as well.) In  a scheme of poly-logarithmic size was given for planar graphs when distances need only be reported within a factor of . Labeling schemes of additive error have also been investigated. For general graphs Alstrup et al. [8] gave a scheme of size for $r$-additive distance labeling with and a lower bound of was given by Gavoille et al. [22]. For a lower bound of can be established by observing that such a scheme can answer adjacency queries in bipartite graphs, which require bits to label for adjacency.
An alternative to approximating all distances is to only report exact distances above a certain threshold $D$. A labeling scheme that reports exact distances for nodes $u, v$ with $\mathrm{dist}(u,v) \ge D$ is called a $D$-preserving distance labeling scheme. (Footnote 3: In this paper we adopt the convention that the labeling scheme returns an upper bound if $\mathrm{dist}(u,v) < D$.) Bollobás et al. [12] introduced this notion and gave a labeling scheme of size for both directed and undirected graphs. They also provided an lower bound for directed graphs.
1.1 Related work
A problem closely related to distance labeling is adjacency labeling. For some classes, such as general graphs, the best-known lower bound for distance labeling is actually that for adjacency labeling. Adjacency labeling has been studied for various classes of graphs. In  the label size for adjacency in general undirected graphs was improved from [31, 34] to optimal size , and in  adjacency labeling for trees was improved from  to optimal size .
Distance labeling schemes and the related 2-hop labeling are used in SIGMOD and are central to some real-world applications [5, 18, 30]. Approximate distance labeling schemes have found applications in several fields such as reachability and distance oracles  and communication networks . An overview of distance labeling schemes can be found in .
1.2 Our results
We address open problems of [8, 12, 25], improving the label sizes for exact distances in sparse graphs, $r$-additive distance in general graphs, and $D$-preserving distance labeling. We do this by showing a strong relationship between $D$-preserving distance labeling and several other labeling problems, using $D$-preserving distance labels as a black box. Thus, by improving the result of [12] we are able to obtain the first sublinear labeling schemes for several problems studied at SODA over the past decades. Our results hold for both directed and undirected graphs and are summarized below.
We present the first sublinear distance labeling scheme for sparse graphs giving the following theorem:
Let denote the family of unweighted graphs on nodes with at most edges. Then there exists a distance labeling scheme for with maximum label size .
As noted, prior to this work the best-known bound for this family was the scheme of  for general graphs. Thus, Theorem 1 separates the family of sparse graphs from the family of general graphs requiring $\Omega(n)$ label size. Our result uses a black-box reduction from sparse graphs to the $D$-preserving distance scheme of Theorem 3 below. The result of Theorem 1 was made available online in a preliminary version of this paper [7] and was subsequently slightly improved by Gawrychowski et al. [26], who noted that one of the steps in the construction of our $D$-preserving distance scheme can be skipped when only considering sparse graphs. (Footnote 4: The scheme presented in this paper has labels of length , where . In [26] they improve the exponent of the term from to .)
Approximate labeling schemes:
For $r$-additive distance labeling Gawrychowski et al. [26] showed that a sublinear labeling scheme for sparse graphs implies a sublinear labeling scheme for $r = 2$ in general graphs. We generalize this result to $r \ge 2$ by a reduction to the $D$-preserving scheme. We note that a reduction to sparse graphs does not suffice in this case, and the scheme of [26] thus only works for $r = 2$. More precisely, we show the following:
For any $r \ge 2$, there exists an approximate $r$-additive labeling scheme for the family of unweighted graphs on $n$ nodes with maximum label size
$D$-preserving labeling schemes:
For $D$-preserving labeling schemes we show that:
For any integer , there exists a -preserving distance labeling scheme for the family of unweighted graphs on nodes with maximum label size
Theorem 3 improves the result of [12] by a factor of , giving the first sublinear size labels for this problem for any . This sublinearity is the main ingredient in showing the results of Theorems 1 and 2. Our scheme uses sampling similar to that of [12]. By sampling fewer nodes we show that not “too many” nodes end up being problematic and handle these separately. (Footnote 5: We note that after making this result available online in a preliminary version [7], the bound of Theorem 3 was slightly improved by Gawrychowski et al. [26] to .)
Finally, we show the following almost matching lower bound for undirected graphs extending the construction of  for directed graphs.
A $D$-preserving distance labeling scheme for the family of unweighted and undirected graphs on $n$ nodes requires label size , when $D$ is an integer in .
Throughout the paper we adopt the convention that and . When we define . In this paper we assume the word-RAM model, with word size . If is a bitstring we denote its length by and will also use to denote the integer value of when this is clear from context. We use to denote concatenation of bit strings. Finally, we use the Elias code  to encode a bitstring of unknown length using bits such that we may concatenate several such bitstrings and decode them again.
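The self-delimiting encoding mentioned above can be illustrated with a minimal sketch. It assumes the Elias gamma code is applied to the length of the bitstring (plus one, so that the empty string is encodable), followed by the bitstring itself; the function names are hypothetical, and the exact variant and bit count used in the paper are not recoverable from this copy of the text.

```python
def elias_gamma(n: int) -> str:
    """Elias gamma code of a positive integer n, as a string of '0'/'1'."""
    if n < 1:
        raise ValueError("Elias gamma encodes positive integers only")
    binary = bin(n)[2:]                       # binary form, always starts with '1'
    return "0" * (len(binary) - 1) + binary   # unary length prefix, then the value


def encode_bitstring(s: str) -> str:
    """Self-delimiting encoding: gamma(len(s) + 1) followed by s itself."""
    return elias_gamma(len(s) + 1) + s        # +1 so the empty string works too


def decode_bitstrings(stream: str) -> list[str]:
    """Split a concatenation of encode_bitstring outputs back into its parts."""
    parts, pos = [], 0
    while pos < len(stream):
        zeros = 0
        while stream[pos + zeros] == "0":     # count the unary prefix
            zeros += 1
        length = int(stream[pos + zeros: pos + 2 * zeros + 1], 2) - 1
        pos += 2 * zeros + 1                  # skip the gamma-coded length
        parts.append(stream[pos: pos + length])
        pos += length
    return parts
```

In this sketch a bitstring $s$ costs $|s| + 2\lfloor \log_2(|s|+1) \rfloor + 1$ bits, and any number of encoded strings can be concatenated and split again without separators.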
A distance labeling scheme for a family of graphs consists of an encoder and a decoder . Given a graph the encoder computes a label assignment , which assigns a label to each node of . The decoder is a function such that given any graph and any pair of nodes we have . Note that the decoder is oblivious to the actual graph and is only given the two labels and .
The size of a labeling scheme is defined as the maximum label size over all graphs and all nodes . If for all graphs the mapping is injective we say that the labeling scheme assigns unique labels (note that two different graphs may share a label).
If the encoder and graph is clear from the context, we will sometimes denote the label of a node by .
3 $D$-preserving distance labeling schemes
In this section we will prove Theorem 3. Observe first that for Theorem 3 is exactly the classic problem of distance labeling and we may use the result of . We will therefore assume that for the remainder of this paper. Let us first formalize the definition of a -preserving distance labeling scheme.
Let $D$ be a positive integer and let be a family of graphs. For each graph let be a mapping of nodes to labels. Let be a decoder. If and satisfy the following two properties, we say that the pair is a $D$-distance preserving labeling scheme for the graph family .
for all for any .
for all with for any .
The idea of the labeling scheme presented in this section is to first make a labeling scheme for distances in the range and use this scheme for increasingly bigger distances until all distances of at least are covered. Loosely speaking, the scheme is obtained by sampling a set of nodes , such that most shortest paths of length at least contain a node from . Then all nodes are partitioned into sick and healthy nodes adding the sick nodes to the set . All nodes then store their distance to each node of and healthy nodes will store the distance to all nodes, for which the shortest path is not covered by some node in .
3.1 A sample-based approach
As a warm-up, we first present the scheme of Bollobás et al. in  with a slight modification.
Given a graph we pick a random multiset consisting of nodes for a constant to be decided. Each element of is picked uniformly and independently at random from (i.e. the same node might be picked several times). (Footnote 6: In [12] they instead picked by including each node of with probability .) We order arbitrarily as and assign the label of a node as
Let and be two nodes of some graph . Set
Then and if contains a node from a shortest path between and .
Let be the node corresponding to the minimum value of (1). We then have . By the triangle inequality this implies .
Now let be some shortest path between and in and assume that . Then , implying that , and thus . ∎
By Lemma 1 it only remains to show that the set is likely to contain a node on a shortest path between any pair of nodes with .
Let be defined as above. Then the probability that there exists a pair of nodes such that and no node on the shortest path between and is sampled is at most .
Consider a pair of nodes with . Let be a shortest path between and , then . Each element of has probability at least of belonging to (independently), so the probability that no element of belongs to is at most
Since there are at most such pairs, by a union bound the probability that there exists a pair with , such that no element on a shortest path between and is sampled in is thus at most ∎
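The calculation behind Lemma 2 can be spelled out as follows. The exact expressions are missing from this copy of the text, so this is a sketch under two assumptions: the multiset has size $|S| = c\,\frac{n}{D}\ln n$ for the constant $c$ of the construction, and a shortest path $p$ between two nodes at distance at least $D$ contains at least $D$ nodes.

```latex
\Pr[\text{no element of } S \text{ lies on } p]
  \le \left(1 - \frac{D}{n}\right)^{|S|}
  \le e^{-|S| D / n}
  = e^{-c \ln n}
  = n^{-c},
\qquad
\Pr[\text{some pair is missed}]
  \le n^{2} \cdot n^{-c}
  = n^{2-c}.
```

Under these assumptions any constant $c > 2$ makes the failure probability polynomially small, so a constant expected number of re-samplings suffices.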
By setting we can ensure that the expected number of times we have to re-sample the set until the condition of Lemma 2 is satisfied is . The labels can be assigned using bits as each distance can be stored using bits.
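The sample-based scheme of this section can be sketched in a few lines. This is an illustration rather than the authors' implementation: the graph is assumed undirected and given as an adjacency dictionary, the sample size is left as a parameter because its exact value is not recoverable from this copy of the text, and the function names are hypothetical. For convenience the encoder also returns the sample, which a real scheme would not expose.

```python
import random
from collections import deque

INF = float("inf")


def bfs(adj, src):
    """Unweighted single-source distances in an undirected graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def encode_sampled(adj, sample_size, rng):
    """Label every node with its distances to a random multiset of nodes."""
    nodes = sorted(adj)
    sample = [rng.choice(nodes) for _ in range(sample_size)]  # repeats allowed
    from_sample = [bfs(adj, s) for s in sample]               # one BFS per sample
    labels = {u: [d.get(u, INF) for d in from_sample] for u in nodes}
    return labels, sample


def decode_sampled(label_u, label_v):
    """Best detour through a sampled node; never below the true distance."""
    return min((a + b for a, b in zip(label_u, label_v)), default=INF)
```

By the triangle inequality the decoder always returns an upper bound, and it is exact whenever some sampled node lies on a shortest path between the two queried nodes, which is the content of Lemma 1.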
3.2 A scheme for medium distances
We now present a scheme, which preserves distances in the range using bits. More formally, we present a labeling scheme such that given a family of unweighted graphs the encoder, , and the decoder, , satisfies the following constraints for any :
for any .
for any with .
Let such a labeling scheme be called a -preserving distance labeling scheme.
The labeling scheme is based on a sampling procedure similar to that presented in Section 3.1, but improves the label size by introducing the notion of sick and healthy nodes. Below we describe only the labeling scheme for undirected graphs. We note that this can be turned into a labeling scheme for directed graphs by at most doubling the label size. In our undirected labeling scheme we store distances to several nodes in the graph, and for a directed scheme one simply needs to store both distances to and from these nodes. This will be evident from the description below.
Let . We sample a multiset of size . Similar to Section 3.1, each element of is picked uniformly at random from .
Let be as defined above and fix some node . We say that a node is uncovered for if and no node in is contained in a shortest path between and . A node with more than uncovered nodes is called sick and all other nodes are called healthy.
Let denote the set of sick nodes and let denote the set of uncovered nodes for . The main outline of the scheme is as follows:
Each node stores the distance from itself to each node of .
If is healthy, stores the distance from itself to every for which .
We start by showing that the set of sick nodes has size with probability at least . This is captured by the following lemma.
Let be defined as above and let be the set of sick nodes. Then
The goal is now to store the distances to the nodes of as well as using few bits. First consider the distances to the nodes of . Observe that since we only wish to recover distances in the interval we only need to store distances to the nodes of which are at most away. Let be any node in . We will store the distances from to the relevant nodes of as follows: We first fix a canonical ordering of the nodes in , which is the same for all nodes . For each node of in order, we store a 0-bit if its distance to is greater than ; otherwise we store the distance using at most bits.
We may now assign the label of a node to be concatenated with the bitstring resulting from the above procedure for . If is healthy we concatenate an identifier for the set of uncovered nodes restricted to nodes within distance along with the distance to each of these nodes. The decoder works by simply checking whether one node stores the other's distance, and otherwise taking the minimum over the distances obtained by going via any node in .
In order to bound the size of the label we first observe that has size at most and we can thus store the distance (or a 0-bit) to each of these nodes using bits. We thus only need to bound the size of storing id’s and distances to the nodes of whose distance is in . Since we only store this for healthy nodes this set has size at most and can be described using at most
bits. Since each distance can be stored using bits we conclude that the total label size is bounded by .
There exists a -preserving distance labeling scheme for the family of unweighted graphs on nodes with maximum label size
This is a direct corollary of the discussion above. ∎
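The sick/healthy construction of this section can be sketched as follows. Several quantities are missing from this copy of the text, so the sketch makes labeled assumptions: the sample size and the threshold for being sick are plain parameters (`sample_size`, `sick_threshold`), a node $v$ is taken to be uncovered for $u$ when $\mathrm{dist}(u,v) \ge D$ and no sampled node lies on a shortest path between them, and healthy nodes store only uncovered nodes within distance $2D$. Labels are Python dictionaries rather than packed bitstrings, and all names are hypothetical.

```python
import random
from collections import deque

INF = float("inf")


def bfs(adj, src):
    """Unweighted single-source distances in an undirected graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def encode_medium(adj, D, sample_size, sick_threshold, rng):
    nodes = sorted(adj)
    dist = {u: bfs(adj, u) for u in nodes}        # all pairs, fine for a sketch
    sample = {rng.choice(nodes) for _ in range(sample_size)}

    def on_shortest_path(s, u, v):
        return dist[u].get(s, INF) + dist[s].get(v, INF) == dist[u][v]

    # Assumed definition: v is uncovered for u if dist(u, v) >= D and
    # no sampled node lies on any shortest u-v path.
    uncovered = {
        u: [v for v in nodes
            if D <= dist[u].get(v, INF) < INF
            and not any(on_shortest_path(s, u, v) for s in sample)]
        for u in nodes
    }
    sick = {u for u in nodes if len(uncovered[u]) > sick_threshold}
    hubs = sorted(sample | sick)                  # canonical order for all labels

    labels = {}
    for u in nodes:
        near = {}
        if u not in sick:                         # only healthy nodes store these
            near = {v: dist[u][v] for v in uncovered[u] if dist[u][v] <= 2 * D}
        labels[u] = {"id": u,
                     "hubs": [dist[u].get(h, INF) for h in hubs],
                     "near": near}
    return labels


def decode_medium(lu, lv):
    if lv["id"] in lu["near"]:                    # u stores v explicitly
        return lu["near"][lv["id"]]
    if lu["id"] in lv["near"]:                    # v stores u explicitly
        return lv["near"][lu["id"]]
    return min((a + b for a, b in zip(lu["hubs"], lv["hubs"])), default=INF)
```

The decoder is exact for every pair at distance in $[D, 2D]$ regardless of the random choices: either a sampled node covers the pair, or the pair is uncovered and is then handled by a healthy endpoint's explicit table or by a sick endpoint acting as a hub. The randomness only affects how large the labels become.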
3.3 Bootstrapping the scheme
In order to show Theorem 3 we will concatenate several instances of the label from Theorem 5. First define to be the -preserving distance label for the node assigned by the scheme of Theorem 5. Now assign the following label to each node :
where . Let be the distance returned by running the decoder of Theorem 5 on the corresponding component, , of the label . Then we let the decoder of the full labeling scheme return
with defined as above. We are now ready to prove Theorem 3.
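The bootstrapping step can be illustrated with a small sketch. The label and decoder formulas are missing from this copy of the text, so the sketch assumes that one medium-distance label is built for each of the thresholds $D, 2D, 4D, \ldots$ below $n$, that the full label is their concatenation, and that the decoder returns the minimum over the component decoders. The helper names are hypothetical and $D \ge 1$ is assumed.

```python
def doubling_scales(n, D):
    """Thresholds D, 2D, 4D, ... below n; [d, 2d] for these d covers [D, n - 1]."""
    scales, d = [], D
    while d < n:
        scales.append(d)
        d *= 2
    return scales


def concat_labels(component_labels):
    """Full label of a node: its per-scale labels, kept in scale order."""
    return tuple(component_labels)


def decode_full(label_u, label_v, decode_component):
    """Every component returns an upper bound, so the minimum is the best one."""
    return min(decode_component(a, b) for a, b in zip(label_u, label_v))
```

Since each component never under-estimates and the component whose range contains the true distance is exact, the minimum is exact for every pair at distance at least $D$.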
3.4 Lower bound
Proof of Theorem 4.
Let and let and be sets of nodes which make up the left and right side of a bipartite graph respectively. Furthermore, let each node of be the first node on a path of nodes.
Consider now the family of all such bipartite graphs with the attached paths. There are exactly such graphs.
Now observe, that a node is adjacent to a node if and only if , where is the last node on the path starting in . By querying all such pairs we obtain bits of information using only labels, thus at least one label of size
is needed. Since the graph has nodes this implies the result.
This is illustrated in Figure 1.
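The key observation of the proof, that adjacency in the bipartite graph can be read off from long distances only, can be checked on small instances. The sizes of the two sides and the path length are missing from this copy of the text, so the sketch assumes both sides have `k` nodes and that each left node is the first node of a path with exactly $D$ nodes; all names are hypothetical.

```python
from collections import deque


def bfs(adj, src):
    """Unweighted single-source distances in an undirected graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def build(edges, k, D):
    """Bipartite graph on two sides of k nodes, with a D-node path per left node."""
    adj = {}

    def link(a, b):
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)

    for i in range(k):
        adj.setdefault(("R", i), [])
        adj.setdefault(("L", i, 0), [])       # the left node is path node 0
        for step in range(1, D):
            link(("L", i, step - 1), ("L", i, step))
    for i, j in edges:
        link(("L", i, 0), ("R", j))
    return adj


def recover_edges(adj, k, D):
    """Read the bipartite edges back using only distances equal to D."""
    found = set()
    for i in range(k):
        from_tail = bfs(adj, ("L", i, D - 1))  # last node of the path at left node i
        found |= {(i, j) for j in range(k) if from_tail.get(("R", j)) == D}
    return found
```

A non-adjacent pair is at distance at least 3 in a bipartite graph, so from the end of the path it is at distance at least $D + 2$ and cannot be confused with an adjacent pair. A $D$-preserving scheme therefore has to encode the whole bipartite adjacency relation in the labels of the path ends and the right side.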
4 Sparse and bounded degree graphs
We are now ready to prove Theorem 1. In fact we will show the following more general lemma:
Let denote the family of unweighted graphs on nodes with at most edges. Then there exists a distance labeling scheme for with maximum label size
Since when it will suffice to prove Lemma 4. In order to do so we first show the following lemma for bounded-degree graphs:
Let be the family of graphs on nodes with maximum degree . There exists a distance labeling scheme for with maximum label size
Suppose we are labeling some graph and let . Let and let be the -distance preserving label assigned by using Theorem 3 with parameter . Using this label we can deduce the distance to all nodes of distance at least to .
Since there are at most nodes closer than distance to . Thus, we may describe the IDs and distances of these nodes using at most bits. This gives the desired total label size of
Proof of Lemma 4.
Let be some graph and let . Let be some node with more than incident edges. If no such node exists, we may apply Lemma 5 directly and we are done. Otherwise we split into nodes and connect these nodes with a path of -weight edges. Denote these nodes . For each edge in we assign the end-point at to a node with . This process is illustrated in Figure 2.
Let the graph resulting from performing this process for every node be denoted by . We then have . Furthermore it holds that for every pair of nodes we have . Consider now using the labeling scheme of Lemma 5 on and setting for each node . We note that splitting nodes in the graph results in a weighted graph with weights and . However, one can observe that the labeling scheme of Theorem 3 actually preserves distances for nodes that have at least edges on a shortest path between them. It thus follows that this is actually a distance labeling scheme for . The number of nodes in is bounded by
which means that Lemma 5 gives the desired label size. ∎
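The node-splitting step in the proof of Lemma 4 can be sketched as follows. The degree threshold is missing from this copy of the text, so it appears as a hypothetical parameter `cap`; the input is assumed to be a simple undirected graph given as a symmetric adjacency dictionary, and the function names are not from the paper. Distances in the resulting 0/1-weighted graph are computed with a standard 0-1 BFS.

```python
from collections import deque


def split_high_degree(adj, cap):
    """Split each node into ceil(deg / cap) copies joined by 0-weight edges.

    Returns (wadj, rep): wadj[x] is a list of (neighbor, weight) pairs and
    rep[u] is the copy that stands in for the original node u.
    """
    slot, wadj, rep = {}, {}, {}
    for u, nbrs in adj.items():
        copies = max(1, -(-len(nbrs) // cap))      # ceil(deg / cap), at least 1
        for i in range(copies):
            wadj[(u, i)] = []
        for i in range(copies - 1):                # 0-weight path through the copies
            wadj[(u, i)].append(((u, i + 1), 0))
            wadj[(u, i + 1)].append(((u, i), 0))
        for idx, v in enumerate(nbrs):
            slot[(u, v)] = idx // cap              # copy of u hosting edge {u, v}
        rep[u] = (u, 0)
    for (u, v), i in slot.items():                 # original edges keep weight 1
        wadj[(u, i)].append(((v, slot[(v, u)]), 1))
    return wadj, rep


def zero_one_bfs(wadj, src):
    """Shortest paths when every edge weight is 0 or 1."""
    dist = {src: 0}
    dq = deque([src])
    while dq:
        x = dq.popleft()
        for y, w in wadj[x]:
            if y not in dist or dist[x] + w < dist[y]:
                dist[y] = dist[x] + w
                if w == 0:
                    dq.appendleft(y)
                else:
                    dq.append(y)
    return dist
```

Each copy hosts at most `cap` original edges plus at most two path edges, so the maximum degree drops to `cap + 2`, while the distance between the representatives of any two original nodes is unchanged because the added edges have weight 0.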
5 Additive error
Let and let . We describe the scheme in three parts:
Let be a copy of , where an edge is added between any pair of nodes whose distance is at most in . Let be the set of nodes in with degree at least and let be a minimum dominating set of in . Then .
For all nodes we store and for all .
Consider now the subgraph of induced by . For a node , let be the ball of radius around in this induced subgraph. Then . This follows from the definition of : There are at most nodes within distance from and thus at most nodes within distance from , etc.
For all we store and for all .
Finally we store a -preserving distance label for all .
The total label size is then
as stated in Theorem 2.
To see that the distance between two nodes and can be calculated within an additive error we split into several cases:
If we can report the exact distance between and using the -preserving distance scheme.
If and we can find a node such that and thus
and symmetrically if .
Finally, if and and , then we and we can thus report the exact distance between and .
We would like to thank Noy Rotbart for helpful discussions and observations.
[1] A. Abboud and S. Dahlgaard. Popular conjectures as a barrier for dynamic planar graph algorithms. CoRR, abs/1605.03797, 2016. To appear at FOCS’16.
[2] S. Abiteboul, H. Kaplan, and T. Milo. Compact labeling schemes for ancestor queries. In Proc. of the 12th Annual ACM-SIAM Symp. on Discrete Algorithms (SODA), pages 547–556, 2001.
[3] I. Abraham, S. Chechik, and C. Gavoille. Fully dynamic approximate distance oracles for planar graphs via forbidden-set distance labels. In Proc. 44th Annual ACM Symp. on Theory of Computing (STOC), pages 1199–1218, 2012.
[4] R. Agarwal, P. B. Godfrey, and S. Har-Peled. Approximate distance queries and compact routing in sparse graphs. In INFOCOM 2011. 30th IEEE International Conference on Computer Communications, pages 1754–1762, 2011.
[5] T. Akiba, Y. Iwata, and Y. Yoshida. Fast exact shortest-path distance queries on large networks by pruned landmark labeling. In ACM International Conference on Management of Data (SIGMOD), pages 349–360, 2013.
[6] S. Alstrup, S. Dahlgaard, and M. B. T. Knudsen. Optimal induced universal graphs and labeling schemes for trees. In Proc. 56th Annual Symp. on Foundations of Computer Science (FOCS), 2015.
[7] S. Alstrup, S. Dahlgaard, M. B. T. Knudsen, and E. Porat. Sublinear distance labeling for sparse graphs. CoRR, abs/1507.02618, 2015.
[8] S. Alstrup, C. Gavoille, E. B. Halvorsen, and H. Petersen. Simpler, faster and shorter labels for distances in graphs. In Proc. 27th Annual ACM-SIAM Symp. on Discrete Algorithms (SODA), pages 338–350, 2016.
[9] S. Alstrup, H. Kaplan, M. Thorup, and U. Zwick. Adjacency labeling schemes and induced-universal graphs. In Proc. of the 47th Annual ACM Symp. on Theory of Computing (STOC), 2015.
[10] S. Alstrup and T. Rauhe. Small induced-universal graphs and compact implicit graph representations. In Proc. 43rd Annual Symp. on Foundations of Computer Science (FOCS), pages 53–62, 2002.
[11] F. Bazzaro and C. Gavoille. Localized and compact data-structure for comparability graphs. Discrete Mathematics, 309(11):3465–3484, 2009.
[12] B. Bollobás, D. Coppersmith, and M. Elkin. Sparse distance preservers and additive spanners. SIAM J. Discrete Math., 19(4):1029–1055, 2005. See also SODA’03.
[13] M. A. Breuer. Coding the vertexes of a graph. IEEE Trans. on Information Theory, IT–12:148–153, 1966.
[14] M. A. Breuer and J. Folkman. An unexpected result on coding vertices of a graph. J. of Mathematical Analysis and Applications, 20:583–600, 1967.
[15] V. D. Chepoi, F. F. Dragan, B. Estellon, M. Habib, and Y. Vaxès. Diameters, centers, and approximating trees of delta-hyperbolic geodesic spaces and graphs. In Annual ACM Symp. on Computational Geometry (SoCG), pages 59–68, 2008.
[16] V. D. Chepoi, F. F. Dragan, and Y. Vaxès. Distance and routing labeling schemes for non-positively curved plane graphs. J. of Algorithms, 61(2):60–88, 2006.
[17] B. Courcelle and R. Vanicat. Query efficient implementation of graphs of bounded clique-width. Discrete Applied Mathematics, 131:129–150, 2003.
[18] D. Delling, A. V. Goldberg, R. Savchenko, and R. F. Werneck. Hub labels: Theory and practice. In International Symp. on Experimental Algorithms (SEA), pages 259–270, 2014.
[19] P. Elias. Universal codeword sets and representations of the integers. IEEE Transactions on Information Theory, 21(2):194–203, 1975.
[20] M. Elkin, A. Filtser, and O. Neiman. Prioritized metric structures and embedding. In Proc. of the 47th Annual ACM Symp. on Theory of Computing (STOC), pages 489–498, 2015.
[21] M. Elkin and S. Pettie. A linear-size logarithmic stretch path-reporting distance oracle for general graphs. In Proc. of the 26th Annual Symp. on Discrete Algorithms (SODA), pages 805–821, 2015.
[22] C. Gavoille, M. Katz, N. A. Katz, C. Paul, and D. Peleg. Approximate distance labeling schemes. In Proc. of the 9th annual European Symp. on Algorithms (ESA), pages 476–488, 2001.
[23] C. Gavoille and C. Paul. Distance labeling scheme and split decomposition. Discrete Mathematics, 273(1-3):115–130, 2003.
[24] C. Gavoille and C. Paul. Optimal distance labeling for interval graphs and related graphs families. SIAM J. Discrete Math., 22(3):1239–1258, 2008.
[25] C. Gavoille, D. Peleg, S. Pérennes, and R. Raz. Distance labeling in graphs. J. of Algorithms, 53(1):85–112, 2004. See also SODA’01.
[26] P. Gawrychowski, A. Kosowski, and P. Uznanski. Even simpler distance labeling for (sparse) graphs. CoRR, abs/1507.06240, 2015.
[27] R. L. Graham and H. O. Pollak. On embedding graphs in squashed cubes. In Lecture Notes in Mathematics, volume 303. Springer-Verlag, 1972.
[28] A. Gupta, R. Krauthgamer, and J. R. Lee. Bounded geometries, fractals, and low-distortion embeddings. In 44th Annual Symp. on Foundations of Computer Science (FOCS), pages 534–543, 2003.
[29] A. Gupta, A. Kumar, and R. Rastogi. Traveling with a pez dispenser (or, routing issues in mpls). SIAM J. on Computing, 34(2):453–474, 2005. See also FOCS’01.
[30] R. Jin, N. Ruan, Y. Xiang, and V. Lee. A highway-centric labeling approach for answering distance queries on large sparse graphs. In ACM International Conference on Management of Data (SIGMOD), pages 445–456, May 2012.
[31] S. Kannan, M. Naor, and S. Rudich. Implicit representation of graphs. SIAM J. Disc. Math., pages 596–603, 1992. See also STOC’88.
[32] M. Katz, N. A. Katz, A. Korman, and D. Peleg. Labeling schemes for flow and connectivity. SIAM J. Comput., 34(1):23–40, 2004. See also SODA’02.
[33] R. Krauthgamer and J. R. Lee. Algorithms on negatively curved spaces. In 47th Annual Symp. on Foundations of Computer Science (FOCS), pages 119–132, 2006.
[34] J. W. Moon. On minimal $n$-universal graphs. Proc. of the Glasgow Mathematical Association, 7(1):32–33, 1965.
[35] J. H. Müller. Local structure in graph classes. PhD thesis, Georgia Institute of Technology, 1988.
[36] D. Peleg. Proximity-preserving labeling schemes. J. Graph Theory, 33(3):167–176, 2000.
[37] K. Talwar. Bypassing the embedding: algorithms for low dimensional metrics. In Proc. of the 36th Annual ACM Symp. on Theory of Computing (STOC), pages 281–290, 2004.
[38] M. Thorup. Compact oracles for reachability and approximate distances in planar digraphs. J. ACM, 51(6):993–1024, 2004. See also FOCS’01.
[39] M. Thorup and U. Zwick. Approximate distance oracles. J. of the ACM, 52(1):1–24, 2005. See also STOC’01.
[40] P. M. Winkler. Proof of the squashed cube conjecture. Combinatorica, 3(1):135–139, 1983.