Linear-Time Compression of Bounded-Genus Graphs into Information-Theoretically Optimal Number of Bits
Accepted to SIAM Journal on Computing. A preliminary version appeared in SODA.
A compression scheme for a class of graphs consists of an encoding algorithm that computes a binary string for any given graph in and a decoding algorithm that recovers from . A compression scheme for is optimal if both and run in linear time and the number of bits of for any -node graph in is information-theoretically optimal to within lower-order terms. Trees and plane triangulations were the only known nontrivial graph classes that admit optimal compression schemes. Based upon Goodrich’s separator decomposition for planar graphs and Djidjev and Venkatesan’s planarizers for bounded-genus graphs, we give an optimal compression scheme for any hereditary (i.e., closed under taking subgraphs) class under the premise that any -node graph of to be encoded comes with a genus- embedding. By Mohar’s linear-time algorithm that embeds a bounded-genus graph on a genus- surface, our result implies that any hereditary class of genus- graphs admits an optimal compression scheme. For instance, our result yields the first-known optimal compression schemes for planar graphs, plane graphs, graphs embedded on genus- surfaces, graphs with genus or less, -colorable directed plane graphs, -outerplanar graphs, and forests with degree at most . For non-hereditary graph classes, we also give a methodology for obtaining optimal compression schemes. From this methodology, we give the first known optimal compression schemes for triangulations of genus- surfaces and floorplans.
Compact representations of graphs are fundamentally important and useful in many applications, including representing the meshes in finite-element analysis, terrain models of GIS, and 3D models of graphics [80, 82, 81, 92, 85, 64, 89, 48], VLSI design [84, 56], designing compact routing tables of computer networks [94, 37, 66, 77, 35, 95, 1, 16, 3, 36], and compressing the link structure of the Internet [15, 2, 88, 7, 5, 21]. Let be a class of graphs. Let denote the number of distinct -node graphs in . The information-theoretically optimal number of bits to encode an -node graph in is . (All logarithms throughout the paper are to the base of two.) For instance, if is the class of rooted trees, then and ; if is the class of plane triangulations, then . A compression scheme for consists of an encoding algorithm that computes a binary string for any given graph in and a decoding algorithm that recovers graph from . A compression scheme for a graph class with is optimal if the following three conditions hold.
The running time of algorithm is linear in the size of .
The running time of algorithm is linear in the bit count of .
For all positive constants with , the bit count of for an -node graph in is no more than .
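To make the notion of an information-theoretically optimal bit count concrete, the following is a minimal sketch that computes the smallest number of bits able to distinguish the members of a class of a given size. The helper names `optimal_bits` and `catalan` are hypothetical, and the choice of rooted ordered trees (counted by the Catalan numbers) as the example class is an assumption made here for illustration; it is not taken from the text above.

```python
from math import comb

def optimal_bits(count: int) -> int:
    """Smallest b with 2**b >= count, i.e. ceil(log2(count))."""
    if count < 1:
        raise ValueError("the class must contain at least one graph")
    return (count - 1).bit_length()

def catalan(n: int) -> int:
    """n-th Catalan number."""
    return comb(2 * n, n) // (n + 1)

# Assumption for illustration: rooted ordered trees with n nodes
# are counted by catalan(n - 1).
for n in (10, 100, 1000):
    bits = optimal_bits(catalan(n - 1))
    print(n, bits, bits / n)
```

Under this assumption the printed ratio of bits to nodes approaches a constant as the number of nodes grows, which is the kind of leading term that Condition C3 asks an encoding to match up to lower-order terms.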
Condition C3 says that the bit count of is information-theoretically optimal to within lower-order terms. Although there has been considerable work on compression schemes, trees (see, e.g., [72, 50, 67, 11]) and plane triangulations  were the only known nontrivial graph classes that admit optimal compression schemes. A graph class is hereditary if it is closed under taking subgraphs. Below is the main result of the paper.
Any hereditary class of graphs with admits an optimal compression scheme, as long as each input -node graph in to be encoded comes with a genus- embedding.
By Theorem 1.1 and Mohar’s linear-time genus- embedding algorithm for genus- graphs [70, 54] (see Lemma 2.5), any hereditary class of genus- graphs admits an optimal compression scheme. For instance, our result yields the first-known optimal compression schemes for planar graphs, plane graphs, graphs embedded on genus- surfaces, graphs with genus or less, -colorable directed plane graphs, -outerplanar graphs, and forests with degree at most . For non-hereditary graph classes, we also give an extension (see Corollary 5.1) of Theorem 1.1. As summarized in the following theorem, we show two classes of genus- graphs whose optimal compression schemes are obtainable via this extension, where the class of floorplans is defined in related work below.
The following classes of graphs admit optimal compression schemes:
Triangulations of a genus- surface for any integral constant .
Floorplans.
The kernel of the proof of Theorem 1.1 is a linear-time disjoint partition of an -node graph embedded on a genus- surface. (Precisely, the disjoint partition of the edges of the embedded graph in the proof of Theorem 1.1 is , where is both (i) a -separation of an arbitrary triangulation of and (ii) a refinement of the -separation of .) Let denote . Based upon Goodrich’s separator decomposition of planar graphs  and Djidjev and Venkatesan’s planarizer , partition satisfies the following conditions, where is the number of nodes of and is the number of times that the nodes of are duplicated in some with (as a matter of fact, in our construction, all duplicated nodes of with belong to ): (a) , (b) holds for each , (c) , and (d) . By Condition (a), can be encoded in bits. By Conditions (b) and (c), the information required to recover from can be encoded into bits (see Lemma 4.1). By Condition (d), we have . Therefore, the disjoint partition reduces the problem of encoding an -node graph in to the problem of encoding a -node graph in . Applying such a reduction for one more level, it remains to encode a -node graph in into an information-theoretically optimal number of bits, which can be resolved by the standard technique (see, e.g., [47, 72, 78]) of precomputation tables (see Lemma 2.3).
The compression scheme of Turán  encodes an -node plane graph that may have self-loops into bits. (For brevity, we omit all lower-order terms of bit counts in our discussion of related work.) Keeler and Westbrook  improved this bit count to . They also gave compression schemes for several families of plane graphs. In particular, they used bits for plane triangulations, and bits for connected plane graphs free of self-loops and degree-one nodes. For plane triangulations, He et al.  improved the bit count to . For triconnected plane graphs, He et al.  also improved the bit count to at most bits. This bit count was later reduced to at most by Chuang et al. . For any given -node graph embedded on a genus- surface, Deo and Litow  showed an -bit encoding for . These compression schemes all take linear time for encoding and decoding, but Condition C3 does not hold for them. The compression schemes of He et al.  (respectively, Blelloch et al. ) for planar graphs, plane graphs, and plane triangulations (respectively, separable graphs) satisfy Condition C3, but their encoding algorithms require time on -node graphs.
Floorplanning is a fundamental issue in circuit layout [106, 43, 69, 62, 51, 108, 32, 8, 17, 58, 24, 57, 91, 68, 84, 4]. Motivated by VLSI physical design, various representations of floorplans were proposed [110, 109, 33]. Designing a floorplan to meet a certain criterion is NP-complete in general [87, 44, 100], so heuristic techniques such as simulated annealing [102, 101, 17] are practically useful. The length of the encoding affects the size of the search space. A floorplan, also known as a rectangular drawing, is a division of a rectangle into rectangular faces using horizontal and vertical line segments. Two floorplans are equivalent if they have the same adjacency relations and relative positions among the nodes. For instance, Figure 1 shows three floorplans: Floorplans (a) and (b) are equivalent. Floorplans (b) and (c) are not equivalent. Let be the input -node floorplan. Under the conventional assumption that each node of , other than the four corner nodes, has exactly three neighbors (see, e.g., [45, 107]), one can verify that has faces and edges. Yamanaka and Nakano  showed how to encode into bits. Chuang  reduced the bit count to . Takahashi et al.  further reduced the bit count to . All these compression schemes for floorplans satisfy Conditions C1 and C2, but not Condition C3. Takahashi et al.  also showed that the number of distinct -node floorplans is no more than . Therefore, our Theorem 1.2(2) encodes an -node floorplan into at most bits.
For applications that require query support, Jacobson  gave a -bit encoding for a connected and simple planar graph that supports traversal in time per node visited. Munro and Raman  improved this result and gave schemes to encode binary trees, rooted ordered trees, and planar graphs. For a general -node -edge planar graph , they used bits while supporting adjacency and degree queries in time. Chuang et al.  reduced this bit count to for any constant with the same query support. The bit count can be further reduced if only -time adjacency queries are supported, or if is simple, triconnected or triangulated . Chiang et al.  reduced the number of bits to . Yamanaka and Nakano  showed a -bit encoding for plane triangulations with query support. The succinct encodings of Blandford et al.  and Blelloch et al.  for separable graphs support queries. Yamanaka et al.  also gave a compression scheme for floorplans with query support. For labeled planar graphs, Itai and Rodeh  gave an encoding of bits. For unlabeled general graphs, Naor  gave an encoding of bits. For certain graph families, Kannan et al.  gave schemes that encode each node with bits and support -time testing of adjacency between two nodes. Galperin and Wigderson  and Papadimitriou and Yannakakis  investigated complexity issues arising from encoding a graph by a small circuit that computes its adjacency matrix. Related work on various versions of succinct graph representations can be found in [73, 6, 31, 42, 38, 76, 83, 30, 29, 28, 9, 53] and the references therein.
The rest of the paper is organized as follows. Section 2 gives the preliminaries. Section 3 shows our algorithm for computing graph separations. Section 4 gives our optimal compression scheme for hereditary graph classes. Section 5 shows a methodology for obtaining optimal compression schemes for non-hereditary graph classes and applies this methodology to triangulations of genus- graphs and floorplans. Section 6 concludes the paper with a couple of open questions.
2.1 Segmentation prefix
Let denote the number of bits of binary string . A binary string is a segmentation prefix of binary strings if (a) it takes time to compute from and (b) given the concatenation of , it takes time to recover all with .
Any binary strings have an -bit segmentation prefix, where .
Let be the concatenation of . If , let be the -bit binary string with exactly copies of -bits such that the -th bit of is if and only if holds for some . Otherwise, let store the -bit numbers for all . Let be the segmentation prefix of and as ensured by Lemma 2.1. The concatenation of and is a segmentation prefix of with bits. The lemma is proved. ∎
For the rest of the paper, let denote the concatenation of , where is the segmentation prefix of as ensured by Lemma 2.2.
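The role of a segmentation prefix can be illustrated with a simpler self-delimiting scheme. The sketch below is not the construction of Lemmas 2.1 and 2.2, and its overhead does not match their bound; it merely prepends the number of strings and their lengths, written in the Elias gamma code, so that the concatenation can be split again. The function names `gamma`, `read_gamma`, `pack`, and `unpack` are hypothetical, and bit strings are modeled as Python strings over the characters 0 and 1.

```python
def gamma(n: int) -> str:
    """Elias gamma code of a positive integer, as a '0'/'1' string."""
    b = bin(n)[2:]
    return "0" * (len(b) - 1) + b

def read_gamma(s: str, pos: int):
    """Decode one gamma codeword starting at s[pos]; return (value, next_pos)."""
    zeros = 0
    while s[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(s[pos + zeros:end], 2), end

def pack(parts):
    """Concatenate bit strings behind a prefix that records their lengths."""
    # Values are shifted by one because the gamma code needs positive integers.
    header = gamma(len(parts) + 1) + "".join(gamma(len(p) + 1) for p in parts)
    return header + "".join(parts)

def unpack(s):
    """Recover the original list of bit strings from pack(parts)."""
    k, pos = read_gamma(s, 0)
    lengths = []
    for _ in range(k - 1):
        n, pos = read_gamma(s, pos)
        lengths.append(n - 1)
    parts = []
    for n in lengths:
        parts.append(s[pos:pos + n])
        pos += n
    return parts
```

For example, `unpack(pack(["101", "", "0"]))` returns the three original strings, including the empty one.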
2.2 Precomputation table
Unless clearly stated otherwise, all graphs throughout the paper are simple, i.e., having no multiple edges or self-loops. Let denote the cardinality of set . Let consist of the nodes in graph and let . For any subset of , let denote the subgraph of induced by and let denote the subgraph of obtained by deleting and their incident edges. Two disjoint subsets and of are adjacent in if there is an edge of with and . For any subset of , let consist of the nodes in that are adjacent to in and let . A connected component of graph is a maximal subset of such that is connected.
Let be a graph class satisfying . Given positive integers and with , it takes overall time to compute (i) a labeling and a -bit binary string for each distinct graph with at most nodes and (ii) an -bit string such that the following statements hold.
Given any graph with , it takes time to obtain and from .
Given for any graph with , it takes time to obtain and from .
Straightforward by . ∎
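The idea behind a precomputation table can be sketched by brute force for very small graphs: enumerate every graph up to a size bound, keep one representative per isomorphism class, and assign each representative a fixed-width codeword. The sketch below is only an illustration under stated assumptions. The names `canonical` and `build_table` and the optional membership predicate `in_class` are hypothetical, the canonical form is found by trying all relabelings, and no attempt is made to match the time or space bounds of the lemma.

```python
from itertools import combinations, permutations

def canonical(n, edges):
    """Smallest sorted edge tuple over all relabelings of nodes 0..n-1."""
    best = None
    for perm in permutations(range(n)):
        relabeled = tuple(sorted(tuple(sorted((perm[u], perm[v])))
                                 for u, v in edges))
        if best is None or relabeled < best:
            best = relabeled
    return best

def build_table(max_nodes, in_class=lambda n, edges: True):
    """Return (encode, decode) dictionaries for all simple graphs in the
    class with at most max_nodes nodes, one entry per isomorphism class."""
    keys = []
    for n in range(1, max_nodes + 1):
        pairs = list(combinations(range(n), 2))
        seen = set()
        for mask in range(1 << len(pairs)):
            edges = [pairs[i] for i in range(len(pairs)) if mask >> i & 1]
            if in_class(n, edges):
                seen.add(canonical(n, edges))
        keys.extend((n, c) for c in sorted(seen))
    # Every codeword has the same width, so no delimiter is needed.
    width = max(1, (len(keys) - 1).bit_length())
    encode = {key: format(i, f"0{width}b") for i, key in enumerate(keys)}
    decode = {code: key for key, code in encode.items()}
    return encode, decode
```

A graph is encoded by looking up `(n, canonical(n, edges))` in `encode`, and decoded by looking up the codeword in `decode`. The exhaustive enumeration is feasible only for a handful of nodes, which is why such tables are applied to the small pieces left after the graph has been partitioned.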
2.3 Separator decomposition of planar graphs
Sets form a disjoint partition of set if are pairwise disjoint and . A subset of is a separator of graph with respect to and if (1) , , and form a disjoint partition of , (2) and are not adjacent in , (3) , and (4) . A separator decomposition  of is a rooted binary tree on a disjoint partition of such that the following two statements hold, where “nodes” specify elements of and “vertices” specify elements of . Statement 1: Each leaf vertex of consists of a single node of . Statement 2: Each internal vertex of is a separator of with respect to and , where and are the child vertices of in and (respectively, and ) is the union of all the vertices in the subtree of rooted at (respectively, and ). See Figure 2 for an illustration.
Lemma 2.4 (Goodrich ).
It takes time to compute a separator decomposition for any given -node planar graph.
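The first two conditions in the definition of a separator are purely combinatorial and can be checked directly. The following sketch tests only those two conditions, namely that the three sets form a disjoint partition of the node set and that the two sides are not adjacent; the size conditions (3) and (4) are not checked. The function name `is_partition_and_separated` and the edge-list input format are assumptions made for illustration.

```python
def is_partition_and_separated(nodes, edges, sep, left, right):
    """Check conditions (1) and (2) only: sep, left, right partition the
    node set, and no edge joins a node of left to a node of right."""
    sep, left, right = set(sep), set(left), set(right)
    union = sep | left | right
    covers = union == set(nodes)
    disjoint = len(sep) + len(left) + len(right) == len(union)
    crossing = any((u in left and v in right) or (u in right and v in left)
                   for u, v in edges)
    return covers and disjoint and not crossing

# A path on five nodes: removing the middle node separates the two ends.
path_nodes = range(5)
path_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(is_partition_and_separated(path_nodes, path_edges, {2}, {0, 1}, {3, 4}))
```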
2.4 Planarizers for non-planar graphs
The genus of a graph is the smallest integer such that can be embedded on an orientable surface with handles without edge crossings . For example, the genus of a planar graph is zero. By Euler’s formula (see, e.g., ), an -node genus- graph has edges. Determining the genus of a general graph is NP-complete , but Mohar  showed that it takes linear time to determine whether a graph is of genus for any . Mohar’s algorithm is simplified by Kawarabayashi et al. .
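As a worked instance of the edge bound obtained from Euler's formula, the following standard derivation assumes a simple connected graph with $n \ge 3$ nodes that is cellularly embedded on an orientable surface of genus $g$. The symbols $m$ (number of edges) and $f$ (number of faces) are introduced here for illustration only. The second line uses the fact that every face is bounded by at least three edge sides and every edge has two sides.

```latex
\begin{align*}
n - m + f &= 2 - 2g, \\
3f &\le 2m, \\
3\,(2 - 2g - n + m) &\le 2m, \\
m &\le 3n - 6 + 6g.
\end{align*}
```

For $g = 0$ this reduces to the familiar bound $m \le 3n - 6$ for planar graphs.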
It takes time to compute a genus- embedding for any given -node genus- graph.
Gilbert et al.  gave an -time algorithm to compute an -node separator of an -node genus- graph, generalizing Lipton and Tarjan’s classic separator theorem for planar graphs . Our result relies on the following planarization algorithm.
Lemma 2.6 (Djidjev and Venkatesan ).
Given an -node graph embedded on a genus- surface, it takes time to compute a subset of with such that is planar.
3 Separation and refinement
We say that with is a separation of graph if the following properties hold.
form a disjoint partition of .
Any two and with are not adjacent in .
For instance, Figure 3(a) shows a separation of graph and Figure 4(a) shows another separation of . For any subset of , let be the subgraph of induced by excluding the edges of . If is a separation of , then form a disjoint partition of the edges of . See Figures 3(b) and 4(b) for illustrations. Let . For any positive integer , let . For notational brevity, for any nonnegative integer , let
For a nonnegative integer , separation of an -node graph is a -separation of if the following three properties hold.
holds for each .
One can verify that is a -separation of . (The “” in Property S3 is redundant for . However, we need it so that is a -separation of , since .)
Let and be two separations of graph . We say that is a refinement of if the following three properties hold.
For each index , there is an index with and .
For any indices , , with , if , then .
For instance, in Figure 4(a), is a refinement of . Below is the main lemma of the section.
Let be a positive integer. Let be an -node connected graph embedded on a genus- surface. Given a -separation of , it takes time to compute a -separation of that is a refinement of .
Let be a positive integer. Given an -node graph embedded on a genus- surface, it takes time to compute an -node subset of such that each node of has degree at most in and each connected component of has at most nodes.
We first apply Lemma 2.6 to compute in time an -node subset of such that is planar. We then apply Lemma 2.4 to compute in time a separator decomposition of . For each vertex of , let denote the union of all the vertices in the subtree of rooted at and let . Let . Let consist of the nodes of with degree more than in . Let be the union of all the vertices of with . Let . By and the definition of , each connected component of has at most nodes. By , each node of has degree at most in . Since has edges, . It remains to show . For each index , let consist of the vertices of with . By and , each is an internal vertex of . By definition of , we know that and are disjoint for any two distinct elements and of , implying that holds. Since holds for each , we have . Since each is an internal vertex of , is a separator of . Therefore, holds for each vertex in . We have . The lemma is proved. ∎
Proof of Lemma 3.1.
Suppose that is the given -separation . Let be the -time computable subset of ensured by Lemma 3.2. We have . Let . Let consist of the connected components of . By , each element of has at most nodes. By and Properties S1 and S2 of , each element of is contained by some with . For each , let consist of the elements of with . We run Algorithm 1 to obtain (a) a disjoint partition of and (b) nodes of , which may not be distinct. Let . Since is connected, each element of is adjacent to . The first statement of the outer repeat-loop is well defined. Since each element of has at most nodes, the first statement of the inner repeat-loop is well defined. See Figure 5 for an illustration: Suppose that all nodes are in . All nodes are initially unmarked. Let consist of the nine unlabeled nodes, including the three gray nodes. For each , let consist of the nodes with label . That is, are the six connected components of . Suppose that and the first two iterations of the outer repeat-loop obtain and . In the third iteration of the outer repeat-loop, are the unmarked elements of that are adjacent to in clockwise order around . By , the two iterations of the inner repeat-loop obtain and .
By definition of Algorithm 1, one can verify that Properties R1, R2, and R3 hold for and (that is, is a refinement of ) and Properties S1 and S2 hold for . By Property S3 of , we have . By , we have . Let consist of the indices with and . Let consist of the indices with and . We show as follows. By Property S1 of , we have . To show , we categorize the indices in with into the following types, where is the index with :
- Type 1:
and . The number of such indices is at most .
- Type 2:
- Type 2a:
. The number of such indices is at most .
- Type 2b:
- Type 2c:
To see Property S5 of , we obtain a contracted graph from by performing the following two steps for each . (The contraction procedure is only for proving Property S5 of and is not needed for computing .) Step 1: Let be the elements of with in clockwise order around in . Split into two adjacent nodes and and let take over the neighbors of in clockwise order around from the first neighbor of in to the first neighbor of in . Step 2: Contract all nodes of into node and delete multiple edges and self-loops. See Figure 6 for an illustration: For each , let consist of the nodes with labels in Figure 6(a). Suppose that , , and . The unlabeled circle nodes belong to . The square nodes are two previously contracted nodes and from and for some indices and with . Figure 6(b) shows the result of Step 1. Figure 6(c) shows the result of Step 2. Observe that each node that is adjacent to becomes a neighbor of after applying Steps 1 and 2. Also, each neighbor of that is not in either remains a neighbor of or becomes a neighbor of after applying Steps 1 and 2. Therefore, for each and each node , there is either an edge or an edge for some index with and . Thus, is no more than the number of edges in the resulting contracted simple graph, which has nodes. Observe that Step 1 does not increase the genus of the embedding. Since the subgraph induced by is connected, Step 2 does not increase the genus of the embedding, either. The number of edges in the resulting contracted simple genus- graph is . Property S5 holds for . The lemma is proved. ∎
4 Our compression scheme
This section proves Theorem 1.1.
4.1 Recovery string
A labeling of graph is a one-to-one mapping from to . For instance, Figure 7(a) shows a labeling for graph . Let be a graph embedded on a surface. We say that a graph embedded on the same surface is a triangulation of if is a subgraph of with such that each face of has three nodes. The following lemma shows an -bit string with which the larger embedded labeled subgraphs of can be recovered from smaller embedded labeled subgraphs of in time.
Let be a positive integer. Let be an -node graph embedded on a genus- surface. Let be a triangulation of . Let be a given -separation of and be a given -separation of such that is a refinement of . For any given labeling of for each , the following statements hold.
It takes overall time to compute a labeling of subgraph for each .
Given the above labelings of subgraphs with , it takes time to compute an -bit string such that and for all can be recovered in overall time from and and for all .