# A Constant Factor Approximation Algorithm for Boxicity of Circular Arc Graphs

## Abstract

The boxicity of a graph $G(V, E)$ is the minimum integer $k$ such that $G$ can be represented as the intersection graph of $k$-dimensional axis-parallel rectangles in $\mathbb{R}^k$. Equivalently, it is the minimum number of interval graphs on the vertex set $V$ such that the intersection of their edge sets is $E$. It is known that boxicity cannot be approximated, even for graph classes like bipartite, co-bipartite and split graphs, below an $O(n^{0.5-\epsilon})$ factor, for any $\epsilon > 0$, in polynomial time unless $NP = ZPP$. To date, there is no well-known graph class of unbounded boxicity for which even an $n^{\epsilon}$-factor approximation algorithm for computing boxicity is known, for any $\epsilon < 1$. In this paper, we study the boxicity problem on circular arc graphs, which are intersection graphs of arcs of a circle. We give a $(2 + \frac{1}{k})$-factor polynomial time approximation algorithm for computing the boxicity of any circular arc graph along with a corresponding box representation, where $k \ge 1$ is its boxicity. For Normal Circular Arc (NCA) graphs, with an NCA model given, this can be improved to an additive $2$-factor approximation algorithm. The time complexity of the algorithms to approximately compute the boxicity is $O(mn + n^2)$ in both these cases, and in $O(mn + kn^2) = O(n^3)$ time we also get their corresponding box representations, where $n$ is the number of vertices of the graph and $m$ is its number of edges. The additive $2$-factor algorithm works directly for any Proper Circular Arc graph, since an NCA model for it can be computed in polynomial time.

## 1 Introduction

**Boxicity:** The boxicity of a graph $G(V, E)$ is defined as the minimum number of interval graphs on the vertex set $V$ such that the intersection of their edge sets is $E$. If $I_1, I_2, \ldots, I_k$ are interval graphs on the vertex set $V$ such that $E(G) = E(I_1) \cap E(I_2) \cap \cdots \cap E(I_k)$, then $\{I_1, I_2, \ldots, I_k\}$ is called a box representation of $G$ and $k$ is called the dimension of the representation. Equivalently, boxicity is the minimum integer $k$ such that $G$ can be represented as the intersection graph of $k$-dimensional axis-parallel rectangles in $\mathbb{R}^k$. Boxicity was introduced by Roberts [15] in 1968. If we have a box representation of dimension $k$ for a graph $G$ on $n$ vertices, it can be stored using $O(nk)$ space, whereas an adjacency list representation needs $O(m)$ space, which is $O(n^2)$ for dense graphs. The availability of a box representation in low dimension makes some well-known NP-hard problems, like max-clique, polynomial time solvable [16].
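As a small illustration of the definition, the sketch below builds the interval graph of each model in a box representation and intersects their edge sets. The function names and the example (a 4-cycle with a two-dimensional representation) are illustrative assumptions and do not come from the paper.

```python
from itertools import combinations

def intervals_overlap(a, b):
    """Closed intervals (left, right) overlap unless one lies strictly beside the other."""
    return a[0] <= b[1] and b[0] <= a[1]

def interval_graph_edges(intervals):
    """Edge set of the interval graph of a model {vertex: (left, right)}."""
    return {frozenset((u, v))
            for u, v in combinations(sorted(intervals), 2)
            if intervals_overlap(intervals[u], intervals[v])}

def intersection_of_models(models):
    """Edges present in every interval graph of the list `models`."""
    return set.intersection(*(interval_graph_edges(m) for m in models))

# Hypothetical example: the 4-cycle a-b-c-d-a.
# The first model separates only a from c, the second only b from d.
model_1 = {"a": (0, 1), "c": (2, 3), "b": (0, 3), "d": (0, 3)}
model_2 = {"b": (0, 1), "d": (2, 3), "a": (0, 3), "c": (0, 3)}
cycle_edges = intersection_of_models([model_1, model_2])
```

Here two interval graphs suffice, and no single interval graph can represent an induced 4-cycle, so this representation has the smallest possible dimension for this example.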

Boxicity is combinatorially well studied, and many bounds are known in terms of parameters like maximum degree, minimum vertex cover size and tree-width [4]. The boxicity of any graph is upper bounded by $\lfloor n/2 \rfloor$, where $n$ is the number of vertices of the graph. Scheinerman [17] showed in 1984 that the boxicity of outerplanar graphs is at most two. In 1986, Thomassen [20] proved that the boxicity of planar graphs is at most three. This parameter has also been studied in relation to other dimensional parameters of graphs, like partial order dimension and threshold dimension [2].

However, computing boxicity is a notoriously hard algorithmic problem. In 1981, Cozzens [5] showed that computing boxicity is NP-hard. Later, Yannakakis [23] proved that determining whether the boxicity of a graph is at most three is NP-complete, and Kratochvil [11] strengthened this by showing that determining whether the boxicity of a graph is at most two is itself NP-complete. Recently, Adiga et al. [2] proved that no polynomial time algorithm for approximating the boxicity of bipartite graphs with approximation factor less than $O(n^{0.5-\epsilon})$ is possible unless $NP = ZPP$. The same non-approximability holds for split graphs and co-bipartite graphs. Even an $n^{\epsilon}$-factor approximation algorithm for boxicity, with $\epsilon < 1$, is not known so far for any well-known graph class of unbounded boxicity. In this paper, we present a polynomial time $(2 + \frac{1}{k})$-factor approximation algorithm for finding the boxicity of circular arc graphs along with the corresponding box representation, where $k \ge 1$ is the boxicity of the graph. There exist circular arc graphs of arbitrarily high boxicity, including the well-known Roberts graph (the complement of a perfect matching on $n$ vertices, with $n$ even), which achieves boxicity $n/2$. For normal circular arc graphs, with an NCA model given, we give an additive $2$-factor polynomial time approximation algorithm for the same problem. Note that proper circular arc graphs form a subclass of NCA graphs and that an NCA model for them can be computed in polynomial time. We also give efficient ways of implementing all these algorithms.

**Circular Arc Graphs:** Circular Arc (CA) graphs are intersection graphs of arcs on a circle. That is, an arc of the circle is associated with each vertex, and two vertices are adjacent if and only if their corresponding arcs overlap. CA graphs are sometimes thought of as a generalization of interval graphs, which are intersection graphs of intervals on the real line. CA graphs became popular in the 1970s with a series of papers by Tucker, in which he proved matrix characterizations for CA graphs [21] and structure theorems for some of their important subclasses [21]. For a detailed description, refer to the survey paper by Lin et al. [12]. As in the case of interval graphs, linear time recognition algorithms exist for circular arc graphs too [14]. Some well-known NP-complete problems, like tree-width and path-width, are known to be polynomial time solvable in the case of CA graphs [18]. However, unlike for interval graphs, problems like minimum vertex coloring [8] and branchwidth [13] remain NP-complete for CA graphs. We believe that boxicity belongs to the second category.

A family $\mathcal{F}$ of subsets of a set has the Helly property if for every subfamily $\mathcal{F}'$ of $\mathcal{F}$ in which every two sets intersect, we also have $\bigcap_{A \in \mathcal{F}'} A \neq \emptyset$. Similarly, a family of arcs satisfies the Helly property if every subfamily of pairwise intersecting arcs has a common intersection point. The fundamental difficulty in dealing with CA graphs, in comparison with interval graphs, is the absence of the Helly property for a family of circular arcs, which arises from their circular adjacencies.
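The failure of the Helly property on a circle can be seen on three arcs that wrap around it. The sketch below is an illustration only: it assumes closed arcs given as (start, end) positions measured clockwise on a circle of a chosen circumference, with no arc covering the whole circle, and all names are hypothetical.

```python
from itertools import combinations

def arc_contains(arc, x, circumference):
    """True if point x lies on the closed arc running clockwise from arc[0] to arc[1]."""
    start, end = arc
    return (x - start) % circumference <= (end - start) % circumference

def arcs_intersect(a, b, circumference):
    """Two arcs meet exactly when one of them contains the start point of the other."""
    return arc_contains(a, b[0], circumference) or arc_contains(b, a[0], circumference)

def pairwise_intersecting(arcs, circumference):
    return all(arcs_intersect(a, b, circumference) for a, b in combinations(arcs, 2))

def has_common_point(arcs, circumference):
    """If the arcs share a point, they also share the start point of one of them."""
    return any(all(arc_contains(a, s, circumference) for a in arcs) for s, _ in arcs)

CIRCUMFERENCE = 12
wrapping_arcs = [(0, 5), (4, 9), (8, 1)]     # together they go around the circle
interval_like = [(0, 5), (4, 9), (3, 6)]     # none wraps, so they behave like intervals
```

The first family is pairwise intersecting but has no common point, while the second family, which behaves like a family of intervals, has one.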

A Proper Circular Arc (PCA) graph is a graph which has some CA representation in which no arc is properly contained in another. A Unit Circular Arc (UCA) graph is one which has a CA representation in which all arcs are of the same length. A Helly Circular Arc (HCA) graph is one which has a representation satisfying the Helly property. In a CA representation , a pair of arcs are said to be circle cover arcs if they together cover the circumference of the circle. A Normal Circular Arc (NCA) graph is one which has a CA representation in which there are no pairs of circle cover arcs. It is known that UCA PCA NCA and UCA HCA NCA.

**Our main results in this paper are:**

Boxicity of any circular arc graph can be approximated within a $(2 + \frac{1}{k})$-factor in polynomial time, where $k \ge 1$ is the boxicity of the graph.

The boxicity of any normal circular arc graph can be approximated within an additive $2$-factor in polynomial time, given a normal circular arc model of the graph.

The time complexity of the algorithms to approximately compute the boxicity is $O(mn + n^2)$ in both the above cases, and in $O(mn + kn^2)$ time we also get their corresponding box representations, where $n$ is the number of vertices of the graph, $m$ its number of edges and $k$ its boxicity.

A structural result we obtained in this paper may be of independent interest. The following way of constructing an auxiliary graph of a given graph is from [1].

The structural properties of and its complement have been extensively investigated for various graph classes in the context of important problems like largest induced matching and minimum chain cover. The initial results were obtained by Golumbic et al. [9]. Cameron et al. [3] came up with some further results. A consolidation of the related results can be found in [3].

The following intermediate structural result in our paper becomes interesting in this context:

In Lemma ?, we observe that if is a bipartite graph whose complement is a CA graph, then is a comparability graph.

This is a generalization of similar results for convex bipartite graphs and interval bigraphs already known in literature [1]. This observation helps us in reducing the complexity of our polynomial time algorithms.

## 2 Preliminaries

### 2.1 Notations

We denote the vertex set of a given graph by and edge set by , with and . We use to denote . We denote the complement of by . We call a graph the union of graphs if they are graphs on the same vertex set and . Similarly, a graph is the intersection of graphs if they are graphs on the same vertex set and . We use to denote boxicity of and to denote chromatic number of .

A circular-arc (CA) model consists of a circle , together with a family of arcs of . It is assumed that is always traversed in the clockwise direction, unless stated otherwise. The arc corresponding to a vertex is denoted by , where and are the extreme points of on with its start point and its end point respectively, in the clockwise direction. Without loss of generality, we assume that no single arc of covers and no arc is empty or a single point.
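To make the model concrete, the following sketch computes the edge set of the CA graph of a given model. It assumes, for illustration only, that arc end points are stored as numbers in `[0, circumference)` that increase in the clockwise direction and that each arc is a closed arc given as a (start, end) pair; the dictionary layout and function names are hypothetical.

```python
from itertools import combinations

def arc_contains(arc, x, circumference):
    """True if point x lies on the closed arc running clockwise from arc[0] to arc[1]."""
    start, end = arc
    return (x - start) % circumference <= (end - start) % circumference

def ca_graph_edges(model, circumference):
    """Edges of the intersection graph of a CA model {vertex: (start, end)}."""
    edges = set()
    for u, v in combinations(sorted(model), 2):
        a, b = model[u], model[v]
        # Two arcs overlap exactly when one contains the start point of the other.
        if arc_contains(a, b[0], circumference) or arc_contains(b, a[0], circumference):
            edges.add((u, v))
    return edges

# Hypothetical model on a circle of circumference 12; arc "w" wraps past position 0.
model = {"u": (0, 3), "v": (2, 5), "w": (10, 1)}
edges = ca_graph_edges(model, 12)
```

In this example `u` meets both other arcs, while `v` and `w` stay disjoint.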

An interval model consists of a family of intervals on real line. An interval corresponding to a vertex is denoted by a pair , where and are the left and right end points of the interval . Without loss of generality, we assume that an interval is always non-empty and is not a single point. We may use to represent both an interval graph and its interval model, when the meaning is clear from the context.

### 2.2 A Vertex Numbering Scheme for Circular Arc Graphs

Let be a CA graph. Assume a CA model of is given. Let be any point on the circle . We define a numbering scheme for the vertices of denoted by which will be helpful for us in explaining further results.

Let be the clique corresponding to the arcs passing through and let . Let and . Number the vertices in as such that the vertex with its farthest (in the clockwise direction) from gets number and so on. Similarly, number the vertices in as such that the vertex with its farthest (in the clockwise direction) from gets number and so on. In both cases, break ties (if any) between vertices arbitrarily, while assigning numbers. See Figure ? for an illustration of the numbering scheme.

Now, observe that in , if a vertex is adjacent to a vertex , then at least one of the following is true: (a) the point is contained in the arc or (b) the point is contained in the arc . This implies that if is adjacent to , then either (a) is adjacent to all such that or (b) is adjacent to all such that . Thus we have the following lemma.
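The consequence stated above can be checked mechanically on a small instance. The sketch below assumes one reading of the property, namely that whenever the $i$-th vertex of the first clique is adjacent to the $j$-th vertex of the second, either the former is adjacent to the first $j$ vertices of the second clique or the latter is adjacent to the first $i$ vertices of the first clique. The direction of the numbering (prefixes rather than suffixes) and the matrix layout are assumptions made for illustration.

```python
def has_bi_consecutive_adjacency(adj):
    """adj[i][j] is True iff the (i+1)-th vertex of the first clique is
    adjacent to the (j+1)-th vertex of the second clique (assumed layout)."""
    for i, row in enumerate(adj):
        for j, adjacent in enumerate(row):
            if not adjacent:
                continue
            # Option (a): adjacent to every lower-numbered vertex on the other side.
            row_prefix = all(adj[i][jj] for jj in range(j + 1))
            # Option (b): the symmetric condition seen from the other vertex.
            col_prefix = all(adj[ii][j] for ii in range(i + 1))
            if not (row_prefix or col_prefix):
                return False
    return True

good = [[True, False],
        [True, True]]
bad = [[False, False],
       [False, True]]   # the single edge skips both lower-numbered vertices
```

Under this reading, `good` satisfies the property and `bad` does not.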

Using Lemma ?, we can prove the following in the case of co-bipartite CA graphs.

For a proof of this lemma, refer to Appendix A.

The following lemma is applicable in the case of co-bipartite graphs:

The proof is by construction of a CA model for . Refer to Appendix A for the proof.

## 3 Computing the Boxicity of Co-bipartite CA Graphs in Polynomial Time

Using some theorems in the literature, we infer in this section that the boxicity of co-bipartite CA graphs can be computed in polynomial time. A bipartite graph is chordal bipartite if it does not contain any induced cycle of length $\geq 6$.

A bipartite graph is called a chain graph if it does not contain any induced $2K_2$. The minimum chain cover number of , denoted by , is the minimum number of chain subgraphs of such that the union of their edge sets is .
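A bipartite graph has no induced $2K_2$ exactly when the neighbourhoods of the vertices on one side are totally ordered by inclusion, which gives a simple test for chain graphs. The sketch below uses that equivalent characterisation; the input format (a dictionary from the vertices of one side to their neighbour sets) is an assumption made for illustration.

```python
def is_chain_graph(neighbours):
    """neighbours maps each vertex on one side of the bipartition to the
    set of its neighbours on the other side."""
    ordered = sorted(neighbours.values(), key=len)
    # In a chain graph the neighbourhoods, sorted by size, must be nested.
    return all(small <= big for small, big in zip(ordered, ordered[1:]))

nested = {"x1": {"y1"}, "x2": {"y1", "y2"}, "x3": {"y1", "y2", "y3"}}
two_k2 = {"x1": {"y1"}, "x2": {"y2"}}   # two disjoint edges
```

The graph `nested` is a chain graph, while `two_k2` is the forbidden induced subgraph itself.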

Recall Definition ? of from Section 1.

By Theorem ?, if is a co-bipartite CA graph, then is chordal bipartite. Hence by Theorem ?, a chain cover of of minimum cardinality can be computed in polynomial time and . Combining with Theorem ?, we get :

## 4 Reducing the Time Complexity of Computing the Boxicity of Co-bipartite CA Graphs

Let be the number of edges of or equivalently, the number of vertices in . By Theorem ?, when is a chordal bipartite graph, is a perfect graph. Using the standard perfect graph coloring methods, an algorithm is given in [1] to compute . In time, they also compute a chain cover of minimum cardinality. However, can be as bad as in the worst case, where is the number of vertices of . In [1], for the restricted case when is an interval bigraph, they succeeded in reducing the complexity to , using the zero partitioning property of the adjacency matrix of interval bigraphs. Unfortunately, since the zero partitioning property is the defining property of interval bigraphs, we cannot use the method of [1] in our case, because of the following result by Hell and Huang [10]: a graph is an interval bigraph if and only if its complement is a co-bipartite CA graph admitting a normal CA model. Since there are co-bipartite CA graphs which do not admit a normal CA model, the complements of co-bipartite CA graphs form a strict superclass of interval bigraphs. Hence, to bring down the complexity of the algorithm from , we need a new method. The key ingredient of our method is the following generalization of the results in [1].

Let . Let be a partitioning of the vertex set as described in Lemma ?, where and are cliques. Let and be the associated numbering scheme.

Consider two adjacent vertices of corresponding to the edges and of . Since they are adjacent, induces a in . Equivalently, these vertices induce a 4-cycle in with edges , , and . We claim that if and only if . To see this, assume that . Since , by the Bi-Consecutive property of the numbering scheme (Lemma ?), if , or , a contradiction. Hence, .

Now, to show that is a comparability graph, we define a relation as if and only if , with and and induces a in . In view of the claim proved in the paragraph above, if and are adjacent vertices of , they are comparable with respect to the relation .

Let and . We have inducing a 4-cycle in G with edges , , and . Similarly, induces a 4-cycle in with edges , , and . We also have and , by the definition of the relation . By the Bi-Consecutive property of the numbering scheme (Lemma ?), and implies that . Similarly, and implies that . Edges and are parts of cliques and . Hence, we have an induced 4-cycle in with edges , , and . We can conclude that . Thus the relation is transitive and hence, is a comparability graph.
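The adjacency rule used in this argument (two edges of the bipartite graph are in conflict when their four end points induce a $2K_2$) can be written out directly. The sketch below builds that conflict graph for a bipartite graph given as a list of (left, right) edges. Since the formal definition is not reproduced in this section, the construction follows the description in the text, and the function name and input format are assumptions.

```python
from itertools import combinations

def two_k2_conflict_graph(edges):
    """Return the set of conflicting edge pairs of a bipartite graph.

    Each edge is a pair (a, b) with a on the left side and b on the right.
    Edges (a, b) and (c, d) conflict when a != c, b != d and neither cross
    edge (a, d) nor (c, b) is present, i.e. the end points induce a 2K2.
    """
    edge_set = set(edges)
    conflicts = set()
    for (a, b), (c, d) in combinations(sorted(edge_set), 2):
        if a != c and b != d and (a, d) not in edge_set and (c, b) not in edge_set:
            conflicts.add(frozenset(((a, b), (c, d))))
    return conflicts

matching = [("a1", "b1"), ("a2", "b2")]
path = [("a1", "b1"), ("a2", "b1"), ("a2", "b2")]
```

A proper coloring of this conflict graph assigns different colors to any two edges whose end points induce a $2K_2$, which is the coloring problem discussed in the following subsection.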

### Improved Complexities

Lemma ? serves as the key ingredient in improving the time complexities of our algorithms. By the definition of , a proper coloring of the vertices of is the same as coloring the edges of such that no two edges get the same color if their end points induce a $2K_2$ in or equivalently a 4-cycle in . Since the number of edges in may be of , where , time for computing might go up to , if we use the standard algorithm for the vertex coloring of comparability graphs. Let , the number of edges between and in . We show that by utilizing the structure of along with the underlying comparability relation on the set of non-edges of defined in the proof of Lemma ?, computing the boxicity of can be done in , where is , . Each color class can be extended to a maximal independent set, and thus we get an optimum box representation of in , where = . The complexities claimed here are obtained by a suitable implementation of the greedy algorithm for the vertex coloring of comparability graphs, fine-tuned for this special case, together with a careful amortized analysis. Due to the structural differences with interval bigraphs explained before, this turned out to be quite different from the method used in [1]. For a detailed description of the algorithm and its analysis, refer to Appendix B.

## 5 Constant Factor Approximation for the Boxicity of CA Graphs

First we give a lemma which is an adaptation of a similar one given in [2].

For a proof of this lemma, see Appendix A.

### Approximation Algorithm

A method for computing a box representation of a given CA graph within a -factor where is the boxicity of is given in Algorithm ?. We use the algorithm for computing boxicity of co-bipartite CA graphs given in Section 3 as a subroutine here. Let and . We can show that a near optimal box representation of can be obtained in . For more details, refer to Appendix A. If we just want to compute the approximate boxicity of , it is enough to output , as proved below. This can be done in .

**Proof of correctness:** Let us analyze the non-trivial case when is not an interval graph. Otherwise, the correctness is obvious.

It can be easily seen that is a co-bipartite graph on the same vertex set as that of with cliques and and . Consider a numbering scheme of as described in Section 2.2 such that and , based on the CA model and the point as chosen in Algorithm ?. Notice that by construction of , for any pair of vertices and , if and only if . Recall that the numbering scheme satisfies Bi-Consecutive Adjacency Property for by Lemma ?. Clearly, the same will apply to also. Hence by Lemma ?, we can infer that is a co-bipartite CA graph.

It is easy to see that constructed in Line ? of Algorithm ? is a supergraph of , since is an interval representation of the induced subgraph of on and is an extension of on . Since is a box representation of , each , for is a supergraph of and in turn of too. is a clique in by definition. Consider any with and . Clearly, as well and since is a box representation of , such that for some . For any with , we have . Thus, .

Thus, is a valid box representation for G of size . By Lemma ?, , implying that is of size at most .

Lemma ? implies that is a -factor approximate box representation where is the boxicity of .

## 6 Additive 2-Factor Approximation for the Boxicity of Normal CA Graphs

We assume that a normal CA model of is given. An additive two factor approximation algorithm for computing a box representation of normal CA graphs is given in Algorithm ?. We can show that in time, the algorithm outputs a near optimal box representation of where , and . Refer to Appendix A for more details. If we just want to compute the approximate boxicity of , it is enough to output , as proved below. This can be done in .

**Proof of correctness:** Since is a normal CA graph, the set of arcs passing through does not contain any circle cover pair of arcs. Therefore, does not cover the entire circle . So, any point in the arc , in particular the point defined in Line ? of Algorithm ?, is not contained in any arc passing through . It follows that . Since and are cliques, , the induced subgraph on is a co-bipartite CA subgraph of . We can compute an optimum box representation of in polynomial time using the method described in Section 3.

and are interval graphs because they are obtained by removing vertices corresponding to arcs in passing through points and respectively. Since is a supergraph of on and is the extension of on , we can conclude that is a super graph of . Similarly, is also a super graph of . Since is a box representation of , each is a supergraph of induced subgraph . Since is the extension of on , is a super graph of .

Consider . **Case (i)** If , by construction of , . **Case (ii)** If , by construction of , . Remember that and are cliques. If both (i) and (ii) are false, then one of {u, v} is in and the other is in . Since is a box representation of , for some . By construction of , too. Hence, . Thus we get is a valid box representation of of size which is at most , since is an induced subgraph of .

In Algorithm ?, we assumed that an NCA model of the graph is given. This was required because recognizing NCA graphs in polynomial time is still an open problem. We can observe that though the algorithm of this section is given for normal CA graphs, it can be used for a wider class as stated below.

In Line ? of Algorithm ?, select (guaranteed by the assumption of the theorem) as the point . Such a point can be found in time, if it exists. The rest of the algorithm is similar.

Though such a representation need not exist in general, it does exist for many important subclasses of CA graphs and can be constructed in polynomial time; for example, for proper CA graphs or normal Helly CA graphs. In fact, for these classes, a normal CA (NCA) model can itself be constructed from their adjacency matrices in polynomial time.

## A Appendix 1

### Proof of Lemma :

Let be a co-bipartite CA graph. Recall that a circular arc model of is constructable in linear time. In any circular arc model of a co-bipartite CA graph , there are two points and on the circle such that every arc passes through at least one of them [22]. It is easy to see that these points can be identified in time. Let the clique corresponding to be denoted as . Let , which is clearly a clique, since the arcs corresponding to all vertices in pass through . Let and . Let vertices in be numbered and vertices of be numbered according to the numbering scheme as described in the beginning of Section 2.2. Clearly this numbering scheme satisfies Bi-Consecutive Adjacency Property by Lemma ?.

### Proof of Lemma :

The proof is by construction of a CA model for .

**Step 1:** Choose four distinct points in the clockwise order on . Initially fix for all and for all . Choose distinct points , , , in the clockwise order on the arc and set for all . Choose distinct points , , , in the clockwise order on the arc and set for all . As of now, the family of arcs that we have constructed represents two disjoint cliques corresponding to and .

**Step 2:** Now we will modify the start points of each arc as follows: Consider vertex . If is the highest numbered vertex in such that is adjacent to all with , then set . Similarly, Consider vertex . If is the highest numbered vertex in such that is adjacent to all with , then set . Notice that we are not making any adjacencies not present in between vertices of and in this step.

Since and are cliques, what remains to prove is that if a vertex is adjacent to a vertex , their corresponding arcs overlap. Consider such an edge . If is adjacent to all such that , we would have extended to meet in Step 2 above. If this does not occur, then by assumed Bi-Consecutive Adjacency Property, is adjacent to all such that . In this case, we would have extended to meet in Step 2. In both cases, the arcs corresponding to vertices and overlap. We got a CA model of proving that is a CA graph.

### Proof of Lemma :

Let be the boxicity of and be an optimal box representation of . For each , let and . Let be the interval graph obtained from by assigning the interval , and the interval , . Let be the interval graph obtained from by assigning the interval , and the interval , .

Note that, in constructing and we have only extended some of the intervals of and therefore, and are super graphs of and in turn of . By construction, induces cliques in both and , and thus they are supergraphs of too.

Now, consider with , . Then either or . If , then clearly the intervals and do not intersect and thus . Similarly, if , then . If both and , then such that for some and clearly by construction, and .

It follows that and therefore, .

### Time Complexity of the Algorithm of Section 5:

Let and . Whether the given graph is an interval graph can be determined in linear time. Given any CA graph , we can compute a CA model for in linear time [14]. A partition of the vertex set of as mentioned in the algorithm can be constructed in time from the CA model of . Construction of from can also be done in . Let and and and . In Section 4, we discussed how to compute the boxicity of in time and an optimal box representation of co-bipartite CA graph in where is the boxicity of and . Since by construction of , , the time complexity is . The additional work for computing in the construction of can be done in . Thus, a near optimal box representation of is obtained in .

### Time Complexity of the Algorithm of Section 6:

Whether the given graph is an interval graph can be determined in linear time. Choosing point can be done in time. Construction of from can also be done in . Let and . Since is an induced subgraph of , . In Section 4, we discussed how to compute the boxicity of in and an optimal box representation of co-bipartite CA graph in , where and . The additional work for computing the interval supergraphs is . Construction of from requires only time in total. Thus, in time, the algorithm outputs a near optimal box representation of .

## B Appendix 2 - Complexity of Computing the Boxicity and Optimal Box Representation of Co-bipartite CA Graphs

Let be a co-bipartite CA graph with and . Let be a partitioning of the vertex set as described in Lemma ?, where and are cliques. Let and be the associated numbering scheme. Let and and . Let and . In this section, we will show an algorithm to compute the boxicity and an algorithm to get an optimal box representation of . Let . Recall that by Theorem ?, . Let be the color classes in an optimal coloring of . For , let be a maximal independent set containing and : corresponds to a vertex in . By Theorem ?, : , gives an optimal box representation of .

### B.1 Computing the Boxicity of in Time

We call a *non-edge* of , if it is an edge of . Recall that by Lemma ?, is a comparability graph. In the proof of Lemma ?, we defined a transitive relation on , i.e., on the non-edges of , as follows: if and only if and and induces a 4-cycle in . Since is a comparability graph, any coloring satisfying the property that the color assigned to (the vertex corresponding to) a non-edge equals is an optimum coloring [7] of . We refer to this as the greedy strategy in our further discussion. For convenience, we hereafter refer to the coloring of a vertex of as the coloring of the corresponding non-edge of .
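A generic version of the greedy strategy can be sketched as follows. The sketch assumes the standard rule for comparability graphs, in which each element receives one more than the largest color among the elements below it in the transitive order, so that the number of colors equals the length of a longest chain. The data layout (a processing order plus a map to predecessors) is hypothetical, and this is not the fine-tuned implementation analysed in this appendix.

```python
def greedy_poset_coloring(order, predecessors):
    """Color the elements of a partial order by their height.

    order        : elements listed so that every predecessor of x precedes x
    predecessors : dict mapping x to the elements strictly below x
    """
    color = {}
    for x in order:
        # One more than the largest color already given to an element below x.
        color[x] = 1 + max((color[y] for y in predecessors.get(x, ())), default=0)
    return color

# Hypothetical order: e1 < e2 < e3, with e4 incomparable to the rest.
order = ["e1", "e2", "e3", "e4"]
predecessors = {"e2": ["e1"], "e3": ["e1", "e2"]}
colors = greedy_poset_coloring(order, predecessors)
```

Comparable elements always receive different colors, and the chain `e1 < e2 < e3` forces three colors, so the coloring of this example is optimal.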

Assume that the colors available are . The following definitions are with respect to . For , let represent the set of neighbors of in and . Similarly, for , and . Let denote . The linked lists corresponding to and for each and and for each , with their entries sorted with respect to the numbering scheme described in the above paragraph, can be constructed from the adjacency list of . This can be done in overall time. We will assume that lists are global data structures.

For , we color the non-edges incident on by invoking Algorithm ?, for in that order. For the convenience of our analysis, we refer to an invocation of Algorithm ? for vertex as *the processing of *. Note that, by the time a non-edge of is considered for coloring, i.e, during the processing of , all non-edges such that are already colored, since, by the definition of , and is processed before . Consider a non-edge of . Let . According to the greedy algorithm, the non-edge of has to get the color , where .

The next question is how to find efficiently. For that we need to understand the set more closely. Let and .

Since , we need to show that for any , if and only if and . Recall that if and only if , and induces a 4-cycle in . Observe that . It is easy to see that if and , then .

To prove the other direction, assume that . Then we have , and therefore, . Similarly, , . Suppose . Since the numbering scheme satisfies Bi-Consecutive Adjacency Property, implies that either or , which is a contradiction. Therefore and therefore, .

By the above claim, . But, if we have to do this computation separately for each , then for any which is in of more than one , the computation of has to be repeated. To avoid this repetition, in Algorithm ? we process all non-edges incident at a vertex in parallel as follows. In Lines ? to ? of Algorithm ?, for each , is computed and stored. This is referred to as Type 1 work in the algorithm. For each , Lines ? to ? referred to as Type 2 work in Algorithm ? computes using the values of already computed and stored as part of Type 1 work. In the process, for each , the algorithm assigns the color to , which is the optimum color suggested by the greedy strategy.

Let and and . Let and .

Type 0 work (Lines ? to ?) computes lists and as defined in the algorithm and also an indicator array of . and can be represented as doubly linked lists. Initializations in Line ? can be achieved in time as follows: can be initialized to in once for each . The total time for this work is . Each spends at most time for the processing of each . Thus the total time spent by all together for this initialization is . Initialization of in Line ? can be done in . Summing over all , this amounts to work. Similarly, for initializing in Line ?, we need . Summed over all , this amounts to work. Adding all the above, total cost of Type 0 work (over all invocations of Algorithm ?) is , since .

Let us calculate the total cost spent in Type 1 work. Note that each element remembers the pointer position . This means that continues from where it stopped in the current iteration, while doing the Type 1 work of the next element of . Therefore, pointer moves at most times for each . When reaches the end of list , is deleted from the linked list . This makes sure that Line ? is repeated just times for each . Each executes Line ? whenever . This also happens times during the processing of each where . Hence the total cost for Type 1 work (over all invocations of Algorithm ?) is .

Now consider Type 2 work. Note that each element remembers the pointer position . This means that continues from where it stopped in the current iteration while doing the Type 2 work of the next element of . Therefore, pointer moves at most times for each . When reaches end of list , is dropped from the linked list . This makes sure that Line ? is repeated only times for each , while processing an such that . Also, Line ? is executed only when . This happens times during the processing of each , where . Summing up, the total cost for Type 2 work (over all invocations of Algorithm ?) is