An Approximation Algorithm for Maximum Internal Spanning Tree
Abstract
Given a graph $G$, the maximum internal spanning tree problem (MIST for short) asks for computing a spanning tree $T$ of $G$ such that the number of internal vertices in $T$ is maximized. MIST has possible applications in the design of cost-efficient communication networks and water supply networks and hence has been extensively studied in the literature. MIST is NP-hard and hence a number of polynomial-time approximation algorithms have been designed for MIST in the literature. The previously best polynomial-time approximation algorithm for MIST achieves a ratio of $3/4$. In this paper, we first design a simpler algorithm that achieves the same ratio and the same time complexity as the previous best. We then refine the algorithm into a new approximation algorithm that achieves a better ratio (namely, ) with the same time complexity. Our new algorithm explores much deeper structure of the problem than the previous best. The discovered structure may be used to design even better approximation or parameterized algorithms for the problem in the future.
Keywords: Approximation algorithms, spanning trees, path-cycle covers.
1 Introduction
The maximum internal spanning tree problem (MIST for short) requires the computation of a spanning tree $T$ in a given graph $G$ such that the number of internal vertices in $T$ is maximized. MIST has possible applications in the design of cost-efficient communication networks [17] and water supply networks [1]. Unfortunately, MIST is clearly NP-hard because the problem of finding a Hamiltonian path in a given graph is NP-hard [5] and can be easily reduced to MIST. MIST is in fact APX-hard [9] and hence does not admit a polynomial-time approximation scheme.
Since MIST is APX-hard, it is of interest to design polynomial-time approximation algorithms for it that achieve a constant ratio as close to 1 as possible. Indeed, Prieto and Sloper [12] presented a polynomial-time approximation algorithm for MIST achieving a ratio of . Their algorithm is based on local search. By slightly modifying Prieto and Sloper's algorithm, Salamon and Wiener [17] then obtained a faster (linear-time) approximation algorithm achieving the same ratio. Salamon and Wiener [17] also considered two special cases of MIST. More specifically, they [17] designed a polynomial-time approximation algorithm for the special case of MIST restricted to claw-free graphs that achieves a ratio of , and also designed a polynomial-time approximation algorithm for the special case of MIST restricted to cubic graphs that achieves a ratio of . Salamon [15] later proved that the approximation algorithm in [17] indeed achieves a performance ratio of for the special case of MIST restricted to regular graphs. Based on local optimization, Salamon [16] further came up with an time approximation algorithm for the special case of MIST restricted to graphs without leaves that achieves a ratio of . The algorithm in [16] was subsequently simplified and reanalyzed by Knauer and Spoerhase [7] so that it runs faster (in cubic time) and achieves a better ratio (namely, ) for (the general) MIST. Li et al. [8] even went further by showing that a deeper local search than those in [7] and [16] can achieve a ratio of for MIST. Recently, Li and Zhu [9] came up with a polynomial-time approximation algorithm for MIST that achieves a ratio of $3/4$. Unlike the other previously known approximation algorithms for MIST, the algorithm in [9] is based on a simple but crucial observation that the maximum number of internal vertices in a spanning tree of a graph $G$ can be bounded from above by the maximum number of edges in a triangle-free path-cycle cover of $G$.
In the weighted version of MIST (WMIST for short), each vertex of the given graph $G$ has a nonnegative weight and the objective is to find a spanning tree $T$ of $G$ such that the total weight of internal vertices in $T$ is maximized. Salamon [16] designed an time approximation algorithm for WMIST that achieves a ratio of , where $\Delta$ is the maximum degree of a vertex in the input graph. Salamon [16] also considered the special case of WMIST restricted to claw-free graphs without leaves, and designed an time approximation algorithm for the special case that achieves a ratio of . Subsequently, Knauer and Spoerhase [7] proposed a polynomial-time approximation algorithm for (the general) WMIST that achieves a ratio of for any constant .
In the parameterized version of MIST (PMIST for short), we are asked to decide whether a given graph has a spanning tree with at least a given number of internal vertices. PMIST and its special cases and variants have also been extensively studied in the literature [1, 2, 3, 4, 10, 11, 12, 13, 14]. The best known kernel for PMIST is of size and it leads to the fastest known algorithm for PMIST with running time [11].
In this paper, we first give a new approximation algorithm for MIST that is simpler than the one in [9] but achieves the same approximation ratio and time complexity. In more detail, the time complexity is dominated by that of computing a maximum triangle-free path-cycle cover in a graph. We then show that the algorithm can be refined into a new approximation algorithm for MIST that has the same time complexity as the algorithm in [9] but achieves a better ratio (namely, ). To obtain our algorithm, we use three new main ideas. The first main idea is to bound the maximum number of internal vertices in a spanning tree of a graph $G$ by the maximum number of edges in a special (rather than general) triangle-free path-cycle cover of $G$. Roughly speaking, we can figure out that certain vertices in $G$ must be leaves in an optimal spanning tree of $G$, and hence we can require that the degrees of these vertices be at most 1 when computing a maximum triangle-free path-cycle cover of $G$. In this sense, the cover is special and can have significantly fewer edges than a maximum (general) triangle-free path-cycle cover of $G$, and hence gives us a tighter upper bound. The second idea is to carefully modify the cover into a spanning tree $T$ by local improvement. Unfortunately, we cannot always guarantee that the number of internal vertices in $T$ is at least times the number of edges in the cover. Our third idea is to show that if this unfortunate case occurs, then an optimal spanning tree of $G$ cannot have so many internal vertices. These ideas may be used to design even better approximation or parameterized algorithms for MIST in the future.
The remainder of this paper is organized as follows. Section 2 gives basic definitions that will be used in the remainder of the paper. Section 3 presents a simple approximation algorithm for MIST that achieves a ratio of $3/4$. The subsequent sections are devoted to refining the algorithm so that it achieves a better ratio.
2 Basic Definitions
Throughout this paper, a graph means a simple undirected graph (i.e., it has neither parallel edges nor self-loops).
Let $G$ be a graph. We denote the vertex set of $G$ by $V(G)$, and denote the edge set of $G$ by $E(G)$. For a subset $U$ of $V(G)$, $G - U$ denotes the graph obtained from $G$ by removing the vertices in $U$ (together with the edges incident to them), while $G[U]$ denotes $G - (V(G) \setminus U)$. We call $G[U]$ the subgraph of $G$ induced by $U$. For a subset $F$ of $E(G)$, $G - F$ denotes the graph obtained from $G$ by removing the edges in $F$. An edge $e$ of $G$ is a bridge of $G$ if $G - \{e\}$ has more connected components than $G$, and is a nonbridge otherwise. A vertex $v$ of $G$ is a cutpoint if $G - \{v\}$ has more connected components than $G$.
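The definitions of bridge and cutpoint can be checked directly by counting connected components before and after a deletion. The following is a minimal sketch, not part of the paper's algorithm; the function names `count_components`, `is_bridge`, and `is_cut_point` are hypothetical, and the graph is assumed to be given as a vertex list plus an edge list.

```python
from collections import defaultdict

def count_components(vertices, edges):
    """Number of connected components of the graph (vertices, edges)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), 0
    for s in vertices:
        if s in seen:
            continue
        components += 1
        seen.add(s)
        stack = [s]
        while stack:  # depth-first search from s
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
    return components

def is_bridge(vertices, edges, e):
    """True if deleting edge e increases the number of components."""
    rest = [f for f in edges if set(f) != set(e)]
    return count_components(vertices, rest) > count_components(vertices, edges)

def is_cut_point(vertices, edges, v):
    """True if deleting vertex v increases the number of components."""
    rest_v = [x for x in vertices if x != v]
    rest_e = [f for f in edges if v not in f]
    return count_components(rest_v, rest_e) > count_components(vertices, edges)

# Example (an assumption for illustration): a triangle 0-1-2 with a pendant path 2-3-4.
V = [0, 1, 2, 3, 4]
E = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
```

In this example the edges on the pendant path are bridges, the triangle edges are nonbridges, and vertices 2 and 3 are the cutpoints. This direct check is quadratic; linear-time methods exist but are not needed to illustrate the definitions.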
Let be a vertex of . The neighborhood of in , denoted by , is . The degree of in , denoted by , is . If , then is an isolated vertex of . If , then is a leaf of ; otherwise, is a nonleaf of . We use to denote the set of leaves in .
Let be a subgraph of . denotes . A port of is a with . When is a path, is dead if neither endpoint of is a port of , while is alive otherwise. and another subgraph of are adjacent in if but (or equivalently, ).
A cycle in $G$ is a connected subgraph of $G$ in which each vertex is of degree 2. A path in $G$ is either a single vertex of $G$ or a connected subgraph of $G$ in which exactly two vertices are of degree 1 and the others are of degree 2. A vertex $v$ of a path $P$ in $G$ is an endpoint of $P$ if , and is an internal vertex of $P$ if . The length of a cycle or path $C$ is the number of edges in $C$ and is denoted by . A $k$-cycle is a cycle of length $k$, while a $k$-path is a path of length $k$. A tree (respectively, cycle) component of $G$ is a connected component of $G$ that is a tree (respectively, cycle). In particular, if a tree component $T$ of $G$ is indeed a path (respectively, $k$-path), then we call $T$ a path (respectively, $k$-path) component of $G$.
A tree-cycle cover (TCC for short) of $G$ is a subgraph $H$ of $G$ such that $V(H) = V(G)$ and each connected component of $H$ is a tree or cycle. Let $H$ be a TCC of $G$. $H$ is a Hamiltonian path (respectively, cycle) of $G$ if $H$ is a path (respectively, cycle), and is a spanning tree of $G$ if $H$ is a tree. $H$ is a path-cycle cover (PCC for short) of $G$ if each tree component of $H$ is a path. $H$ is a path cover of $G$ if $H$ has only path components. A triangle-free TCC (TFTCC for short) of $G$ is a TCC without 3-cycles. Similarly, a triangle-free PCC (TFPCC for short) of $G$ is a PCC without 3-cycles. A TFPCC of $G$ is maximum if its number of edges is maximized over all TFPCCs of $G$. For convenience, let denote the time complexity of computing a maximum TFPCC in a graph with vertices and edges. It is known that [6].
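To make the definition of a TFPCC concrete, the sketch below tests whether a given edge set forms a triangle-free path-cycle cover of a graph. It only verifies a candidate cover; it does not compute a maximum one, which requires the algorithm of [6]. The function name `is_tfpcc` and the edge-list representation are assumptions made for illustration, and the graph is assumed to be simple.

```python
from collections import defaultdict

def is_tfpcc(vertices, graph_edges, cover_edges):
    """Check that cover_edges is a triangle-free path-cycle cover of the graph."""
    graph = {frozenset(e) for e in graph_edges}
    cover = {frozenset(e) for e in cover_edges}
    if not cover <= graph:  # the cover must be a subgraph
        return False
    adj = defaultdict(set)
    for u, v in cover_edges:
        adj[u].add(v)
        adj[v].add(u)
    # Maximum degree 2 forces every component to be a path or a cycle.
    if any(len(adj[v]) > 2 for v in vertices):
        return False
    seen = set()
    for s in vertices:
        if s in seen:
            continue
        seen.add(s)
        comp, stack = [s], [s]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    comp.append(y)
                    stack.append(y)
        num_edges = sum(len(adj[x]) for x in comp) // 2
        # A component with 3 vertices and 3 edges is a 3-cycle.
        if len(comp) == 3 and num_edges == 3:
            return False
    return True

# Example (an assumption for illustration): the complete graph on 4 vertices.
V = [0, 1, 2, 3]
K4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```

On this example, the 4-cycle 0-1-2-3-0 and the path 0-1-2-3 are TFPCCs, while a triangle plus an isolated vertex and a star centered at 0 are not. Isolated vertices count as trivial path components, so the empty edge set is also accepted.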
Suppose that $G$ is connected. The weight of a spanning tree $T$ of $G$, denoted by $w(T)$, is the number of nonleaves in $T$. We use $opt(G)$ to denote the maximum weight of a spanning tree of $G$. An optimal spanning tree (OST for short) of $G$ is a spanning tree $T$ of $G$ with $w(T) = opt(G)$.
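The weight of a spanning tree is simply a degree count. As a small illustration, assuming a tree is given as an edge list (the function name `tree_weight` is hypothetical), it can be computed as follows.

```python
from collections import Counter

def tree_weight(tree_edges):
    """Number of nonleaves (vertices of degree at least 2) in a tree."""
    deg = Counter()
    for u, v in tree_edges:
        deg[u] += 1
        deg[v] += 1
    return sum(1 for d in deg.values() if d >= 2)

# Two spanning trees on the same five vertices (illustrative examples).
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
path = [(0, 1), (1, 2), (2, 3), (3, 4)]
```

The star has a single nonleaf, whereas the Hamiltonian path has three, which is the largest weight any spanning tree on five vertices can have. This is the sense in which finding a Hamiltonian path reduces to MIST.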
3 A Simple 0.75-Approximation Algorithm
Throughout the remainder of this paper, means a connected graph for which we want to find an OST. Moreover, denotes an OST of . For convenience, let and .
3.1 Reduction Rules
We want to make smaller (say, by deleting one or more vertices or edges from ) without decreasing . For this purpose, we define two strongly safe operations on below. Here, an operation on is strongly safe if performing it on does not change .
 Operation 1.

If and contains two edges and such that both and are leaves of , then delete .
 Operation 2.

If for a nonbridge of , has a connected component with for each , then delete . (Comment: When , Li and Zhu [9] showed that Operation 2 is strongly safe.)
Lemma 3.1
[9] Operation 1 is strongly safe.
Lemma 3.2
Operation 2 is strongly safe.
Proof. If , we are done. So, assume that . Obviously, at least one vertex of is adjacent to in because is connected. So, . Similarly, for some vertex of . Moreover, since is a nonbridge of , has a connected component (other than and ) with . Since is connected, or is adjacent to a vertex of in . We assume that ; the other case is similar. Then, after deleting from , only may become a new leaf. If becomes a leaf in , then all vertices of must belong to the component tree of containing and hence adding an arbitrary edge of with to yields a new OST of . So, we may assume that does not become a leaf in . Then, since is a nonbridge of , must have an edge such that for each , belongs to the component tree of containing . Now, adding the edge to yields a new OST of .
An operation on is weakly safe if performing it on yields one or more graphs , …, such that (1) , , and , (2) for some nonnegative integer , and (3) given a spanning tree for each , a spanning tree of with can be computed in linear time. Note that the last two conditions in the definition imply that .
 Operation 3.

If has a bridge such that for each , is a cutpoint in the connected component of with , then obtain and as the connected components of .
 Operation 4.

If has a cutpoint such that one connected component of has at least two but at most 8 vertices, then obtain from by adding a new vertex and a new edge .
The number 8 in the definition of Operation 4 is not essential. It can be chosen at one’s discretion as long as it is a constant. We here choose the number 8, because it will be the smallest number for the proofs of several lemmas in this paper to go through.
Lemma 3.3
Operation 3 is weakly safe.
Proof. First, we want to show that . Consider an . Since is a cutpoint in , . Thus, the degree of in is at least 2. So, one component tree of is a spanning tree of , the other is a spanning tree of , and their total weight equals . Thus, .
Next, suppose that for each , is a spanning tree of . Since is a cutpoint in , . So, using to connect and into a single tree yields a spanning tree of whose weight is .
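The second half of this proof can be illustrated on a toy instance. A cutpoint of a graph has degree at least 2 in every spanning tree of that graph, so joining two spanning trees by a bridge between two such cutpoints neither creates nor destroys a nonleaf, and the weights simply add up. The sketch below is only an illustration under these assumptions; the helper names `weight` and `combine_over_bridge` and the example trees are hypothetical.

```python
from collections import Counter

def weight(tree_edges):
    """Number of nonleaves in a tree given as an edge list."""
    deg = Counter(x for e in tree_edges for x in e)
    return sum(1 for d in deg.values() if d >= 2)

def combine_over_bridge(tree1, tree2, bridge):
    """Join two vertex-disjoint trees into one tree by adding the bridge."""
    return list(tree1) + list(tree2) + [bridge]

# First component: path 0-1-2, in which vertex 1 is a cutpoint.
t1 = [(0, 1), (1, 2)]
# Second component: path 3-4-5, in which vertex 4 is a cutpoint.
t2 = [(3, 4), (4, 5)]
# The bridge joins the two cutpoints.
t = combine_over_bridge(t1, t2, (1, 4))
```

Here both endpoints of the bridge are already nonleaves in their own trees, so the combined tree has weight equal to the sum of the two weights.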
Lemma 3.4
Operation 4 is weakly safe.
Proof. Let be the graph obtained from by adding a new vertex and a new edge . Let .
First, we want to show that . Since is a cutpoint of , . Let be the spanning tree of obtained from by adding and the edge . Further let be the spanning tree of obtained from by adding and edge . Clearly, . Thus, .
Next, suppose that is a spanning tree of . Let be an OST of . We can obtain a spanning tree of from by first deleting , next adding , and further adding new edges to connect to those vertices of that are adjacent to in . Obviously, , , , , the degree of each vertex of other than and in is , and the degree of each vertex of other than and in is . Thus, .
An operation on is safe if it is strongly or weakly safe on .
3.2 The Algorithm
As in [9], the algorithm is based on a lemma which says that has a path cover such that is bounded from above by the number of edges in . We next state the lemma in a stronger form and give an extremely simple proof.
Lemma 3.5
Given a spanning tree of , we can construct a path cover of such that and for each leaf of .
Proof. We simply construct the path cover from the spanning tree by first rooting the tree at an arbitrary nonleaf and then, for each nonleaf $v$ of the tree, deleting all but one edge between $v$ and its children.
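The construction in this proof is short enough to sketch in code. The following is a minimal illustration, assuming the tree is given as an edge list; the function name `path_cover_from_tree` is hypothetical, and the choice of which child edge to keep (the first one encountered) is arbitrary.

```python
from collections import defaultdict

def path_cover_from_tree(tree_edges):
    """Keep exactly one child edge per nonleaf of a rooted tree."""
    adj = defaultdict(list)
    for u, v in tree_edges:
        adj[u].append(v)
        adj[v].append(u)
    # Root the tree at an arbitrary nonleaf, if one exists.
    root = next((v for v in adj if len(adj[v]) >= 2), None)
    if root is None:
        # A tree with at most one edge is already a path.
        return list(tree_edges)
    kept = []
    parent = {root: None}
    stack = [root]
    while stack:
        v = stack.pop()
        children = [c for c in adj[v] if c != parent[v]]
        for c in children:
            parent[c] = v
            stack.append(c)
        if children:
            # v is a nonleaf: keep one edge to a child, delete the rest.
            kept.append((v, children[0]))
    return kept

# Example tree (an assumption for illustration): nonleaves are 0 and 1.
tree = [(0, 1), (0, 2), (0, 3), (1, 4), (1, 5)]
cover = path_cover_from_tree(tree)
```

Each vertex keeps at most one edge to its parent and one edge to a child, so every component of the result is a path, and each leaf of the tree has degree at most 1. When the tree has a nonleaf, the number of kept edges equals the number of nonleaves.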
Now, the outline of the algorithm is as follows.

1. Whenever there is an $i \in \{1, 2\}$ such that Operation $i$ can be performed on $G$, perform Operation $i$ on $G$.

2. Whenever there is an $i \in \{3, 4\}$ such that Operation $i$ can be performed on $G$, perform the following steps:

   (a) Perform Operation $i$ on $G$. Let $G_1$, …, $G_k$ be the resulting graphs.

   (b) For each $j \in \{1, \ldots, k\}$, compute a spanning tree $T_j$ of $G_j$ recursively.

   (c) Combine $T_1$, …, $T_k$ into a spanning tree $T$ of $G$ such that .

   (d) Return $T$.

3. If , then compute and return an OST of $G$ in time.

4. Compute a maximum TFPCC of $G$. (Comment: By Lemma 3.5, ).

5. Perform a preprocessing on the TFPCC without decreasing its number of edges.

6. Transform the TFPCC into a spanning tree $T$ of $G$ such that .

7. Return $T$.
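For graphs of constant size, an OST can be found by exhaustive search. The sketch below is one straightforward way to do this and is not taken from the paper: it enumerates all edge subsets of the right size, keeps those that are spanning trees, and returns one with the most nonleaves. The names `is_spanning_tree` and `brute_force_ost` are hypothetical, and vertices are assumed to be numbered from 0.

```python
from itertools import combinations
from collections import Counter

def is_spanning_tree(n, edges):
    """True if edges form a spanning tree on vertices 0..n-1."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:  # the edge would close a cycle
            return False
        parent[ru] = rv
    return len(edges) == n - 1

def brute_force_ost(n, edges):
    """Return (tree, weight) maximizing the number of nonleaves."""
    best, best_w = None, -1
    for cand in combinations(edges, n - 1):
        if not is_spanning_tree(n, cand):
            continue
        deg = Counter(x for e in cand for x in e)
        w = sum(1 for d in deg.values() if d >= 2)
        if w > best_w:
            best, best_w = list(cand), w
    return best, best_w

# Illustrative inputs: a 4-cycle and a star with three leaves.
cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
star3 = [(0, 1), (0, 2), (0, 3)]
```

The 4-cycle contains a Hamiltonian path, so its optimum is 2; the star has a unique spanning tree of weight 1. The enumeration is exponential in the number of edges, which is acceptable only because the graph size is bounded by a constant at this point.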
Only Steps 5 and 6 are unclear. So, we detail them below. First, Step 5 is done by performing the next three operations until none of them is applicable.
 Operation 5.

If has a dead path component such that and has an alive Hamiltonian path , then replace by .
 Operation 6.

If an endpoint of a path component of is adjacent to a vertex of a cycle of in , then combine and into a single path by replacing one edge incident to in with the edge .
 Operation 7.

If an endpoint of a path component of is adjacent to an internal vertex of another path component in such that one edge incident to in satisfies that combining and by replacing with the edge yields two paths and with , then replace and by and . (Comment: For each , Operation does not change the maximality of . So, due to the maximality of , no endpoint of a path component of is adjacent to an endpoint of another path component in .)
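Operation 6 is easy to visualize when paths and cycles are stored as vertex sequences. The sketch below is an illustration under assumed representations, not the paper's implementation: the path is a list whose last vertex is the endpoint in question, the cycle is a list in cyclic order, and the caller is assumed to have checked that the path endpoint is adjacent in the input graph to the chosen cycle vertex. The function name `merge_path_with_cycle` is hypothetical.

```python
def merge_path_with_cycle(path, cycle, v):
    """Merge a path and a cycle into a single path.

    path  : list of vertices; its last vertex u is adjacent to v in the graph
    cycle : list of vertices in cyclic order; must contain v
    """
    i = cycle.index(v)
    # Rotate the cycle so that it starts at v. Reading it as a path
    # drops the closing edge, which is one cycle edge incident to v.
    opened = cycle[i:] + cycle[:i]
    # Appending the opened cycle to the path adds the edge {u, v}.
    return path + opened

# Illustrative example: path 0-1-2, cycle 3-4-5-6-3, with 2 adjacent to 5.
merged = merge_path_with_cycle([0, 1, 2], [3, 4, 5, 6], 5)
```

One cycle edge is removed and one connecting edge is added, so the total number of edges is unchanged, in line with the comment that the operation does not change the maximality of the cover.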
Lemma 3.6
Immediately after Step 5, the following statements hold:

is a maximum TFPCC of and hence has at least edges.

If a path component of is of length at most 3, then is alive.

If an endpoint of a path component of is a port of , then each vertex in is an internal vertex of a path component of with .
Proof. We prove the statements separately as follows.
Statement 1: Immediately before Step 5, is a maximum TFPCC of . Since Operations 5 through 7 keep being a TFPCC without changing the number of edges in , Statement 1 holds.
Statement 2: Let be a path component of with . If , then is alive because otherwise would be disconnected. So, or 3. Let and be the endpoints of . For a contradiction, assume that is dead. Then, since is connected, has at least one internal vertex adjacent to a vertex in . If , then has a Hamiltonian path in which is an endpoint, contradicting the fact that Operation 5 cannot be performed on . So, we assume that . Now, if , then Operation 1 can be performed on , a contradiction. Thus, we further assume that . Then, since Operation 4 cannot be performed on , the other internal vertex (than ) of is adjacent to a vertex in . Now, if is not itself, then Operation 5 can be performed on , a contradiction; otherwise, Operation 2 or 3 can be performed on , a contradiction. Note that it does not matter whether or not.
Statement 3: Suppose that an endpoint of a path component of is a port. Consider an arbitrary . Since Operation 6 is not applicable on , appears in a path component of . Then, by the comment on Operation 7, is an internal vertex of . Let and be the endpoints of . For each , let be the path from to in . Then, . Moreover, since Operation 7 cannot be applied on , for each . Thus, .
We next detail Step 6. First, for each path component of with , we select one edge connecting an endpoint of to a vertex not in , and add to an initially empty set . Such exists by Statement 2 in Lemma 3.6. Moreover, by Statement 3 in Lemma 3.6, the endpoint of not in appears in a path component of with . So, for two path components and in , . Consider the graph obtained from by adding the edges in . Each connected component of is a cycle of length at least 4 or a tree. Suppose that we modify by performing the following three steps in turn:

Whenever has two cycles and such that some edge satisfies and , delete one edge of incident to from , delete one edge of incident to from , and add to .

Whenever has a cycle , choose an edge with and , delete one edge of incident to from , and add to .

Whenever has two connected components and such that some edge satisfies and , add to .
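Once every cycle has been broken by the first two steps, the last of the three steps above amounts to joining the components of a forest with graph edges until a single tree remains. The sketch below illustrates only this last step, using a union-find structure; the function name `connect_components`, the vertex numbering from 0, and the example data are assumptions, and the input edge set is assumed to be acyclic already.

```python
def connect_components(n, forest_edges, graph_edges):
    """Extend a spanning forest to a spanning tree using graph edges."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Record which vertices are already connected by the forest.
    for u, v in forest_edges:
        parent[find(u)] = find(v)

    result = list(forest_edges)
    for u, v in graph_edges:
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two different components
            parent[ru] = rv
            result.append((u, v))
    return result

# Illustrative example: components {0, 1, 2}, {3, 4}, and {5}.
forest = [(0, 1), (1, 2), (3, 4)]
graph = forest + [(2, 3), (0, 2), (4, 5)]
tree = connect_components(6, forest, graph)
```

Only edges whose endpoints lie in different components are added, so no cycle is created. If the input graph is connected, the result has exactly one fewer edge than the number of vertices.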
Step 6 is done by obtaining as the final modified . Obviously, for each cycle of , at least vertices of are internal vertices of . Moreover, for each path component of with , at least vertices of are internal vertices of . Furthermore, for each path component of with , at least vertices of are internal vertices of . So, has at least internal vertices. Obviously, all steps of the algorithm excluding Steps 2b and 4 can be done in time. Now, we have the following theorem:
Theorem 3.7
The algorithm achieves an approximation ratio of $3/4$ and runs in time.
In the sequel, we consider how to improve the algorithm. The first idea is to introduce more safe reduction rules (cf. Section 4). The second idea is to compute a better upper bound on than that given by a maximum TFPCC (cf. Section 5). The third idea is to perform a more sophisticated preprocessing on (cf. Section 6). The last idea is to transform into a spanning tree of more carefully (cf. Section 7).
4 More Safe Reduction Rules
In addition to the four safe reduction rules in Section 3.1, we further introduce the following rules.
 Operation 8.

If for four vertices , …, , , has a connected component with , then delete the edge .
 Operation 9.

If for five vertices , …, , , then delete the edge .
 Operation 10.

If for two vertices and of , has a connected component with such that and has a Hamiltonian path from to , then delete all edges of that do not appear in .
 Operation 11.

If has an edge with , then obtain from by merging and into a single vertex .
Lemma 4.1
Operation 8 is strongly safe.
Proof. If , we are done. So, assume that . Obviously, at least one vertex of is adjacent to in because is connected. So, . For each , let be the component tree of in which appears. If , then is a leaf of and hence adding the edge to clearly yields a spanning tree of with . So, we assume . If , then is a leaf of and hence adding the edge to clearly yields a spanning tree of with . Otherwise, is a leaf of and hence adding the edge to clearly yields a spanning tree of with .
Lemma 4.2
Operation 9 is strongly safe.
Proof. If , we are done. So, assume that . Obviously, . Moreover, if for some , and , then the proof of Lemma 4.1 shows that can be transformed into a spanning tree such that and . Thus, we may assume that , , and . Obviously, either or . In the latter case, adding the edge to clearly yields a spanning tree of , and holds for . So, we assume the former case. Let . Then, adding the edges and to clearly yields a spanning tree of with .
Lemma 4.3
Operation 10 is strongly safe.
Proof. Operation 10 is clearly strongly safe if . So, we assume that . Since is a connected component, the degree of each vertex in is unless . Let be the path between and in .
Let be the set of internal vertices of . Since is a connected component of , either or . Obviously, we are done if . So, we assume that is either empty or contains at least one but not all vertices of . Then, has one or more component trees in which at least one vertex of appears. Let , …, be such component trees. For each , because is a connected component of . Moreover, if , then . Since for at least one , . Furthermore, if , then .
Case 1: is a nonempty proper subset of . Then, modifying by adding the edges of yields a new spanning tree of . Clearly, . Moreover, since , it is impossible that . So, . Consequently, because .
Case 2: . Then, both and are of degree at least 1 in . We assume that the degree of in is at least as large as that of in ; the other case is similar. Let be the neighbor of in . It is possible that . Obviously, modifying by adding the edges of and deleting the edge yields a new spanning tree of . Clearly, . Thus, if , then because . Moreover, if , then and in turn . So, we may assume that and . Then, the degree of in is 1 and in turn so is . Now, since and , is adjacent to no vertex of in and hence is a leaf of . Therefore, no matter whether or not, because .
Lemma 4.4
Operation 11 is weakly safe.
Proof. For each , let be the vertex in . Possibly, . If , then ; otherwise, .
First, we want to show that . If , then contains both and and we can modify (without decreasing ) by replacing the edge with . So, we can assume that . Then, it is clear that modifying by merging and into a single vertex yields a spanning tree of whose weight is . Thus, .
Next, suppose that is a spanning tree of . If , then is a leaf of and its neighbor in is , and hence modifying by deleting the vertex and adding the two edges , yields a spanning tree of whose weight is . So, we assume that . Clearly, at least one of and is an edge of . If for exactly one , , then modifying by deleting the vertex and adding the two edges , yields a spanning tree of whose weight is . Otherwise, modifying by deleting the vertex and adding the three edges , , yields a spanning tree of whose weight is .
5 Computing a Preferred TFPCC
In this section, we consider how to refine Step 4. Because of Steps 1 and 3, we hereafter assume that and there is no such that Operation can be performed on . Then, we can prove the next lemma:
Lemma 5.1
Suppose that is a cycle of with . Let be the set of ports of . Then, the following statements hold.

.

If , then the two vertices in are not adjacent in and .

If and , then and are the same graph.
Proof. We prove the statements separately as follows.
Statement 1: Since is connected and , . Moreover, since Operation 4 cannot be performed on , .
Statement 2: Suppose that . Then, the two vertices in cannot be adjacent in , because otherwise Operation 10 could be performed on . For a contradiction, assume that . Suppose that , …, are the vertices of a 5-cycle of and appear clockwise in this order. Since the two vertices in are not adjacent in , we may assume that . If or , then Operation 10 can be performed on , a contradiction. So, we assume that and . If or , then Operation 10 can be performed on , a contradiction. Thus, we may further assume that and . Now, , , and . Hence, Operation 11 can be performed on , a contradiction.
Statement 3: Suppose that and . The two vertices in are not adjacent in by Statement 2, and hence and are the same graph because otherwise Operation 10 could be performed on .
To refine Step 4, our idea is to compute as a preferred TFPCC of . Before defining what the word “preferred” means here, we need to prove a lemma. For ease of explanation, we assume, without loss of generality, that there is a linear order (denoted by ) on the vertices of .
Lemma 5.2
Suppose that and are two vertices of such that and Condition C1 below holds. Then, has an OST in which or is a leaf. Consequently, has an OST in which is a leaf.

For two vertices and in , .
Proof. If is a leaf of , then we are done. So, assume that is not a leaf of . Since Condition C1 holds, is clearly a leaf of and we can modify (without decreasing ) by switching and so that becomes a leaf in .
If Condition C1 in Lemma 5.2 holds for and , we refer to and as the boundary points of the pair , and refer to the edges incident to or as the supports of .
Let be the set of pairs of vertices in satisfying Condition C1. It is worth pointing out that for each and each boundary point of , because otherwise Operation 4 could be performed on .
Lemma 5.3
No two pairs in share a support.
Proof. Obviously, for two pairs in to share a support, they have to share their boundary points. However, no two pairs in can share their boundary points, because otherwise Operation 9 could be performed on . So, no two pairs in share a support.
Lemma 5.4
has an OST in which is a leaf for each .
Proof. By Lemma 5.2, we can assume that for every , . In a nutshell, the proof of Lemma 5.2 shows that even if is an OST with , we can modify without decreasing so that . Indeed, the modification only uses the supports of . Now, by Lemma 5.3, a similar modification can be done independently for each other . Therefore, the lemma holds.
Now, we are ready to make two definitions. Let be a TFPCC of . is special if for every pair , . is preferred if is special and is maximized over all special TFPCCs of .
Lemma 5.5
If is a preferred TFPCC of , then .