Hardness of approximation for H-free edge modification problems

The research of Mi. Pilipczuk is supported by Polish National Science Centre grant UMO-2013/11/D/ST6/03073. Mi. Pilipczuk is also supported by the Foundation for Polish Science (FNP) via the START stipend programme.

The $H$-free Edge Deletion problem asks, for a given graph $G$ and integer $k$, whether it is possible to delete at most $k$ edges from $G$ to make it $H$-free, that is, not containing $H$ as an induced subgraph. The $H$-free Edge Completion problem is defined similarly, but we add edges instead of deleting them. These two problem families have recently been studied intensively from the point of view of parameterized complexity and kernelization. In particular, it was shown that the problems do not admit polynomial kernels (under plausible complexity assumptions) for almost all graphs $H$, with several important exceptions occurring when the class of $H$-free graphs exhibits some structural properties.

In this work we complement the parameterized study of edge modification problems to $H$-free graphs by considering their approximability. We prove that whenever $H$ is $3$-connected and has at least two non-edges, then both $H$-free Edge Deletion and $H$-free Edge Completion are very hard to approximate: they do not admit a $\mathrm{poly}(\mathsf{OPT})$-approximation in polynomial time, unless $\mathrm{P} = \mathrm{NP}$, or even in time subexponential in $\mathsf{OPT}$, unless the Exponential Time Hypothesis fails. The assumption of the existence of two non-edges appears to be important: we show that whenever $H$ is a complete graph without one edge, then $H$-free Edge Deletion is tightly connected to the Min Horn Deletion problem, whose approximability is still open. Finally, in an attempt to extend our hardness results beyond $3$-connected graphs, we consider the cases of $H$ being a path or a cycle, and we achieve an almost complete dichotomy there.

## 1 Introduction

We consider the following general setting of graph modification problems: given a graph $G$, one would like to modify $G$ as little as possible in order to make it satisfy some fixed property of global nature. Motivated by applications in de-noising data derived from imprecise experimental measurements, graph modification problems occupy a prominent role in the field of parameterized complexity and kernelization. This is because the allowed number of modifications usually can be assumed to be small compared to the total instance size, which exactly fits the motivation of considering it as the parameter of the instance.

Moving to the formal setting, consider some hereditary class of graphs $\mathcal{G}$, that is, a class closed under taking induced subgraphs. For such a class $\mathcal{G}$, we can define several problems depending on the set of allowed modifications. In each case the input consists of a graph $G$ and integer $k$, and the question is whether one can apply at most $k$ modifications to $G$ so that it falls into class $\mathcal{G}$. In this paper we will consider deletion and completion problems, where we are allowed only to delete edges, respectively only to add edges. However, other studied variants include vertex deletion problems (the allowed modification is removal of a vertex) and editing problems (both edge deletions and completions are allowed). Moreover, we restrict ourselves to classes characterized by one forbidden induced subgraph $H$. In other words, $\mathcal{G}$ is the class of $H$-free graphs, that is, graphs that do not contain $H$ as an induced subgraph.

The study of the parameterized complexity of $H$-free Edge Deletion and $H$-free Edge Completion focused on two aspects: designing fixed-parameter algorithms and kernelization procedures. The classic observation of Cai [3] shows that $H$-free Edge Deletion (Completion) can both be solved in time $c^k \cdot n^{O(1)}$ for some constant $c$ depending only on $H$, using a straightforward branching strategy. However, for several completion problems related to chordal graphs and their subclasses, like (proper) interval graphs or trivially perfect graphs, one can design subexponential parameterized algorithms, typically with the running time of $k^{O(\sqrt{k})} \cdot n^{O(1)}$. This surprising subexponential phenomenon, and its limits, has recently been studied intensively; we refer to the introductory section of [2] for more details. However, for the vast majority of graphs $H$, the running time of the form $c^k \cdot n^{O(1)}$ is essentially the best one can hope for $H$-free Edge Deletion (Completion). Indeed, Aravind et al. [1] proved that, whenever $H$ has at least two edges, then $H$-free Edge Deletion is NP-hard and has no $2^{o(k)} \cdot n^{O(1)}$-time algorithm unless the Exponential Time Hypothesis fails, and the same result holds for $H$-free Edge Completion whenever $H$ has at least two non-edges. The remaining cases are easily seen to be polynomial-time solvable, so this establishes a full dichotomy.
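The branching strategy mentioned above can be illustrated with a minimal brute-force sketch. It is not taken from [3]: the function names and the graph encoding (the pattern $H$ lives on vertices `0, ..., h_size - 1`, edges are pairs) are assumptions made for this example, and the induced-copy search is deliberately naive.

```python
from itertools import combinations, permutations


def find_induced_copy(vertices, edges, h_size, h_edges):
    """Return a tuple of vertices of G inducing a copy of H, or None.

    H has vertex set {0, ..., h_size - 1}; the search is brute force.
    """
    g = {frozenset(e) for e in edges}
    h = {frozenset(e) for e in h_edges}
    for image in permutations(vertices, h_size):
        if all(
            (frozenset((image[a], image[b])) in g) == (frozenset((a, b)) in h)
            for a, b in combinations(range(h_size), 2)
        ):
            return image
    return None


def can_delete(vertices, edges, h_size, h_edges, k):
    """Decide whether at most k edge deletions make G H-free, by branching."""
    copy = find_induced_copy(vertices, edges, h_size, h_edges)
    if copy is None:
        return True   # already H-free
    if k == 0:
        return False  # an obstacle remains but the budget is spent
    inside = [e for e in edges if set(e) <= set(copy)]
    # Some edge of the found copy must be deleted: at most |E(H)| branches.
    return any(
        can_delete(vertices, [f for f in edges if f != e], h_size, h_edges, k - 1)
        for e in inside
    )
```

The recursion has depth at most $k$ and branches into at most $|E(H)|$ cases, which is where a bound of the form $c^k \cdot n^{O(1)}$ comes from. For instance, with $H = P_3$ and $G$ a path on four vertices, one deletion suffices and zero deletions do not.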

Another interesting aspect of graph modification problems is their kernelization complexity. Recall that a polynomial kernel for a parameterized problem is a polynomial-time algorithm that, given an instance of the problem with parameter $k$, reduces it to another instance of the same problem that has size bounded polynomially in $k$. While every $H$-Free Vertex Deletion problem admits a simple polynomial kernel by a reduction to the $d$-Hitting Set problem (for $d = |V(H)|$), the situation for edge deletion and edge completion problems is much more complex. This is because the removal/addition of some edge may create new induced copies of $H$ that were originally not present, and hence the obstacles can “propagate” in the graph. In fact, a line of work [4, 5, 9, 12] showed that, unless $\mathrm{NP} \subseteq \mathrm{coNP}/\mathrm{poly}$, polynomial kernels for the $H$-free Edge Deletion (Completion) problems exist only for very simple graphs $H$, for which the class of $H$-free graphs exhibits some structural property. This line culminated in the work of Cai and Cai [4, 5], who attempted to obtain a complete dichotomy. While this goal was not fully achieved and there are some cases missing, the obtained complexity picture explains the general situation very well. For example, Cai and Cai [4, 5] showed that polynomial kernels do not exist (under $\mathrm{NP} \nsubseteq \mathrm{coNP}/\mathrm{poly}$) for the $H$-free Edge Deletion (Completion) problems whenever $H$ is $3$-connected and has at least $2$ non-edges. Nontrivial positive cases include e.g. $H$ being a path on $4$ vertices [9] (that is, Cograph Edge Deletion (Completion)), and $H$ being a $K_4$ minus one edge [5] (that is, Diamond-free Edge Deletion). One of the most prominent open cases left is the kernelization complexity of Claw-Free Edge Deletion [4, 6].

#### Our motivation and results.

The starting point of our work is the realization that the propagational character of $H$-free Edge Deletion (Completion), which is the basic explanation of its apparent kernelization hardness, also makes the greedy approach to approximation incorrect. One cannot greedily remove all the edges of any copy of $H$ in the graph, because removing an edge does not necessarily always help: it may create new copies of $H$ in the instance. Hence, the approximation complexity of $H$-free Edge Deletion (Completion) is actually also highly unclear. On the other hand, the links between approximation and kernelization are well-known in parameterized complexity: it is often the case that a polynomial kernel for a problem can be turned into a $\mathrm{poly}(\mathsf{OPT})$-approximation algorithm (i.e. an algorithm that returns a solution of cost bounded by some polynomial function of the optimum), by just taking greedily the kernel and reverting the reduction rules. While this intuitive link is far from being formal, and actually there are examples of problems behaving differently [8], it is definitely the case that the combinatorial insight given by kernelization algorithms may be very useful in the approximation setting.

Therefore, we propose to study the approximability of $H$-free Edge Deletion (Completion) as well, alongside the best possible running times of fixed-parameter algorithms and the existence of polynomial kernels. This work is the first step in this direction.

We prove that the $H$-free Edge Deletion (Completion) problems are very hard to approximate for a vast majority of graphs $H$, which mirrors the kernelization hardness results of Cai and Cai [4, 5]. The following theorem states our main result formally.

###### Theorem 1.

Let $H$ be a $3$-connected graph with at least two non-edges. Then, unless $\mathrm{P} = \mathrm{NP}$, neither $H$-free Edge Deletion nor $H$-free Edge Completion admits a $\mathrm{poly}(\mathsf{OPT})$-approximation algorithm running in polynomial time. Moreover, unless the Exponential Time Hypothesis fails, neither of these problems admits even a $\mathrm{poly}(\mathsf{OPT})$-approximation algorithm running in time $2^{o(\mathsf{OPT})} \cdot n^{O(1)}$.

Theorem 1 makes two structural assumptions about graph $H$: that it is $3$-connected, and has at least two non-edges. The first one is a crucial technical ingredient in the reductions, because it enables us to argue that for any vertex cut of size $2$, every copy of $H$ in the graph is completely contained on one side of the cut. Relaxing this assumption is a major issue addressed by Cai and Cai [4, 5] in their work. In an attempt to lift this assumption in our setting as well, we try to resolve the case of $H$ being a path or a cycle first; this reflects the development of the story of kernelization hardness for the considered problems [5, 4, 9, 12]. The following theorem summarizes our results in this direction.

###### Theorem 2.

Let $H$ be a cycle on at least $4$ vertices or a path on at least $5$ vertices. Then, unless $\mathrm{P} = \mathrm{NP}$, neither $H$-free Edge Deletion nor $H$-free Edge Completion admits a $\mathrm{poly}(\mathsf{OPT})$-approximation algorithm running in polynomial time. Moreover, unless the Exponential Time Hypothesis fails, neither of these problems admits even a $\mathrm{poly}(\mathsf{OPT})$-approximation algorithm running in time $2^{o(\mathsf{OPT})} \cdot n^{O(1)}$.

Together with some easy cases and known positive results [14], this gives an almost complete dichotomy for paths and cycles. The only missing case is Cograph Edge Deletion (for $H = P_4$), for which we expect a positive answer due to the existence of a polynomial kernel [9]. However, our preliminary attempt at lifting the kernel of Guillemot et al. [9] showed that the approach does not directly work for approximation, and new insight seems to be necessary.

Finally, somewhat surprisingly, we show that the assumption that $H$ has at least two non-edges appears to be important. Suppose $H = K_n \setminus e$ is a complete graph on $n \geq 5$ vertices with one edge removed. While $K_n \setminus e$-free Edge Completion is trivially polynomial-time solvable, due to each obstacle having only one way to be destroyed, the complexity of $K_n \setminus e$-free Edge Deletion turns out to be much more interesting. Namely, we show that it is tightly connected to the complexity of Min Horn Deletion, which apparently is one of the remaining open cases in the classification of the approximation complexity of CSP problems of Khanna et al. [11]. Hence, the following theorem shows that the case of $H$ being a complete graph without an edge may be an interesting outlier in the whole complexity picture.

###### Theorem 3.

For any $n \geq 5$, the $K_n \setminus e$-free Edge Deletion problem is Min Horn Deletion-complete with respect to A-reductions.

The exact meaning of Min Horn Deletion-completeness, A-reductions and other definitions related to the hardness of approximation for CSP problems are explained in Section 4. A direct consequence of Theorem 3 and the work of Khanna et al. [11] is that $K_n \setminus e$-free Edge Deletion does not admit a $2^{O(\log^{1-\epsilon} |E|)}$-approximation algorithm working in polynomial time, for any $\epsilon > 0$. Moreover, Theorem 3 implies that $K_n \setminus e$-free Edge Deletion is poly-APX-hard if and only if each Min Horn Deletion-complete problem is poly-APX-hard, the latter being an intriguing open problem left by Khanna et al. [11] in their study of approximability of CSPs.

While there is no direct connection between the existence of a $\mathrm{poly}(\mathsf{OPT})$-approximation and poly-APX-hardness, we still believe that our reduction corroborates the hardness of resolving the approximation question of $K_n \setminus e$-free Edge Deletion in terms of the optimum value. Intuitively, showing poly-APX-hardness should be easier than refuting a $\mathrm{poly}(\mathsf{OPT})$-approximation. Below we state formally what our reduction actually implies.

###### Corollary 4.

Let $n \geq 5$. Then it is NP-hard to approximate the $K_n \setminus e$-free Edge Deletion problem within factor $2^{O(\log^{1-\epsilon} |E|)}$ for any $\epsilon > 0$, where $|E|$ is the number of edges in a given graph.

###### Corollary 5.

Let . Then the -free Edge Deletion problem admits an -approximation for all , if and only if each Min Horn Deletion-complete problem admits an -approximation for all .

#### Our techniques.

To prove our main result, Theorem 1, we employ the following strategy. We first consider the sandwich problem defined as follows: in Sandwich $H$-Free Edge Deletion we are given a graph $G$ together with a subset $D \subseteq E(G)$ of undeletable edges, and the question is whether there exists a subset $F \subseteq E(G) \setminus D$ of deletable edges for which $G - F$ is $H$-free. Note that the sandwich problem differs from the standard $H$-free Edge Deletion problem in two aspects: first, some edges are forbidden to be deleted, and, second, it is a decision problem about the existence of any solution, that is, we do not impose any constraint on its size. For completion, the sandwich problem is defined similarly: we have unfillable non-edges, i.e., non-edges that are forbidden to be added in the solution.

The crux of the approach is to prove that Sandwich $H$-Free Edge Deletion is actually NP-hard under the given assumptions on $H$. The next step is to reduce from the sandwich problem to the standard optimization variant. This is done by adding gadgets that emulate undeletable edges by introducing a large approximation gap, as follows. For each undeletable edge $e$, attach a large number of copies of $H$ to $e$, so that each copy becomes an induced $H$-subgraph if $e$ gets deleted. Then any solution that deletes the undeletable edge $e$ must have a very large cost, due to all the disjoint copies of $H$ that appear after the removal of $e$. The assumption that $H$ is $3$-connected is very useful for showing that the constructions do not introduce any additional, unwanted copies of $H$ in the graph.

The approach for completion problems is similar. To prove Theorem 2 that concerns paths and cycles, we give problem-specific constructions using the same approach. Some of them are based on previous ETH-hardness proofs for the problems, given by Drange et al. [7].

As far as Theorem 3 is concerned, we employ a similar reduction strategy, but instead of starting from 3SAT, we start from a carefully selected MinOnes($\mathcal{F}$) problem: the problem of optimizing the number of ones in a satisfying assignment to a boolean formula that uses only constraints from some fixed family $\mathcal{F}$. In particular, the constraint family $\mathcal{F}$ needs to be rich enough to be Min Horn Deletion-hard, while at the same time it needs to be restrictive enough so that it can be expressed in the language of $K_n \setminus e$-free Edge Deletion.

Our constructions are inspired by the rich toolbox of hardness proofs for kernelization and fixed-parameter algorithms for edge modification problems [1, 4, 5, 7, 12, 9]. In particular, the idea of considering sandwich problems can be traced back to the work of Cai and Cai [4, 5], who use the term quarantine for the optimization variants of edge modification problems with undeletable edges and non-fillable non-edges. Quarantined problems serve a technical, auxiliary role in the work of Cai and Cai [4, 5]: one first proves hardness of the quarantined problem, and then lifts the quarantine by attaching gadgets, similarly as we do.

However, we would like to point out the new challenges that appear in the approximation setting. Most importantly, the vast majority of previous reductions heavily use budget constraints (i.e. the fact that the solution is stipulated to be of size at most $k$) to argue the correctness; this includes the general results of Cai and Cai [4, 5]. In our setting, we cannot use arguments about the tightness of the budget, because we need to introduce a large approximation gap at the end of the construction. The usage of the sandwich problems without any budget constraints is precisely the way we overcome this difficulty. Thus, most of the old reductions do not work directly in our setting, but of course some technical constructions and ideas can be salvaged.

#### Outline.

In Section 2 we introduce terminology and recall the most important facts from the previous works. Section 3 is devoted to the proof of our main result, Theorem 1. However, as the proof for $H$-free Edge Completion is similar to the proof for $H$-free Edge Deletion, in Section 3 we present only the proof for $H$-free Edge Deletion, while the proof for $H$-free Edge Completion is postponed to Section A.1. In Section 4 we discuss the proof of Theorem 3. Section 5 contains the discussion of Theorem 2, which is largely deferred to the appendix. Concluding remarks and prospects on future work are in Section 6.

## 2 Preliminaries

### 2.1 Basic graph definitions

We use standard graph notation. For a graph $G$, by $V(G)$ and $E(G)$ we denote the set of vertices and edges of $G$, respectively. Throughout the paper we consider simple graphs only, i.e., there are neither self-loops nor parallel edges. We use $K_n$ to denote the complete graph on $n$ vertices. By $P_n$ ($C_n$) we denote the path (cycle) with exactly $n$ vertices. By $\overline{G}$ we denote the complement of $G$, i.e., a graph on the same vertex set, where two distinct vertices are adjacent if and only if they were not adjacent in $G$. We say that a graph $G$ is $H$-free if $G$ does not contain $H$ as an induced subgraph.

We define a graph $H$ to be $3$-vertex-connected if $H$ has at least $4$ vertices, and removing any set of at most two vertices leaves $H$ connected. For brevity, we call such graphs $3$-connected.
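The definition above translates directly into a brute-force check, sketched below. The function names and the edge-list encoding are assumptions made for this illustration; the check simply tries every vertex set of size at most two.

```python
from itertools import combinations


def is_connected(vertices, edges):
    """Depth-first search connectivity test on the subgraph induced by `vertices`."""
    vertices = set(vertices)
    if not vertices:
        return True
    adj = {v: set() for v in vertices}
    for u, v in edges:
        if u in vertices and v in vertices:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x] - seen:
            seen.add(y)
            stack.append(y)
    return seen == vertices


def is_3_connected(vertices, edges):
    """At least 4 vertices, and removing any <= 2 vertices keeps the graph connected."""
    vertices = list(vertices)
    if len(vertices) < 4:
        return False
    return all(
        is_connected(set(vertices) - set(cut), edges)
        for size in range(3)
        for cut in combinations(vertices, size)
    )
```

Under this check, $K_5$ with one edge removed is $3$-connected, while $K_4$ with one edge removed (the diamond) is not, since deleting its two vertices of degree three separates the remaining two.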

### 2.2 Problems and approximation algorithms

In the decision version of the $H$-free Edge Deletion (Completion) problem, for a given graph $G$ and an integer $k$, one is to decide whether it is possible to delete (add) at most $k$ edges from (to) $G$ to make it $H$-free. In particular, we consider the $\overline{P_5}$-Free Deletion (Completion) problem, and call it House-Free Deletion (Completion). However, in the optimization variant of $H$-free Edge Deletion (Completion) the value of $k$ is not given and the goal is to find a minimum size solution. It will be clear from the context whether we refer to a decision or optimization variant.

In the Sandwich $H$-Free Edge Deletion (Completion) problem we are given a graph $G$ together with a subset $D$ of undeletable edges. The question is whether there exists a subset $F \subseteq E(G) \setminus D$ of deletable edges for which $G - F$ is $H$-free. Note that it is a decision problem, where we ask about existence of any solution, i.e., we do not impose any constraint on the solution size.
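To make the deletion variant of the sandwich problem concrete, here is a small exhaustive solver. It is only a sketch for tiny instances: the function names and the encoding (the pattern graph on vertices `0, ..., h_size - 1`, edges as pairs) are assumptions of this example, and the running time is exponential in the number of deletable edges.

```python
from itertools import combinations, permutations


def is_h_free(vertices, edges, h_size, h_edges):
    """Brute-force test that G has no induced copy of H."""
    g = {frozenset(e) for e in edges}
    h = {frozenset(e) for e in h_edges}
    return not any(
        all(
            (frozenset((img[a], img[b])) in g) == (frozenset((a, b)) in h)
            for a, b in combinations(range(h_size), 2)
        )
        for img in permutations(vertices, h_size)
    )


def sandwich_deletion(vertices, edges, undeletable, h_size, h_edges):
    """Return a set of deletable edges whose removal makes G H-free, or None."""
    locked = {frozenset(e) for e in undeletable}
    deletable = [e for e in edges if frozenset(e) not in locked]
    # No budget: any subset of deletable edges is allowed.
    for r in range(len(deletable) + 1):
        for removed in combinations(deletable, r):
            rest = [e for e in edges if e not in removed]
            if is_h_free(vertices, rest, h_size, h_edges):
                return set(removed)
    return None
```

As a toy illustration (with the pattern $P_3$, which is not $3$-connected and serves only to keep the example small): on the path `0-1-2` with edge `(0, 1)` undeletable the solver returns `{(1, 2)}`, and with both edges undeletable it returns `None`.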

Let $f$ be a fixed non-decreasing function on positive integers. An $f(\mathsf{OPT})$-factor approximation algorithm for a minimization problem $X$ is an algorithm that finds a solution of size at most $f(\mathsf{OPT}) \cdot \mathsf{OPT}$, where $\mathsf{OPT}$ is the size of an optimal solution for a given instance of $X$.

### 2.3 Satisfiability and Exponential Time Hypothesis

We employ the standard notation related to satisfiability problems. A 3CNF formula is a conjunction of clauses, where a clause is a disjunction of at most three literals. The 3SAT problem asks, for a given formula $\varphi$, whether there is a satisfying assignment to $\varphi$.

The Exponential Time Hypothesis (ETH), introduced by Impagliazzo, Paturi and Zane [10], is now an established tool used for proving conditional lower bounds in the parameterized complexity area (see [13] for a survey on ETH-based lower bounds).

###### Hypothesis 6 (Exponential Time Hypothesis (ETH) [10]).

There is no $2^{o(n)}$ time algorithm for 3SAT, where $n$ is the number of variables of the input formula.

The main consequence of the Sparsification Lemma of [10] is the following theorem: there is no subexponential algorithm for 3SAT even in terms of the number of clauses of the formula.

###### Theorem 7 ([10]).

Unless ETH fails, there is no $2^{o(n+m)}$ time algorithm for 3SAT, where $n$ and $m$ are the numbers of variables and clauses, respectively.

## 3 Hardness for 3-connected H

In this section we present the proof of Theorem 1 for $H$-free Edge Deletion, while a similar proof for $H$-free Edge Completion is deferred to Section A.1.

### 3.1 Deletion problems

We start with proving hardness of the sandwich problem.

###### Lemma 8.

Let $H$ be a $3$-connected graph with at least $2$ non-edges. There is a polynomial-time reduction, which given an instance of 3SAT with $n$ variables and $m$ clauses, creates an equivalent instance of Sandwich $H$-free Edge Deletion with $O(n+m)$ edges. Consequently, Sandwich $H$-free Edge Deletion is NP-hard for such graphs $H$.

###### Proof.

Let be the given formula in 3CNF, and let and be the sets of variables and clauses of . By standard modifications of the formula, we may assume that each clause contains exactly three literals of pairwise different variables. We construct an instance of Sandwich -free Edge Deletion as follows. The graph is created from three types of gadgets: a clause gadget, a variable gadget, and a connector gadget. They are depicted in Figure 4, where presented edges are deletable, and all others are undeletable.

We first explain constructions of the gadgets, and then discuss connections between them. For each variable , we create a variable gadget , which is the graph with two added edges and in place of any two non-edges of . In the graph , all edges are marked as undeletable except and . Intuitively, deletion of the edge or mimics an assignment of the corresponding literal to true. The variable gadget forbids simultaneous assignments of both literals to true. If we delete both edges and , we get an induced subgraph in which we cannot delete any edge.

Each clause has the corresponding clause gadget , which is a copy of the graph . As is -connected, it has at least edges. We pick arbitrarily three edges of and label them by . We mark all other edges as undeletable. In order to make the clause gadget -free, we have to delete at least one edge from (note that some of the three distinguished edges might potentially share an endpoint). Intuitively, deletion of the edge labeled by corresponds to assigning value true to literal .

The third type of gadgets is the connector gadget. The connector gadget is a copy of the graph , with one added edge in place of any non-edge of . We label this edge as . In , there also exists another edge that does not share any of its endpoints with . To see this, for the sake of contradiction suppose that every edge of is incident to one of the endpoints of . If has at least two vertices other than these endpoints, then the endpoints of form a vertex cut of size separating them, a contradiction with -connectedness of . Otherwise has only one vertex other than the endpoints of , so has at most vertices; again, a contradiction with the -connectedness of , as we assume to have at least non-edges. We select any edge in that does not share endpoints with , and we label it as . Edge is made deletable, and all other edges of are made undeletable. Note that deletion of the edge creates an induced subgraph , and then we have to delete in order to destroy this subgraph.
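The existence argument above is constructive in a trivial way: it suffices to scan the edge list for an edge that avoids both endpoints of the distinguished one. The helper below is a hypothetical illustration; its name and the edge-list encoding are assumptions of this sketch.

```python
def disjoint_edge(edges, fixed_edge):
    """Return some edge sharing no endpoint with `fixed_edge`, or None."""
    blocked = set(fixed_edge)
    for e in edges:
        if not set(e) & blocked:
            return e
    return None
```

On a connector built from $K_5$ with one edge removed, that is, on $K_5$ itself once the non-edge is filled in, such an edge always exists among the three vertices outside the distinguished edge. On a star, every edge meets the centre, so no such edge exists, which matches the role of $3$-connectedness in the argument.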

Knowing the structure of all gadgets, we can proceed with the main construction of our reduction.

Given a formula , for each clause and variable , we create the clause gadget and the variable gadget , respectively. Moreover, for each literal belonging to the clause , we create a chain consisting of copies of the connector gadget, where . This chain is constructed in the following way: the edge of is identified with the edge of , for . We also identify the edge in the subgraph with the edge in the variable gadget of the variable of . Moreover, the edge in the subgraph is identified with the edge from the clause gadget . We use those chains to not allow the copy of to be shared by any two gadgets, and we will prove it in the claim below.

Clearly, the constructed graph has at most edges.

###### Claim 9.

If $(G, D)$ is a YES instance, then $\varphi$ is satisfiable.

###### Proof.

Take any solution to the instance . Note that in each clause gadget we must delete at least one edge. We set the literals corresponding to the deleted edges to true, thus satisfying every clause. We prove now that for each variable we have not set both literals and to true, so that we can find a true/false assignment to the variables that sets the literals accordingly. Deletion of an edge in the clause gadget propagates deletions up to the variable gadget via the chain of connector gadgets. This happens because the deletion of in forces us to delete the in , which is in , so we are forced to delete in , and so on. Following the chain of connector gadgets, it is easy to see that the edge must be deleted in the corresponding variable gadget. As the solution to the instance cannot delete both edges and in any variable gadget at the same time, we obtain that there are no variables with both of its literals set to true.

###### Claim 10.

If $\varphi$ is satisfiable, then $(G, D)$ is a YES instance.

###### Proof.

Consider a true/false assignment that satisfies the formula and delete all edges in all clause gadgets that correspond to literals taking value true. Propagate deletions to all the connector and variable gadgets, as in the proof of Claim 9. It remains to prove that the obtained graph is indeed an -free graph. By counting the number of edges in each gadget, it follows that after the deletions, all gadgets become not isomorphic to : in every variable gadget, we deleted exactly one edge, in every clause gadget, we deleted at least one edge, and in each connector gadget we deleted zero or two edges. So if the obtained graph contains an induced subgraph of , then is distributed across several gadgets. However, this is also not possible for the following reason.

For the sake of contradiction, suppose after the deletions there is an induced copy of the graph . Since is connected and is distributed among more than one gadget, there have to be two different gadgets that share a vertex, for which contains both some vertex , and some vertex . Since is -connected, there are internally vertex-disjoint paths in that lead from to . But every two gadgets share at most two common vertices, so at least one of these paths, say , avoids . Since the path avoids , from the construction of it easily follows that such path contains at least one vertex of some variable gadget and at least one vertex of some clause gadget. However, the distance between and in each connector gadget is at least , so the distance between any variable gadget and any clause gadget is at least . But the path is entirely contained in , thus its length is at most , a contradiction.

Claims 9 and 10 ensure that the output instance is equivalent to the input instance of 3SAT, so we are done. ∎

Now, we show how to reduce Sandwich $H$-free Edge Deletion to the optimization variant of $H$-free Edge Deletion. Note that we only require $H$ to have at least one non-edge; this is because we will reuse this lemma in the next section.

###### Lemma 11.

Let $H$ be a $3$-connected graph with at least one non-edge, and $p$ be a polynomial with $p(\ell) \geq \ell$, for all positive $\ell$. Then there is a polynomial-time reduction which, given an instance $(G, D)$ of Sandwich $H$-free Edge Deletion, creates an instance $G'$ of $H$-free Edge Deletion, such that:

• $k$ is the number of deletable edges in $G$;

• $G'$ has $O(p(k) \cdot |E(G)|)$ edges;

• If $(G, D)$ is a YES instance, then $(G', k)$ is a YES instance;

• If $(G, D)$ is a NO instance, then $(G', p(k))$ is a NO instance.

###### Proof.

We create in the following way. For each undeletable edge , we add copies of the graph , . In each copy, we choose any non-edge and identify the vertex with , and with . The construction is presented in Figure 5.

Note that if we delete the edge in , we also must delete at least one edge in every . Hence, at least edges will be deleted in such a situation. With this observation in mind, we proceed to the proof of the correctness.
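The gluing step described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' exact construction: the number of copies is left as a parameter `copies` (the text ties it to the polynomial from the lemma), the pattern graph is given by its edge list together with one chosen non-edge, and fresh vertices are encoded as tuples so that different copies share only the two endpoints of the undeletable edge.

```python
def attach_copies(edges, undeletable, h_edges, non_edge, copies):
    """Glue `copies` copies of H onto every undeletable edge uv.

    In each copy the endpoints x, y of the chosen non-edge of H are
    identified with u and v; all other vertices of the copy are fresh.
    """
    x, y = non_edge
    assert (x, y) not in h_edges and (y, x) not in h_edges
    result = list(edges)
    for u, v in undeletable:
        for i in range(copies):
            def rename(a, u=u, v=v, i=i):
                if a == x:
                    return u
                if a == y:
                    return v
                return ("copy", u, v, i, a)  # fresh private vertex
            result.extend((rename(a), rename(b)) for a, b in h_edges)
    return result
```

Each attached copy together with the edge `uv` has one more edge than the pattern graph, so it only turns into an induced copy of the pattern once `uv` is deleted. For example, gluing three copies of $K_5$ minus an edge onto a single undeletable edge yields a graph with $1 + 3 \cdot 9 = 28$ edges on $2 + 3 \cdot 3 = 11$ vertices.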

###### Claim 12.

If $(G, D)$ is a YES instance, then $(G', k)$ is a YES instance.

###### Proof.

Let be a subset of deletable edges, such that is -free. Obviously , because there are deletable edges in in total. We will prove that is also -free, which implies that is a YES instance.

Let us assume otherwise, that there is an induced copy of in . Since is -free, we have that has to contain at least one vertex of . Say that contains some vertex of , for some undeletable edge and some index . The edge is undeletable in , so it is not included in . Consequently, the subgraph of induced by contains one more edge than , so it is not isomorphic to . We conclude that must contain some vertex that lies outside of . Since is -connected, there are internally vertex-disjoint paths between and in . However, in , the set is a vertex cut of size that separates and . This is a contradiction, so is indeed -free.

###### Claim 13.

If $(G, D)$ is a NO instance, then $(G', p(k))$ is a NO instance.

###### Proof.

For the sake of contradiction, suppose there is a set of at most edges of , such that is -free. Note that, has to contain at least one undeletable edge , as otherwise would be a solution to . But then has to contain at least more edges inside gadgets , for , which is a contradiction with .

Claims 12 and 13 ensure the correctness of the reduction, and hence we are done. ∎

By composing the reductions of Lemmas 8 and 11, we can deduce the part of Theorem 1 concerning deletion problems. Indeed, suppose $H$-free Edge Deletion admitted a polynomial-time $q(\mathsf{OPT})$-factor approximation algorithm, for some polynomial $q$. Take any instance of 3SAT, and apply first the reduction of Lemma 8, and then the reduction of Lemma 11 for polynomial $p(\ell) = \ell \cdot q(\ell)$. Finally, observe that the application of the hypothetical approximation algorithm for $H$-free Edge Deletion to the resulting instance would resolve whether the optimum value is at most $k$ or larger than $p(k)$, which, by Lemma 11, resolves whether the input instance of 3SAT is satisfiable. The subexponential hardness of approximation under ETH follows from the same reasoning and the observation that the value of $k$ in the output instance is bounded linearly in the size of the input formula.
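The gap argument can be spelled out as a short calculation. It is a sketch under two assumptions that are ours: a $q(\mathsf{OPT})$-factor approximation returns a solution of size $\mathsf{ALG} \leq q(\mathsf{OPT}) \cdot \mathsf{OPT}$ for a non-decreasing polynomial $q \geq 1$, and Lemma 11 is applied with $p(\ell) = \ell \cdot q(\ell)$, where $k$ is the number of deletable edges of the sandwich instance.

```latex
\begin{align*}
\text{formula satisfiable}
  &\;\Longrightarrow\; \mathsf{OPT} \le k
  \;\Longrightarrow\; \mathsf{ALG} \le q(\mathsf{OPT}) \cdot \mathsf{OPT}
     \le q(k) \cdot k = p(k), \\
\text{formula unsatisfiable}
  &\;\Longrightarrow\; \mathsf{OPT} > p(k)
  \;\Longrightarrow\; \mathsf{ALG} \ge \mathsf{OPT} > p(k).
\end{align*}
```

Comparing $\mathsf{ALG}$ with $p(k)$ therefore distinguishes the two cases.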

## 4 Connections with Min Horn Deletion

In this section we prove Theorem 3. First, we need to introduce some definitions and notation regarding Min Horn Deletion hardness and completeness.

Khanna et al. [11] attempted to establish a full classification of approximability of boolean constraint satisfaction problems. In particular, many problems have been classified as APX-complete or poly-APX-complete. Even though some cases remained unresolved, Khanna et al. [11] grouped them into classes, such that all problems from the same class are equivalent (with respect to appropriately defined reductions) to a particular representative problem. One such representative problem is Min Horn Deletion, defined as follows. We are given a boolean formula $\varphi$ in CNF that contains only unary clauses and clauses with three literals, out of which exactly one is negative. The problem asks for minimizing the number of ones in a satisfying assignment for $\varphi$.

We are not going to operate on instances of Min Horn Deletion directly, so the definition above is given only in order to complete the picture for the reader. Instead, we will rely on the approximation hardness results exhibited by Khanna et al. [11], which relate the approximability of various boolean CSPs to Min Horn Deletion. In particular, it is known that Min Horn Deletion does not admit a $2^{O(\log^{1-\epsilon} n)}$-approximation algorithm for any $\epsilon > 0$, unless $\mathrm{P} = \mathrm{NP}$, where $n$ is the number of variables in the instance. On the other hand, it is an open problem whether any Min Horn Deletion-complete problem (under A-reductions, defined below) is actually poly-APX-complete.

###### Definition 14 (A-reducibility, Definition 2.6 of [11]).

A combinatorial optimization problem is said to be an NPO problem if instances and solutions can be recognized in polynomial time, solutions are polynomially-bounded in the input size, and the objective function can be computed in polynomial time from an instance and a solution.

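An NPO problem $P$ is said to be A-reducible to an NPO problem $Q$, denoted $P \leq_A Q$, if there are two polynomial-time computable functions $F$ and $G$ and a constant $\alpha$, such that:

1. For any instance $\mathcal{I}$ of $P$, $F(\mathcal{I})$ is an instance of $Q$.

2. For any instance $\mathcal{I}$ of $P$ and any feasible solution $\mathcal{S}'$ for $F(\mathcal{I})$, $G(\mathcal{I}, \mathcal{S}')$ is a feasible solution for $\mathcal{I}$.

3. For any instance $\mathcal{I}$ of $P$ and any $r \geq 1$, if $\mathcal{S}'$ is an $r$-approximate solution for $F(\mathcal{I})$, then $G(\mathcal{I}, \mathcal{S}')$ is an $(\alpha r)$-approximate solution for $\mathcal{I}$.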

Intuitively, A-reductions preserve approximability of problems up to a constant factor (or higher). As a source of Min Horn Deletion-hardness we will use the MinOnes($\mathcal{F}$) problem, defined below, for a particular choice of the family of constraints $\mathcal{F}$.

In the MinOnes() problem, we are given a ground set of boolean variables together with a set of boolean constraints. Each constraint is taken from a specified family , and is applied to some tuple of variables from . The goal of the problem is to find an assignment satisfying all the constraints, while minimizing the number of variables set to one. Note that the family is considered a part of the problem definition, not part of the input. In order to use known results for the MinOnes() problem we need to define some properties of boolean constraints.

• A boolean constraint is called weakly positive if it can be expressed using a CNF formula that has at most one negated variable in each clause.

• A boolean constraint is 0-valid if the all-zeroes assignment satisfies it.

• A boolean constraint is IHS-$B+$ if it can be expressed using a CNF formula in which the clauses are all of one of the following types: $x_1 \vee \ldots \vee x_k$ for some positive integer $k \leq B$, or $\neg x_1 \vee x_2$, or $\neg x_1$. IHS-$B-$ constraints are defined analogously, with every literal being replaced by its complement.
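To make the MinOnes objective and the all-zeroes validity check concrete, the following is a minimal brute-force sketch. The function names `min_ones` and `is_zero_valid`, as well as the two toy constraints `implies` and `at_least_one`, are hypothetical illustrations and are not the constraint family used later in the paper; the exhaustive search is exponential and is meant only for tiny instances.

```python
from itertools import product


def min_ones(num_vars, constraints):
    """Exhaustively search for a satisfying assignment with the fewest ones.

    `constraints` is a list of (predicate, variable-index tuple) pairs.
    Returns None if no assignment satisfies all constraints.
    """
    best = None
    for assignment in product((0, 1), repeat=num_vars):
        ok = all(pred(*(assignment[i] for i in idx)) for pred, idx in constraints)
        if ok and (best is None or sum(assignment) < sum(best)):
            best = assignment
    return best


def is_zero_valid(pred, arity):
    """A constraint is satisfied by the all-zeroes assignment iff this is True."""
    return bool(pred(*([0] * arity)))


# Hypothetical toy constraints, used only for illustration.
def implies(x, y):
    return (not x) or y


def at_least_one(x, y):
    return x or y


# Variables 0, 1, 2: at least one of x0, x1 is set, and each of them forces x2.
toy_instance = [(at_least_one, (0, 1)), (implies, (0, 2)), (implies, (1, 2))]
solution = min_ones(3, toy_instance)
```

In this toy instance every satisfying assignment sets at least two variables to one, because setting either of the first two variables forces the third.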

The definition can be naturally extended to families of constraints, e.g., a family of constraints is weakly positive if all its constraints are weakly positive. We say that a family of constraints is IHS-$B$ if it is either IHS-$B+$ or IHS-$B-$ (or both). The following result was proved by Khanna et al. [11].

###### Theorem 15 (Lemmas 8.7 and 8.14 from [11]).

If a family of constraints is weakly positive, but it is neither 0-valid nor IHS-$B$ for any constant $B$, then the problem MinOnes() is Min Horn Deletion-complete under A-reductions; that is, there is an A-reduction from Min Horn Deletion to MinOnes() and an A-reduction from MinOnes() to Min Horn Deletion. Consequently, it is NP-hard to approximate MinOnes() within factor for any , where is the number of variables in the given instance.

Our strategy for the proof of Theorem 3 is as follows. In Section 4.1 we show a reduction from MinOnes() to a properly defined quarantined version of -free Edge Deletion. Next, in Section 4.2 we show a reduction which removes the quarantine. Finally, in Section 4.3 we conclude the proof of Theorem 3 and show the completeness with respect to A-reductions.

Note that having Theorem 3, we can immediately infer Corollaries 4 and 5 using Theorem 15 and the definition of an A-reduction.

### 4.1 From MinOnes(F) to Quarantined H-free Edge Deletion

In the Quarantined -free Edge Deletion problem we are given a graph , some edges of which are marked as undeletable. Quarantined -free Edge Deletion is an optimization problem, where the goal is to obtain an -free graph by removing the minimum number of deletable edges.

Next, we define the family of constraints that will be used in the MinOnes() problem.

###### Definition 16.

We define the following constraints:

• a constraint , which is equal to zero if and only if exactly one of the variables is set to ;

• a constraint .

The family of constraints is defined as .

A direct check, presented below, verifies that has the properties needed to claim, using Theorem 15, that MinOnes() is Min Horn Deletion-hard.

###### Lemma 17.

The family of constraints is weakly positive, and at the same time it is neither 0-valid, nor IHS-$B$ for any $B$.

###### Proof.

Note that is weakly positive since . Constraint is clearly weakly positive by definition. As is not 0-valid, we have that is not 0-valid either.

We now prove that is not IHS-$B$ for any $B$. First, observe that any CNF formula expressing cannot contain a clause with only positive literals, as such a clause would not be satisfied by the assignment , which in turn satisfies . Similarly, no clause can have only negative literals. Due to the definition of IHS-$B$, the only remaining case is a 2-clause with one positive and one negative literal. Without loss of generality, consider a clause . Observe that it is not satisfied by the assignment , which however satisfies . Therefore , and consequently , is not IHS-$B$ for any $B$. ∎
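The case analysis in this proof can be replayed mechanically. The sketch below assumes the following reading of Definition 16, reconstructed from context rather than quoted: a ternary constraint `f` that evaluates to zero exactly when one of its three arguments is set to 1, and a unary constraint `t(x) = x`. The three-clause CNF in `weakly_positive_cnf` is likewise an assumption about the omitted formula, and all function names are hypothetical. The script checks that the CNF agrees with `f`, and that no clause of the shapes considered in the proof (all-positive, all-negative, or a 2-clause with one literal of each sign) is implied by `f`.

```python
from itertools import combinations, permutations, product


def f(x, y, z):
    # Assumed reading of Definition 16: f is 0 iff exactly one argument is 1.
    return int(x + y + z != 1)


def t(x):
    # Assumed unary constraint forcing its variable to 1.
    return int(x)


def weakly_positive_cnf(x, y, z):
    # Candidate CNF for f with exactly one negated variable per clause.
    return int(((not x) or y or z) and (x or (not y) or z) and (x or y or (not z)))


def clause_holds(clause, assignment):
    # A clause is a tuple of (variable index, sign); sign True means positive literal.
    return any(assignment[i] == (1 if sign else 0) for i, sign in clause)


def implied_by_f(clause):
    # A clause can occur in a CNF expressing f only if every model of f satisfies it.
    return all(clause_holds(clause, a) for a in product((0, 1), repeat=3) if f(*a))


def ihs_clauses():
    # Clause shapes treated in the proof, over three variables.
    for k in (1, 2, 3):
        for idx in combinations(range(3), k):
            yield tuple((i, True) for i in idx)   # only positive literals
            yield tuple((i, False) for i in idx)  # only negative literals
    for i, j in permutations(range(3), 2):
        yield ((i, False), (j, True))             # one negative, one positive


cnf_matches = all(f(*a) == weakly_positive_cnf(*a) for a in product((0, 1), repeat=3))
some_ihs_clause_implied = any(implied_by_f(c) for c in ihs_clauses())
```

Under these assumptions `cnf_matches` holds and `some_ihs_clause_implied` does not, which mirrors the two halves of the lemma; the all-zeroes assignment satisfies `f` but not `t`, matching the remark about validity.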

Consequently, Theorem 15 and Lemma 17 together imply that MinOnes() is Min Horn Deletion-hard under A-reductions. We now give our main reduction, from MinOnes() to Quarantined -free Edge Deletion.

###### Lemma 18.

Let . There is a polynomial-time computable transformation which, given an instance of the MinOnes() problem, outputs an instance of the Quarantined -free Edge Deletion problem, such that:

• if admits a satisfying assignment with ones, then there is a solution of cost for the instance ,

• if admits a solution of cost , then there is a satisfying assignment with ones for the instance ,

where and is the number of variables in .

###### Proof.

First, we show how to transform an instance (with a formula ) of MinOnes() into an instance (with a graph ) of Quarantined -free Edge Deletion. Given an instance , for any constraint we create a separate clique , which will be called the constraint clique. We arbitrarily choose three edges in the clique and label them . Mark all edges as undeletable except edges labelled by . Moreover, for each variable we additionally create a clique (called further the variable clique), and mark all edges in the clique as undeletable except two edges, which we label by . The edges are selected arbitrarily, however we require that they do not share common endpoints.

Now we connect the variable cliques with the constraint cliques. For each variable and a constraint of the instance which contains among its arguments, we add three cliques, as shown in Figure 6, such that the following properties are satisfied:

• The first added clique shares with the variable clique of only the edge .

• The second added clique shares one deletable edge with the first clique and a different deletable edge with the third clique. Label both these deletable edges by .

• The third added clique shares with the clique corresponding to the constraint only the edge labelled (in the constraint clique) by .

All the other edges of the introduced cliques, not mentioned above, are marked as undeletable. Note that each of the introduced cliques shares two edges with two different cliques. We may perform this construction so that these two edges never share endpoints (as depicted in Figure 6), and hence we will assume this property.

Denote by the number of occurrences of the variable in all -type constraints. Note that, by removing superfluous copies of the same constraint, we can assume that all -type constraints are pairwise different, so in particular there are at most of them. Also, for any variable we have .

Next, for each variable we add or cliques that share the deletable edge from the variable clique of , and are otherwise disjoint. Moreover, in each such clique we make one more edge deletable; we label it by . We add cliques if the formula does contain the clause , and cliques otherwise.

Finally, if there is a clause in the instance , then we delete the edge labelled by in the corresponding variable clique.

Observe that in the constructed instance of Quarantined -free Edge Deletion, among all the edges labelled by , where is any variable, we have to delete either none, or all of them. This is because the deletion of any of them forces the deletion of all the others due to the appearance of induced copies of in the graph. Moreover, if the edge is not present due to the existence of constraint in , then all of them have to be deleted.

###### Claim 19.

If there is a satisfying assignment with ones for the instance , then it is possible to delete edges in in order to make it a -free graph.

###### Proof.

It is enough to delete all edges labelled by for all variables that are set to in the satisfying assignment; the number of such edges is exactly . Let us verify this. Suppose the obtained graph is not -free. Let be an induced subgraph isomorphic to . Note that for the graph is -connected. Moreover, even after deletion of two arbitrary vertices in , there are no two vertices at distance larger than two. Consequently, a direct check shows that the assumed subgraph must be entirely contained in one of the cliques corresponding to a constraint or to a variable, or in one of the cliques connecting a variable clique with a constraint clique. Obviously, cannot be contained in a variable clique or a connection clique, as in such cliques either all edges are present, or two edges are missing. This means that must be contained in a constraint clique, so exactly one of the edges of this constraint clique is deleted. However, this is equivalent to the corresponding constraint not being satisfied under the considered assignment; this is a contradiction.

###### Claim 20.

If admits a solution of cost , then there is a satisfying assignment for the instance  with ones.

###### Proof.

Take any solution for the output instance . As mentioned earlier, in any solution for , for any variable either all edges labelled by are deleted or none of them is deleted. The number of such edges for one variable is equal to . We set a variable to if and only if the corresponding edges are deleted in the considered solution for . All clauses of the form will be satisfied, since in the construction of we delete if the clause is present in . All -type constraints will be satisfied as well, as otherwise in the clique corresponding to an unsatisfied constraint only one edge would be deleted and, hence, the graph would not be -free.

The correctness of the transformation follows from Claims 19 and 20; hence the proof of Lemma 18 is complete. ∎

### 4.2 Lifting the quarantine

In the following lemma we show how to reduce an instance of the quarantined problem to its regular version, using the same approach as in the proof of Lemma 11.

###### Lemma 21.

Let . There is a polynomial-time reduction which, given an instance of Quarantined -free Edge Deletion with edges, outputs an instance of -free Edge Deletion such that:

• has vertices and edges.

• If there is a solution of size for the instance , then there is a solution of size for the instance .

• If there is a solution of size for the instance , then there is a solution of size for the instance .

###### Proof.

We apply the reduction described in the proof of Lemma 11 for and . Now we verify that has the claimed properties. The bound on the size of follows directly from the size bound given by Lemma 11.

Suppose first that has some solution of size . In the proof of Lemma 11 we argued that the same solution also works for the instance (see the proof of Claim 12). Hence, also has a solution of size .

Suppose now that has a solution of some size . In the proof of Claim 13 we argued that does not delete any of the undeletable edges of , because this would require deleting at least more edges in the attached gadgets. Hence, is a set of size at most , whose deletion turns into an -free graph, due to being an induced subgraph of . Hence, has some solution of size at most . ∎

The composition of the reductions of Lemmas 18 and 21 gives an A-reduction (for ) from the Min Horn Deletion-hard problem MinOnes(), yielding the hardness part of Theorem 3. Indeed, given an instance of MinOnes() we can transform it into an instance of Quarantined -free Edge Deletion using Lemma 18, which in turn we can further transform into an instance of -free Edge Deletion using Lemma 21. Given any feasible solution for we check whether . If this is the case, we translate the solution back into a solution for (using Lemma 21) and then into a solution for the initial instance (using Lemma 18). On the other hand, if , then for the initial instance we may just take the trivial solution, that is, the assignment setting all the variables to one.

### 4.3 Completeness

To finish the proof of Theorem 3 it remains to show a reduction in the other direction: from -free Edge Deletion to Min Horn Deletion. We achieve this goal by presenting an A-reduction from the -free Edge Deletion problem to another variant of MinOnes(), which is Min Horn Deletion-complete.

###### Definition 22.

Let , and let . We define family of constraints as follows:

• if and only if exactly one of the variables takes value 1;

• if and only if all the variables take value .

The proof of the following lemma is a technical check that is essentially the same as the proof of Lemma 17. Hence, we leave it to the reader.

###### Lemma 23.

For each , the set of constraints is weakly positive, and at the same time it is neither 0-valid, nor IHS-$B$ for any $B$.

Therefore, by Theorem 15 we know that MinOnes() is Min Horn Deletion-complete and it suffices to present an A-reduction from -free Edge Deletion to MinOnes().

###### Lemma 24.

There is a polynomial-time algorithm, which given an instance of -free Edge Deletion produces an instance of MinOnes(), such that it is possible to remove exactly edges in to make it -free if and only if one can find a satisfying assignment for that sets exactly variables to .

###### Proof.

Consider an instance of the -free Edge Deletion problem. We enumerate all the edges in the graph as , and to each edge we assign a fresh boolean variable . For any induced subgraph isomorphic to we list all its edges and create a corresponding constraint . For any induced clique containing vertices and edges , we create a constraint . The output instance of MinOnes() is obtained by taking to be the variable set, and putting all the constraints constructed above.

Note that if we delete some edges in the graph , then an induced copy of the graph can be obtained only on vertices that originally were inducing or . The constraints in the constructed instance guarantee that in each induced subgraph at least one edge from the subgraph must be deleted, and in each induced subgraph either at least two edges should be deleted, or none of the edges should be deleted. So, for any , the graph , where , is -free if and only if the assignment iff satisfies . This equivalence of solution sets immediately proves the lemma. ∎
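The construction in this proof can be sketched in a few lines. The code below assumes, for illustration only, that the forbidden graph is the complete graph on `ell` vertices with one edge removed (for `ell = 4`, the diamond), and it encodes the two constraint types of Definition 22 under the hypothetical names `not_all_zero` (at least one edge of an induced copy of the forbidden graph is deleted) and `not_exactly_one` (an induced clique never loses exactly one edge). The enumeration of vertex subsets is naive and intended for small graphs.

```python
from itertools import combinations, product


def build_min_ones_instance(n, edges, ell=4):
    """Map a graph on vertices 0..n-1 to (edge list, constraint list)."""
    edge_list = sorted(tuple(sorted(e)) for e in edges)
    index = {e: i for i, e in enumerate(edge_list)}
    full = ell * (ell - 1) // 2
    constraints = []
    for verts in combinations(range(n), ell):
        present = [index[p] for p in combinations(verts, 2) if p in index]
        if len(present) == full - 1:      # induced clique minus one edge
            constraints.append(("not_all_zero", present))
        elif len(present) == full:        # induced clique
            constraints.append(("not_exactly_one", present))
    return edge_list, constraints


def satisfies(constraints, assignment):
    """assignment[i] == 1 means that edge number i is deleted."""
    for kind, idx in constraints:
        ones = sum(assignment[i] for i in idx)
        if kind == "not_all_zero" and ones == 0:
            return False
        if kind == "not_exactly_one" and ones == 1:
            return False
    return True


def is_h_free(n, edges, ell=4):
    """True iff no ell vertices induce a clique with exactly one edge missing."""
    edge_set = {tuple(sorted(e)) for e in edges}
    full = ell * (ell - 1) // 2
    return all(
        sum(p in edge_set for p in combinations(verts, 2)) != full - 1
        for verts in combinations(range(n), ell)
    )


def check_equivalence(n, edges, ell=4):
    """Exhaustively compare deletion sets with satisfying assignments."""
    edge_list, constraints = build_min_ones_instance(n, edges, ell)
    for bits in product((0, 1), repeat=len(edge_list)):
        kept = [e for e, b in zip(edge_list, bits) if not b]
        if is_h_free(n, kept, ell) != satisfies(constraints, bits):
            return False
    return True
```

Calling `check_equivalence` on a small graph compares every deletion set with the corresponding assignment, which is the equivalence of solution sets used at the end of the proof.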

As discussed earlier, Lemma 24 gives an A-reduction from -free Edge Deletion to MinOnes(), which is Min Horn Deletion-complete, thereby proving that -free Edge Deletion is A-reducible to Min Horn Deletion. This concludes the proof of Theorem 3.

## 5 Specific constructions for short paths and cycles

In this section we extend the general results yielded by Theorems 11 and 42 in the direction of obtaining a full picture of the approximation complexity for being a path or a cycle. It can be easily seen that the complements of and for satisfy the preconditions of Theorems 11 and 42. Hence, by complementation, we have already established hardness of approximation for these cases. We are left with considering the edge modification problems for and for . Therefore, to complete the proof of Theorem 2, it remains to prove the following.

###### Lemma 25.

Let be equal to , , or . Then neither -free Edge Deletion nor -free Edge Completion admits a -factor approximation algorithm working in polynomial time, unless . Moreover, unless ETH fails, there is even no -factor approximation algorithm working in time , for any of these problems.

Before we proceed to the proof of the missing cases (Lemma 25), let us check that we indeed obtain a full classification for cycles, and an almost full classification for paths, as promised in the introduction. The problem -Free Edge Deletion, also known as Triangle-Free Edge Deletion, admits a trivial greedy -approximation algorithm, whereas -Free Edge Deletion, also known as Cluster Edge Deletion, admits a constant-factor approximation algorithm given by Natanzon [14]. The problem -Free Edge Completion makes no sense, and -Free Edge Completion is polynomial-time solvable because there is only one way to destroy every obstacle. The only missing case is -Free Edge Deletion, which is equivalent to -Free Edge Completion by complementation.
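The greedy algorithm for Triangle-Free Edge Deletion mentioned above can be sketched as follows. This is a minimal illustration under the assumption that the intended algorithm is the standard one: repeatedly find a triangle and delete all three of its edges. The triangles found this way are pairwise edge-disjoint and any solution must delete at least one edge from each of them, so the number of deleted edges is at most three times the optimum. The function names are hypothetical.

```python
from itertools import combinations


def has_triangle(n, edges):
    """True iff the graph on vertices 0..n-1 contains a triangle."""
    edge_set = {frozenset(e) for e in edges}
    return any(
        all(frozenset(p) in edge_set for p in combinations(t, 2))
        for t in combinations(range(n), 3)
    )


def greedy_triangle_free_deletion(n, edges):
    """Delete all three edges of every triangle encountered.

    A single pass over the vertex triples suffices, because deleting
    edges never creates a new triangle.
    Returns (list of deleted edges, set of remaining edges).
    """
    remaining = {frozenset(e) for e in edges}
    deleted = []
    for t in combinations(range(n), 3):
        tri = [frozenset(p) for p in combinations(t, 2)]
        if all(e in remaining for e in tri):
            remaining.difference_update(tri)
            deleted.extend(tri)
    return deleted, remaining
```

On the complete graph with four vertices this sketch removes the three edges of the first triangle it meets, whereas two deletions already suffice, so the greedy output need not be optimal.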

The rest of this section is devoted to the proof of Lemma 25. For this, we implement the same strategy as in Theorems 11 and 42: we first prove hardness of sandwich problems by giving linear reductions from 3SAT, and then we reduce to the standard optimization variant by introducing the approximation gap. For convenience, instead of working with -Free Edge Deletion and -Free Edge Completion, we respectively consider House-Free Edge Completion and House-Free Edge Deletion, where house is the complement of : a -cycle with a triangle built on one of the edges (see Figure 7). These problems are equivalent to the ones concerning -s by complementation of the instance. Also, observe that -Free Edge Deletion and -Free Edge Completion are equivalent by complementation, and hence we consider only the former.

### 5.1 Sandwich deletion problems

We start with the hardness proof for Sandwich -Free Edge Deletion, which will serve as a template for other reductions. The structural property of the instance, described in the statement, will turn out to be useful in some further arguments.

###### Lemma 26.

There is a polynomial-time reduction which, given an instance of 3SAT with variables and clauses, constructs an equivalent instance of Sandwich -Free Edge Deletion with vertices and edges. Moreover, has the following additional property: every (not necessarily induced) subgraph of contains an undeletable edge. Consequently, Sandwich -Free Edge Deletion is -hard, even on such instances.

###### Proof.

Let be the given formula in 3CNF, and let and be the sets of variables and clauses of . By standard modifications of the formula, we may assume that each clause contains exactly three literals of pairwise different variables.

We introduce gadgets for variables, for clauses, and for connections between variable and clause gadgets. They are depicted in Figure 11, where thick edges are undeletable and dashed edges are deletable. The variable gadget , depicted on the first panel, has four named vertices , , and , which will be used to connect the copies of this gadget to the rest of the construction. The properties of the variable gadget are described in the following claim. Its proof follows by a direct check, and hence is omitted.

###### Claim 27.

There are exactly two solutions to the Sandwich -Free Edge Deletion instance . One of them, denoted , contains and does not contain , and the second, denoted , contains and does not contain .

Next, we describe the clause gadget , depicted on the second panel of Figure 11. It consists of a clique on vertices , where the cycle has deletable edges, and all the other edges are undeletable. Again, the properties of the clause gadget are described in the following claim, whose proof is omitted due to being straightforward.

###### Claim 28.

In the Sandwich -Free Edge Deletion instance there is no solution that simultaneously contains all three edges , and . However, for each , there is a solution that does not contain , but contains both the other edges from this triple.

For every variable we create a copy of the variable gadget . The copies of vertices , , and in are respectively renamed to , , and . For every clause we create a copy of the clause gadget . The copies of vertices in are respectively renamed to .

Finally, we wire the variable gadgets and clause gadgets using connector gadgets, which are just -s (depicted on the third panel of Figure 11). More precisely, whenever appears in the -th literal of clause , we connect with and with using undeletable edges, where if the appearance of in is positive, and if it is negative. Note that the deletable edges and depicted in Figure 11 are always present in the respective variable or clause gadgets.

This concludes the construction; the constructed graph will be denoted by . Obviously has vertices and edges. It is straightforward to see that the asserted structural property of is satisfied: the subgraph spanned by deletable edges consists of disjoint paths and cycles on vertices, hence every subgraph must contain at least one undeletable edge.

We now need to verify that the obtained instance of Sandwich -Deletion has a solution if and only if the input formula is satisfiable. For this, the following claim will be useful; its proof is a straightforward check following from the fact that each vertex of a clause gadget is incident with at most one edge leading to a variable gadget, and hence we omit the proof.

###### Claim 29.

Every (not necessarily induced) in is entirely contained in one variable gadget, in one clause gadget, or forms one connector gadget.

Suppose first that is a variable assignment that satisfies . Construct a subset of deletable edges in as follows:

• For each variable , add to the solution in the variable gadget , given by Claim 27.

• For each clause , arbitrarily choose an index of any of its literals that satisfies it under ; such a literal exists due to being a satisfying assignment. Then add to the solution in the clause gadget , given by Claim 28.

By Claim 29, to verify that is -free, it suffices to show that there is no induced within any variable gadget or within any clause gadget, and that one of the edges in each connector gadget is removed. The first two checks follow immediately from Claims 27 and 28. For the last check, fix some clause and variable appearing in it; we examine the connector gadget between and . Suppose that appears in the -th literal of , and assume w.l.o.g. that this appearance is positive; the second case is symmetric. If , then the edge is deleted in , and hence the in the connector gadget is destroyed. Otherwise , and hence the literal containing cannot satisfy the clause under assignment . From the construction of it follows that the edge is deleted in the gadget , and hence the in the connector gadget is also destroyed.

For the other direction, suppose that there is a subset of deletable edges in such that is -free. By Claim 27, the intersection of with the edge set of each variable gadget must be equal either to solution or to solution . Define assignment as follows: if this intersection is , and if it is . In particular, edge belongs to if and only if , and the symmetric claim holds also for . We verify that is a satisfying assignment for . Take any clause , and for the sake of contradiction suppose it is not satisfied under . By the construction of , this means that in all three connector gadgets connecting with variable gadgets of variables appearing in , the deletable edges from the variable gadgets are not included in . Since each connector gadget induces a with only two edges deletable, it follows that all three edges , , and have to be included in . However, Claim 28 asserts that there is no solution within the clause gadget that simultaneously contains all these three edges. This is a contradiction, and hence we conclude that assignment satisfies formula . ∎

We now move to the proof for Sandwich -Free Edge Deletion, which is a minor modification of the construction for Sandwich -Free Edge Deletion. For this reason, we only sketch how the construction needs to be modified, and argue that the correctness proof follows the same steps.

###### Lemma 30.

There is a polynomial-time reduction which, given an instance of 3SAT with variables and clauses, constructs an equivalent instance of Sandwich -Free Edge Deletion with edges. Consequently, Sandwich -Free Edge Deletion is -hard.

###### Proof.

We perform essentially the same construction as in the proof of Lemma 26, but we replace the variable, clause and connector gadgets with -specific constructions depicted in Figure 15.

The variable gadget is depicted on the first panel of Figure 15. As before, it has four named vertices: , , , and . Again, a direct check, whose proof is omitted, yields the following.

###### Claim 31.

There are exactly two solutions to the Sandwich -Free Edge Deletion instance