# Min CSP on Four Elements: Moving Beyond Submodularity

Peter Jonsson Department of Computer and Information Science,
Partially supported by the Swedish Research Council (VR) under grant 621-2009-4431.
Fredrik Kuivinen    Johan Thapper École polytechnique, Laboratoire d’informatique (LIX),
91128 Palaiseau Cedex, France
Supported by the LIX-Qualcomm Postdoctoral Fellowship.
###### Abstract

We report new results on the complexity of the valued constraint satisfaction problem (VCSP). Under the unique games conjecture, the approximability of finite-valued VCSP is fairly well-understood. However, there is yet no characterisation of VCSPs that can be solved exactly in polynomial time. This is unsatisfactory, since such results are interesting from a combinatorial optimisation perspective; there are deep connections with, for instance, submodular and bisubmodular minimisation. We consider the Min and Max CSP problems (i.e. where the cost functions only attain values in $\{0,1\}$) over four-element domains and identify all tractable fragments. Similar classifications were previously known for two- and three-element domains. In the process, we introduce a new class of tractable VCSPs based on a generalisation of submodularity. We also extend and modify a graph-based technique by Kolmogorov and Živný (originally introduced by Takhanov) for efficiently obtaining hardness results in our setting. This allows us to prove the result without relying on computer-assisted case analyses (which otherwise are fairly common when studying the complexity and approximability of VCSPs). The hardness results are further simplified by the introduction of powerful reduction techniques.

Keywords: constraint satisfaction problems, combinatorial optimisation, computational complexity, submodularity

## 1 Introduction

This paper concerns the computational complexity of an optimisation problem with strong connections to the constraint satisfaction problem (CSP). An instance of the constraint satisfaction problem consists of a finite set of variables, a set of values (the domain), and a finite set of constraints. The goal is to determine whether there is an assignment of values to the variables such that all the constraints are satisfied. CSPs provide a general framework for modelling a variety of combinatorial decision problems [6, 8].

Various optimisation variations of the constraint satisfaction framework have been proposed and many of them can be seen as special cases of the valued constraint satisfaction problem (VCSP), introduced by Schiex et al. [20]. This is an optimisation problem which is general enough to express such diverse problems as Max CSP, where the goal is to maximise the number of satisfied constraints, and the minimum cost homomorphism problem (Min HOM), where all constraints must be satisfied, but each variable-value tuple in the assignment is given an independent cost. To accomplish this, instances of the VCSP assign costs (possibly infinite) to individual tuples of the constraints. It is then convenient to replace relations by cost functions, i.e. functions from tuples of the domain to some set of costs. This set of costs can be relatively general, but much is captured by using $\mathbb{Q}_{\geq 0} \cup \{\infty\}$, where $\mathbb{Q}_{\geq 0}$ denotes the set of non-negative rational numbers. We arrive at the following formal definition.

###### Definition 1

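Let $D$ be a finite domain, and let $\Gamma$ be a set of functions $f_i : D^{k_i} \to \mathbb{Q}_{\geq 0} \cup \{\infty\}$. By VCSP($\Gamma$) we denote the following minimisation problem:

Instance:

A set of variables $V$, and a sum $\sum_{i=1}^{m} w_i f_i(\mathbf{x}_i)$, where $w_i \in \mathbb{Q}_{\geq 0}$, $f_i \in \Gamma$, and $\mathbf{x}_i$ is a list of $k_i$ variables from $V$.

Solution:

A function $\sigma : V \to D$.

Measure:

$\sum_{i=1}^{m} w_i f_i(\sigma(\mathbf{x}_i))$, where $\sigma(\mathbf{x}_i)$ is the list of elements from $D$ obtained by applying $\sigma$ component-wise to $\mathbf{x}_i$.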

The set $\Gamma$ is often referred to as the constraint language. We will use $\Gamma$ as our parameter throughout the paper. For instance, when we say that a class $X$ of VCSPs is polynomial-time solvable, then we mean that VCSP($\Gamma$) is polynomial-time solvable for every $\Gamma \in X$. Finite-valued functions, i.e. functions with a range in $\mathbb{Q}_{\geq 0}$, are sometimes called soft constraints. A prominent example is given by functions with a range in $\{0,1\}$; they can be used to express instances of the well-known Min CSP and Max CSP problems (which, for instance, include Max $k$-Cut, Max $k$-Sat, and Nearest Codeword as subproblems). On the other side we have crisp constraints which represent the standard type of CSP constraints. These can be expressed by cost functions taking values in $\{0,\infty\}$.
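To make Definition 1 concrete, the following is a minimal Python sketch that evaluates the measure of an assignment and finds an optimum by exhaustive search. The representation of an instance as a list of `(weight, function, scope)` triples and the names `vcsp_measure`, `vcsp_brute_force` and `eq_cost` are assumptions made for this illustration; the search is exponential in the number of variables and is not one of the algorithms discussed in the paper.

```python
from itertools import product

def vcsp_measure(terms, sigma):
    """Measure of assignment sigma: sum of w * f(sigma applied to the scope)."""
    return sum(w * f(*(sigma[v] for v in scope)) for w, f, scope in terms)

def vcsp_brute_force(variables, domain, terms):
    """Return (optimal measure, one optimal assignment) by exhaustive search."""
    best_value, best_sigma = None, None
    for values in product(domain, repeat=len(variables)):
        sigma = dict(zip(variables, values))
        value = vcsp_measure(terms, sigma)
        if best_value is None or value < best_value:
            best_value, best_sigma = value, sigma
    return best_value, best_sigma

# Hypothetical {0,1}-valued cost function: cost 1 when both arguments agree.
def eq_cost(a, b):
    return 1 if a == b else 0

# Illustrative instance: one eq_cost term per edge of a triangle, domain {0, 1}.
variables = ["x", "y", "z"]
terms = [(1, eq_cost, ("x", "y")),
         (1, eq_cost, ("y", "z")),
         (1, eq_cost, ("x", "z"))]
opt, sigma = vcsp_brute_force(variables, [0, 1], terms)
```

On a two-element domain some edge of the triangle must receive equal values at both ends, so the optimum of this toy instance is 1.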

A systematic study of the computational complexity of the VCSP was initiated by Cohen et al. [4]; for instance, they prove a complexity dichotomy for VCSP over two-element domains. This was the starting point for an intensive research effort leading to a large number of complexity results for VCSP: examples include complete classifications of conservative constraint languages (i.e. languages containing all unary cost functions) [7, 14, 13], $\{0,1\}$ languages on three elements [11], languages containing a single $\{0,1\}$ cost function [12], and arbitrary languages with $\{0,1\}$ cost functions [22]. We note that some of these results have been proved by computer-assisted search, something that drastically reduces both the readability of the proofs and the insight gained from them. We also note that there is no generally accepted conjecture stating which VCSPs are polynomial-time solvable.

The picture is clearer when considering the approximability of finite-valued VCSP. Raghavendra [19] has presented algorithms for approximating any finite-valued VCSP. These algorithms achieve an optimal approximation ratio for the constraint languages that cannot be solved to optimality in polynomial time, given that the unique games conjecture (UGC) is true. For the constraint languages that can be solved to optimality, one gets a PTAS from these algorithms. However, no characterisation of the set of constraint languages that can be solved to optimality follows from Raghavendra’s result. Thus, Raghavendra’s result does not imply the complexity results discussed above (not even conditionally under the UGC).

The goal of this paper is to prove a dichotomy result for VCSP with $\{0,1\}$ cost functions over four-element domains: we show that every such problem is either solvable in polynomial time or NP-hard. Such a dichotomy result is not known for CSPs on four-element domains (and, consequently, not for unrestricted VCSPs on four-element domains). Our result proves that, in contrast to the two-element, three-element, and conservative case, submodularity is not the only source of tractability. In order to outline the proof, let $\Gamma$ denote a constraint language with $\{0,1\}$ cost functions over a four-element domain $D$. We will need two tractability results in our classification. The first one is well-known: if every function in $\Gamma$ is submodular on a chain (i.e. a total ordering of $D$), then VCSP($\Gamma$) is solvable in polynomial time. The second result is new and can be found in Section 3: we introduce 1-defect chain multimorphisms and prove that if $\Gamma$ has such a multimorphism, then VCSP($\Gamma$) is tractable. A multimorphism is, loosely speaking, a pair of functions such that $\Gamma$ satisfies certain invariance properties under them. The algorithm we present is based on a combination of submodular and bisubmodular minimisation [9, 17, 21].

The hardness part of the proof consists of four parts (Sections 4 to 7). We begin by introducing some tools in Sections 4 and 5. Section 4 concerns the problem of adding (crisp) constant unary relations to $\Gamma$ without changing the computational complexity of the resulting problem. The main tool for doing this is the concept of indicator problems introduced by Jeavons et al. [10] (see also Cohen et al. [3]). Section 5 introduces a graph construction for studying $\Gamma$. In principle, this graph provides information about the complexity of VCSP($\Gamma$) based on the two-element sublanguages of $\Gamma$. Similar graphs have been used repeatedly in the study of VCSP, cf. [1, 14, 22]. Equipped with these tools, we determine the complexity of VCSP($\Gamma$) over a four-element domain in Section 6. The graph introduced in Section 5 allows us to prove that, when $\Gamma$ is a core (cf. Section 4), VCSP($\Gamma$) is polynomial-time solvable if and only if $\Gamma$ is submodular on a chain or has a 1-defect chain multimorphism (Theorem 6.1). Some proofs of intermediate results are deferred to Section 7.

## 2 Preliminaries

Throughout this paper, we will assume that $\Gamma$ is a finite set of $\{0,1\}$-valued functions. By Min CSP($\Gamma$) we denote the problem VCSP($\Gamma$). It turns out to be convenient to introduce a generalisation of this problem in which we allow additional constraints on the solutions. From a VCSP perspective, this means that we allow crisp as well as $\{0,1\}$-valued cost functions. To make the distinction clear, and since we will not be using any mixed cost functions, we represent the crisp constraints with relations instead of $\{0,\infty\}$-valued cost functions.

###### Definition 2

Let $\Gamma$ be a set of $\{0,1\}$-valued functions on a domain $D$, and let $\Delta$ be a set of finitary relations on $D$. By Min CSP($\Gamma, \Delta$) we denote the following minimisation problem:

Instance:

A Min CSP($\Gamma$)-instance $I$, and a finite set of constraint applications $\{(\mathbf{y}_j; R_j)\}$, where $R_j \in \Delta$ and $\mathbf{y}_j$ is a matching list of variables from $V$.

Solution:

A solution $\sigma$ to $I$ such that $\sigma(\mathbf{y}_j) \in R_j$ for all $j$.

Measure:

The measure of $\sigma$ as a solution to $I$.

We will generally omit the parenthesis surrounding singletons in unary relations, as in the following definition: let be the set of constant unary relations over .

### 2.1 Expressive power and weighted pp-definitions

It is often possible to enrich a set of functions without changing the computational complexity of Min CSP. In this paper, we will make use of two distinct, but related, notions aimed at this purpose.

###### Definition 3

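Let $I$ be an instance of Min CSP($\Gamma, \Delta$), and let $\mathbf{x} = (x_1, \dots, x_s)$ be a sequence of distinct variables from $V$. Let

$$\pi_{\mathbf{x}}\,\mathrm{Optsol}(I) = \{\, (\sigma(x_1), \dots, \sigma(x_s)) \mid \sigma \text{ is an optimal solution to } I \,\},$$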

i.e. the projection of the set of optimal solutions onto . We say that such a relation has a weighted pp-definition in . Let denote the set of relations which have a weighted pp-definition in .
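The projection in Definition 3 can be illustrated on a toy instance. The sketch below, with the hypothetical names `optimal_projection` and `eq_cost`, enumerates all assignments, keeps the optimal ones, and projects them onto a chosen list of variables. It assumes an instance without crisp constraints and uses integer weights so that measures can be compared exactly.

```python
from itertools import product

def optimal_projection(variables, domain, terms, xs):
    """Project the set of optimal solutions onto the variable list xs."""
    best, relation = None, set()
    for values in product(domain, repeat=len(variables)):
        sigma = dict(zip(variables, values))
        value = sum(w * f(*(sigma[v] for v in scope)) for w, f, scope in terms)
        if best is None or value < best:
            best, relation = value, set()   # strictly better: start over
        if value == best:
            relation.add(tuple(sigma[x] for x in xs))
    return relation

# Hypothetical {0,1}-valued cost function: cost 1 when both arguments agree.
def eq_cost(a, b):
    return 1 if a == b else 0

# A single term eq_cost(x, y) over the domain {0, 1}.
rel = optimal_projection(["x", "y"], [0, 1], [(1, eq_cost, ("x", "y"))], ("x", "y"))
```

Here the optimal solutions are exactly the assignments with different values on `x` and `y`, so the resulting relation is the disequality relation on $\{0,1\}$.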

For an instance of Min CSP, we define to be the optimal value of a solution to , and to be undefined if no solution exists. The following definition is a variation of the concept of the expressive power of a valued constraint language, see for example Cohen et al. [4].

###### Definition 4

Let be an instance of Min CSP, and let be a sequence of distinct variables from . Define the function by letting . We say that is expressible over . Let denote the set of total functions expressible over .

###### Proposition 1

Let and be finite sets. Then, Min CSP is polynomial-time reducible to Min CSP.

###### Proof

The reduction from Min CSP to Min CSP is a special case of Theorem 3.4 in [4]. We allow weights as a part of our instances, but this makes no essential difference.

For the remaining part, we will assume that contains a single relation . The case when , for can be handled by eliminating one relation at a time using the same argument. Let be an instance of Min CSP. For each application , , we create a copy of in which the variables have been replaced by . We now create an instance of Min CSP as follows: let , , and let the set of constraint applications of consist of all applications from apart from those involving the relation , and all applications from , . We will choose large enough, so that if is satisfiable, then in any optimal solution to , the restriction of to the set is forced to be an optimal solution to the instance . It then follows that , so we can recover an optimal solution to from . The value of is chosen as follows: if all solutions to have the same measure, we can let . Otherwise, let be the minimal difference in measure between a sub-optimal solution, and an optimal solution to . Assume that , and let . Note that if is any solution to the instance obtained from by removing all constraint applications, then . We can then let ; the representation size of is linearly bounded in the size of the instance . It is easy to check that if is unsatisfiable, or if , then is unsatisfiable. Otherwise . ∎

### 2.2 Multimorphisms and submodularity

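We now turn our attention to multimorphisms and tractable minimisation problems. Let $D$ be a finite set. Let $f : D^k \to D$ be a function, and let $\mathbf{x}_1, \dots, \mathbf{x}_k \in D^n$, with components $\mathbf{x}_i = (x_{i1}, \dots, x_{in})$. Then, we let $f(\mathbf{x}_1, \dots, \mathbf{x}_k)$ denote the $n$-tuple $(f(x_{11}, \dots, x_{k1}), \dots, f(x_{1n}, \dots, x_{kn}))$.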

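A (binary) multimorphism of $\Gamma$ is a pair of functions $f, g : D^2 \to D$ such that for any $h \in \Gamma$, and matching tuples $\mathbf{x}$ and $\mathbf{y}$,

$$h(f(\mathbf{x},\mathbf{y})) + h(g(\mathbf{x},\mathbf{y})) \le h(\mathbf{x}) + h(\mathbf{y}). \tag{1}$$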

The concept of multimorphisms was introduced by Cohen et al. [4] as an extension of the notion of polymorphisms to the analysis of the VCSP problem.
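Inequality (1) can be checked mechanically for a single cost function on a small domain. The following sketch, with the hypothetical names `is_multimorphism` and `neq_cost`, applies a pair of binary operations component-wise to every pair of tuples and tests the inequality. It is a brute-force check intended only as an illustration.

```python
from itertools import product

def is_multimorphism(h, arity, domain, f, g):
    """Check h(f(x,y)) + h(g(x,y)) <= h(x) + h(y) for all tuples x, y."""
    for x in product(domain, repeat=arity):
        for y in product(domain, repeat=arity):
            fxy = tuple(f(a, b) for a, b in zip(x, y))  # f applied component-wise
            gxy = tuple(g(a, b) for a, b in zip(x, y))  # g applied component-wise
            if h(*fxy) + h(*gxy) > h(*x) + h(*y):
                return False
    return True

# Hypothetical {0,1}-valued cost function: cost 1 when the arguments differ.
def neq_cost(a, b):
    return 1 if a != b else 0

on_two = is_multimorphism(neq_cost, 2, [0, 1], min, max)
on_three = is_multimorphism(neq_cost, 2, [0, 1, 2], min, max)
```

With the pair (min, max), the check succeeds for `neq_cost` on $\{0,1\}$ but fails on $\{0,1,2\}$: for the tuples $(1,1)$ and $(2,0)$ the left-hand side of (1) is 2 while the right-hand side is 1.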

###### Definition 5 (Multimorphism Function Minimisation)

Let $X$ be a finite set of triples $(D_i; f_i, g_i)$, where $D_i$ is a finite set and $f_i, g_i$ are functions mapping $D_i^2$ to $D_i$. MFM($X$) is a minimisation problem with

Instance:

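A positive integer $n$, a function $j : \{1, \dots, n\} \to \{1, \dots, |X|\}$, and a function $h$ defined on $D$, where $D = \prod_{i=1}^{n} D_{j(i)}$. Furthermore,

$$h(\mathbf{x}) + h(\mathbf{y}) \ge h\bigl(f_{j(1)}(x_1,y_1), f_{j(2)}(x_2,y_2), \dots, f_{j(n)}(x_n,y_n)\bigr) + h\bigl(g_{j(1)}(x_1,y_1), g_{j(2)}(x_2,y_2), \dots, g_{j(n)}(x_n,y_n)\bigr)$$

for all $\mathbf{x} = (x_1, \dots, x_n)$ and $\mathbf{y} = (y_1, \dots, y_n)$ in $D$. The function $h$ is given to the algorithm as an oracle, i.e., for any $\mathbf{x} \in D$ we can query the oracle to obtain $h(\mathbf{x})$ in unit time.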

Solution:

A tuple .

Measure:

The value of .

For a finite set $X$ we say that MFM($X$) is oracle-tractable if it can be solved in time $O(n^c)$ for some constant $c$. It is not hard to see that if $(f, g)$ is a multimorphism of $\Gamma$, and MFM($D; f, g$) is oracle-tractable, then Min CSP($\Gamma$) is tractable.

We now give two examples of oracle-tractable problems. A partial order on is called a lattice if every pair of elements has a greatest lower bound (meet) and a least upper bound (join). A chain on is a lattice which is also a total order.

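For $i = 1, \dots, n$, let $L_i$ be a lattice on $D_i$. The product lattice $L_1 \times \dots \times L_n$ is defined on the set $D_1 \times \dots \times D_n$ by extending the meet and join component-wise: for $\mathbf{a} = (a_1, \dots, a_n)$ and $\mathbf{b} = (b_1, \dots, b_n)$, let $\mathbf{a} \wedge \mathbf{b} = (a_1 \wedge b_1, \dots, a_n \wedge b_n)$, and let $\mathbf{a} \vee \mathbf{b} = (a_1 \vee b_1, \dots, a_n \vee b_n)$.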

A function is called submodular on the lattice if

 f(a∧b)+f(a∨b)≤f(a)+f(b)

for all . A set of functions is said to be submodular on if every function in is submodular on . This is equivalent to being a multimorphism of . It follows from known algorithms for submodular function minimisation that MFM is oracle-tractable for any finite set of finite distributive lattices (e.g. chains) [9, 21].
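Since submodularity on a chain depends on the chosen total order of the domain, it can be instructive to search over all orderings of a small domain. The sketch below is an illustration under stated assumptions: the names `chain_ops`, `submodular_on`, `find_chain` and the two example cost functions are hypothetical, and the search is brute force over all permutations.

```python
from itertools import permutations, product

def chain_ops(order):
    """Meet and join of the chain in which order[0] < order[1] < ..."""
    rank = {d: i for i, d in enumerate(order)}
    meet = lambda a, b: a if rank[a] <= rank[b] else b
    join = lambda a, b: a if rank[a] >= rank[b] else b
    return meet, join

def submodular_on(h, arity, order):
    """Check h(a ^ b) + h(a v b) <= h(a) + h(b) on the chain given by order."""
    meet, join = chain_ops(order)
    for x in product(order, repeat=arity):
        for y in product(order, repeat=arity):
            m = tuple(meet(a, b) for a, b in zip(x, y))
            j = tuple(join(a, b) for a, b in zip(x, y))
            if h(*m) + h(*j) > h(*x) + h(*y):
                return False
    return True

def find_chain(h, arity, domain):
    """Return some ordering on which h is submodular, or None if there is none."""
    for order in permutations(domain):
        if submodular_on(h, arity, order):
            return order
    return None

# Hypothetical cost functions on the domain {0, 1, 2}.
def extremes_cost(a, b):
    return 1 if {a, b} == {0, 1} else 0   # cost 1 exactly on the pair {0, 1}

def neq_cost(a, b):
    return 1 if a != b else 0

chain = find_chain(extremes_cost, 2, [0, 1, 2])
no_chain = find_chain(neq_cost, 2, [0, 1, 2])
```

In this example `extremes_cost` is not submodular on the natural order $0 < 1 < 2$, but it is on the chain $0 < 2 < 1$, whereas `neq_cost` is not submodular on any ordering of the three elements.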

The second example is strongly related to submodularity, but here we use a partial order that is not a lattice to define the multimorphism. Let , and define the functions by letting , if , and otherwise. We say that a function is bisubmodular if has the multimorphism . It is possible to minimise a -ary bisubmodular function in time polynomial in , provided that evaluating on a tuple is a primitive operation [17].
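As an illustration of the bisubmodular multimorphism, the following sketch uses one common convention, which is an assumption here since the exact formulas are not reproduced above: on $\{0,1,2\}$ the element 0 lies below the incomparable elements 1 and 2, both operations return 0 on the pair $\{1,2\}$, and they act as min and max otherwise. The names `u`, `v`, `is_bisubmodular` and the example cost functions are hypothetical.

```python
from itertools import product

def u(x, y):
    """Assumed lower operation: 0 on the incomparable pair {1, 2}, else min."""
    return 0 if {x, y} == {1, 2} else min(x, y)

def v(x, y):
    """Assumed upper operation: 0 on the incomparable pair {1, 2}, else max."""
    return 0 if {x, y} == {1, 2} else max(x, y)

def is_bisubmodular(h, arity):
    """Check h(u(x,y)) + h(v(x,y)) <= h(x) + h(y) for all x, y in {0,1,2}^arity."""
    dom = (0, 1, 2)
    for x in product(dom, repeat=arity):
        for y in product(dom, repeat=arity):
            lower = tuple(u(a, b) for a, b in zip(x, y))
            upper = tuple(v(a, b) for a, b in zip(x, y))
            if h(*lower) + h(*upper) > h(*x) + h(*y):
                return False
    return True

# Hypothetical cost functions used only for this illustration.
def nonzero_cost(a, b):
    return 0 if (a, b) == (0, 0) else 1

def neq_cost(a, b):
    return 1 if a != b else 0

ok = is_bisubmodular(nonzero_cost, 2)
bad = is_bisubmodular(neq_cost, 2)
```

Under this convention `nonzero_cost` passes the check, while `neq_cost` fails it on the tuples $(1,2)$ and $(2,2)$.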

## 3 A New Tractable Class

In this section, we introduce a new multimorphism which ensures tractability for Min CSP (and more generally for VCSP).

###### Definition 6

Let and be two distinct elements in . Let be a partial order which relates all pairs of elements except for and . Assume that are two commutative functions satisfying the following conditions:

• If , then and .

• If , then , and .

We call a 1-defect chain (over ), and say that is the defect of . If a function has the multimorphism , then we also say that is a 1-defect chain multimorphism.

Three types of 1-defect chains are shown in Fig. 1(a–c). Note this is not an exhaustive list, e.g. for , there are 1-defect chains similar to Fig. 1(b), but with . When , type (b) is precisely the product lattice shown in Fig. 1(d). We denote this lattice by

###### Example 1

Let , and assume that is a 1-defect chain, with defect , and that . If , then and are the meet and join of , cf. Fig. 1(d). When we have the situation in Fig. 1(a), and when we have the situation in Fig. 1(c). In the two latter cases, and are given by the two following multimorphisms (rows and columns are listed in the order , e.g. ):

The proof of tractability for languages with 1-defect chain multimorphisms is inspired by Krokhin and Larose’s [15] result on maximising supermodular functions on Mal’tsev products of lattices. First we will need some notation and a general lemma on oracle-tractability of MFM problems.

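For an equivalence relation $\theta$ on $D$ we use $x[\theta]$ to denote the equivalence class containing $x \in D$. The relation $\theta$ is a congruence on $(D; f, g)$ if $f(x_1,y_1)[\theta] = f(x_2,y_2)[\theta]$ and $g(x_1,y_1)[\theta] = g(x_2,y_2)[\theta]$ whenever $x_1[\theta] = x_2[\theta]$ and $y_1[\theta] = y_2[\theta]$. We use $D/\theta$ to denote the set $\{x[\theta] \mid x \in D\}$ and $f/\theta$ to denote the function $(x[\theta], y[\theta]) \mapsto f(x,y)[\theta]$.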

###### Lemma 1

Let be two functions that map to . If there is a congruence relation on such that 1) MFM is oracle-tractable; and 2) MFM is oracle-tractable, then MFM is oracle-tractable.

###### Proof

Let be the function we want to minimise. We define a new function by

$$h'(z_1, z_2, \dots, z_n) = \min_{x_i \in z_i} h(x_1, x_2, \dots, x_n).$$

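It is clear that $\min_{z} h'(z) = \min_{x} h(x)$. By assumption 2 in the statement of the lemma we can compute $h'$ given $h$. To simplify the notation we let $u = f/\theta$ and $v = g/\theta$. We will now prove that $h'$ is an instance of MFM($D/\theta; u, v$).

Let $x, y \in D^n$ and choose $x' \in x[\theta]$ and $y' \in y[\theta]$ so that $h'(x[\theta]) = h(x')$ and $h'(y[\theta]) = h(y')$. We then have

$$\begin{aligned}
h'(x[\theta]) + h'(y[\theta]) &= h(x') + h(y') && (2) \\
&\ge h(f(x',y')) + h(g(x',y')) && (3) \\
&\ge h'(f(x',y')[\theta]) + h'(g(x',y')[\theta]) && (4) \\
&= h'(f(x,y)[\theta]) + h'(g(x,y)[\theta]) && (5) \\
&= h'(u(x[\theta],y[\theta])) + h'(v(x[\theta],y[\theta])). && (6)
\end{aligned}$$

Here (2) follows from our choice of $x'$ and $y'$, (3) follows from the fact that $h$ is an instance of MFM($D; f, g$), (4) follows from the definition of $h'$, and finally (5) and (6) follow as $\theta$ is a congruence relation of $f$ and $g$. Hence, $h'$ is an instance of MFM($D/\theta; u, v$) and can be minimised in polynomial time by the first assumption in the lemma. ∎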
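The two-level structure used in this proof can be sketched as follows: an inner minimisation restricted to one equivalence class per coordinate, and an outer minimisation over tuples of classes. In the sketch both levels are done by exhaustive search, standing in for the oracle-tractable procedures assumed in the lemma; the names `minimise_within`, `minimise_via_quotient`, the example function `h` and the partition are hypothetical.

```python
from itertools import product

def minimise_within(h, classes):
    """Inner step: a minimiser of h with the i-th argument restricted to classes[i]."""
    return min(product(*classes), key=lambda x: h(*x))

def minimise_via_quotient(h, n, partition):
    """Outer step: minimise h'(z) = min over x_i in z_i of h(x) over class tuples z."""
    best = None
    for z in product(partition, repeat=n):
        x = minimise_within(h, z)          # witnesses the value h'(z)
        if best is None or h(*x) < h(*best):
            best = x
    return best

# Illustrative function on {0,1,2,3}^2 and a partition of {0,1,2,3} into two classes.
def h(a, b):
    return (a - 2) ** 2 + (b - 1) ** 2 + a * b

partition = [(0, 1), (2, 3)]
best = minimise_via_quotient(h, 2, partition)
```

Because the classes cover the whole domain, the value found this way equals the global minimum of `h`; the point of the lemma is that each level can be solved efficiently when the corresponding MFM problems are oracle-tractable.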

Armed with this lemma and the oracle-tractability of submodular and bisubmodular functions described in the previous section, we can now present a new tractable class of Min CSP-problems.

###### Proposition 2

If has a 1-defect chain multimorphism, then Min CSP is tractable.

###### Proof

Assume that has a 1-defect chain multimorphism over with defect . We prove that MFM is oracle-tractable.

Assume that and are maximal elements, i.e. for all . In this case the equivalence relation with classes , , is a congruence relation of . Furthermore, MFM and MFM are oracle-tractable [17, 21]. It now follows from Lemma 1 that MFM is oracle-tractable. The same argument works for the case when and are minimal elements.

If , but and are not maximal, then we can use the congruence relation with classes and . Here, and are chains, and is a 1-defect chain of the previous type. One can show that when MFM() and MFM() are both oracle-tractable, then so is MFM. Combining this with the technique used above, we can now solve the minimisation problem. An analogous construction works in the case when , using the congruence consisting of the class and its complement. Finally, when , we can use the congruence relation with classes and . Here, , , and are all chains and thus the MFM problem for these triples is oracle-tractable [21]. ∎

We now turn to proving a different property of functions with 1-defect chain multimorphisms. It is based on the following result for submodular functions on chains, which was derived by Queyranne et al. [18] from earlier work by Topkis [23] (see also Burkard et al. [2]). This formulation is due to Deineko et al. [7]:

###### Lemma 2

A function $f$ of arity $k \ge 2$ is submodular on a chain if and only if the following holds: every binary function obtained from $f$ by replacing any given $k-2$ variables by any constants is submodular on this chain.

It is straightforward to extend this lemma to products of chains, such as . Here, we outline the proof of the corresponding property for arbitrary 1-defect chains, which will be needed in Section 6. We will use the following observation.

###### Definition 7

A binary operation $f$ is called a 2-semilattice if it is idempotent, commutative, and $f(x, f(x,y)) = f(x,y)$ for all $x, y$.
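The three conditions of a 2-semilattice are easy to test on a finite domain. The sketch below, with the hypothetical names `is_2_semilattice` and `rps`, checks them for a rock-paper-scissors style operation, a standard example of a 2-semilattice that is not associative and hence not a semilattice.

```python
def is_2_semilattice(f, domain):
    """Check idempotency, commutativity and f(x, f(x, y)) = f(x, y)."""
    idempotent = all(f(x, x) == x for x in domain)
    commutative = all(f(x, y) == f(y, x) for x in domain for y in domain)
    absorbing = all(f(x, f(x, y)) == f(x, y) for x in domain for y in domain)
    return idempotent and commutative and absorbing

# Hypothetical example: 1 beats 0, 2 beats 1, and 0 beats 2.
WINNER = {(0, 1): 1, (1, 2): 2, (0, 2): 0}

def rps(x, y):
    """Return the winner of x and y (or x itself when x == y)."""
    if x == y:
        return x
    return WINNER[(min(x, y), max(x, y))]

result = is_2_semilattice(rps, [0, 1, 2])
```

The operation `rps` satisfies all three conditions, yet `rps(rps(0, 1), 2)` and `rps(0, rps(1, 2))` differ, which shows that associativity is not implied.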

###### Proposition 3

Let be a 1-defect chain with a defect on .

1. If , then is a 2-semilattice and for .

2. If , then is a 2-semilattice and for .

3. For , we have and .

###### Proof

For , the equalities and follow from the underlying partial order. Assume instead that , and that . Since , we have that is the greatest lower bound of and , which is . We also have that is the lowest upper bound of and , which is . An analogous argument proves (2).

The first equality of (3) follows from (1) if , and the second equality follows from (2) if . At least one of and holds. If both hold, there is nothing to prove, so assume that , but . We then have and for , so the second equality of (3) also holds. The remaining case follows similarly. ∎

###### Lemma 3

A function $h$ of arity $k \ge 2$ has the 1-defect chain multimorphism $(f, g)$ if and only if every binary function obtained from $h$ by replacing any given $k-2$ variables by any constants has the multimorphism $(f, g)$.

###### Proof

Let be the defect of . We prove the statement for the case . The other case follows analogously.

Every function obtained from by fixing a number of variables is clearly invariant under every multimorphism of .

For the opposite direction, assume that $h$ does not have the multimorphism $(f, g)$. We want to prove that there exist vectors $x, y$ such that

$$h(x) + h(y) < h(f(x,y)) + h(g(x,y)) \tag{7}$$

with $d_H(x,y) \le 2$, where $d_H$ denotes the Hamming distance, i.e. the number of coordinates in which $x$ and $y$ differ.

Assume to the contrary that the result does not hold. We can then choose a function $h$ of minimal arity such that

$$\min\{\, d_H(x,y) \mid x \text{ and } y \text{ satisfy (7)} \,\} > 2.$$

The arity of must in fact be equal to the least ; otherwise, we could obtain a function of strictly smaller arity by fixing the variables in on which and agree. This would contradict the minimality in the choice of .

We will first show that it is possible to choose and so that for all . Let so that , and let be two vectors with , satisfying (7). Now, assume that . We then have

$$h(x_1;x_2) + h(y_1;y_2) < h(f(x_1,y_1); f(x_2,y_2)) + h(g(x_1,y_1); g(x_2,y_2)).$$

Since both and are strictly less than the arity of , we have by assumption

$$\begin{aligned}
h(x_1;x_2) + h(x_1;y_2) &\ge h(x_1; f(x_2,y_2)) + h(x_1; g(x_2,y_2)), \text{ and} \\
h(y_1;y_2) + h(x_1;y_2) &\ge h(f(x_1,y_1); y_2) + h(g(x_1,y_1); y_2).
\end{aligned}$$

By combining these inequalities, we get

$$h(x_1; f(x_2,y_2)) + h(f(x_1,y_1); y_2) + h(x_1; g(x_2,y_2)) + h(g(x_1,y_1); y_2) < 2\,h(x_1;y_2) + h(f(x_1,y_1); f(x_2,y_2)) + h(g(x_1,y_1); g(x_2,y_2)).$$


Let , , , and . By Proposition 3(3), we have

$$\begin{aligned}
\{f(x,y), g(x,y)\} &= \{(x_1;y_2), (f(x_1,y_1); f(x_2,y_2))\}, \text{ and} \\
\{f(x',y'), g(x',y')\} &= \{(x_1;y_2), (g(x_1,y_1); g(x_2,y_2))\}.
\end{aligned}$$

Hence, we can rewrite the previous inequality:

$$h(x) + h(y) + h(x') + h(y') < h(f(x,y)) + h(g(x,y)) + h(f(x',y')) + h(g(x',y')).$$


It follows that either the pair and , or the pair and satisfies condition (7). Furthermore, and , for all .

If instead we have vectors and satisfying (7) such that for some, but not all , then we proceed as follows. Note that implies . Without loss of generality, we may therefore assume that , with for , are such that and , possibly by first exchanging and . For these vectors, condition (7) now reads:

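$$h(x_1;x_2) + h(y_1;y_2) < h(x_1; f(x_2,y_2)) + h(y_1; g(x_2,y_2)).$$

Due to the minimality of $h$’s arity, we must have

$$h(y_1;x_2) + h(y_1;y_2) \ge h(y_1; f(x_2,y_2)) + h(y_1; g(x_2,y_2)).$$

We therefore have

$$h(x_1;x_2) + h(y_1; f(x_2,y_2)) < h(x_1; f(x_2,y_2)) + h(y_1;x_2).$$

Let $x = (x_1;x_2)$ and $y = (y_1; f(x_2,y_2))$. By Proposition 3(1), $f$ is a 2-semilattice, so we have $f(f(x_2,y_2),x_2) = f(x_2,y_2)$, and thus

$$(x_1; f(x_2,y_2)) = (x_1; f(f(x_2,y_2),x_2)) = f((y_1; f(x_2,y_2)), (x_1;x_2)) = f(y,x).$$

Furthermore, $g(f(x_2,y_2),x_2) = x_2$, so

$$(y_1;x_2) = (y_1; g(f(x_2,y_2),x_2)) = g((y_1; f(x_2,y_2)), (x_1;x_2)) = g(y,x).$$

We therefore conclude that

$$h(x) + h(y) < h(f(y,x)) + h(g(y,x)),$$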

so that condition (7) holds for and with for all . From now on, we assume that and are chosen in this way.

Let . For each , let be an injection which fixes , and sends to or in such a way that . Let be the chain defined by if and , if , and if . Then, , and , for all . Let , and let be such that and . Define . Then,

$$\begin{aligned}
h'(x') + h'(y') = h(x) + h(y) &< h(f(x,y)) + h(g(x,y)) \\
&= h'(f'(x',y')) + h'(g'(x',y')).
\end{aligned}$$

It follows that is not submodular on . By Lemma 2, there are elements with such that . Hence,

 h(φ(z′))+h(φ(w′))=h′(z′)+h′(w′)
 =h(f(φ(z′),φ(w′)))+h(g(φ(z′),