Complexity of Discrete Energy Minimization Problems
Abstract
Discrete energy minimization is widely used in computer vision and machine learning for problems such as MAP inference in graphical models. The problem, in general, is notoriously intractable, and finding the global optimal solution is known to be NP-hard. However, is it possible to approximate this problem with a reasonable ratio bound on the solution quality in polynomial time? We show in this paper that the answer is no. Specifically, we show that general energy minimization, even in the 2-label pairwise case, and planar energy minimization with three or more labels are exp-APX-complete. This finding rules out the existence of any approximation algorithm with a sub-exponential approximation ratio in the input size for these two problems, including constant factor approximations. Moreover, we collect and review the computational complexity of several subclass problems and arrange them on a complexity scale consisting of three major complexity classes: PO, APX, and exp-APX, corresponding to problems that are solvable, approximable, and inapproximable in polynomial time. Problems in the first two complexity classes can serve as alternative tractable formulations for the inapproximable ones. This paper can help vision researchers select an appropriate model for an application or guide them in designing new algorithms.
Keywords:
Energy minimization, complexity, NP-hard, APX, exp-APX, NPO, WCSP, min-sum, MAP-MRF, QPBO, planar graph
Author’s version. The final publication is available at link.springer.com.
1 Introduction
Discrete energy minimization, also known as min-sum labeling [69] or weighted constraint satisfaction (WCSP) [25], is a popular model for many problems in computer vision, machine learning, bioinformatics, and natural language processing. (WCSP is a more general problem, considering a bounded plus operation; it is itself a special case of valued CSP, where the objective takes values in a more general valuation set.) In particular, the problem arises in maximum a posteriori (MAP) inference for Markov (conditional) random fields (MRFs/CRFs) [43]. In the most frequently used pairwise case, the discrete energy minimization problem (simply “energy minimization” hereafter) is defined as
$$E_f(x) \;=\; \sum_{v \in V} f_v(x_v) \;+\; \sum_{uv \in E} f_{uv}(x_u, x_v) \qquad (1)$$
where $x_v \in \mathcal{L}$ is the label for node $v$ in a graph $G = (V, E)$. When the variables are binary (Boolean), i.e., $\mathcal{L} = \{0, 1\}$, the problem can be written as a quadratic polynomial in $x$ and is known as quadratic pseudo-Boolean optimization (QPBO) [11].
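To make the objective in (1) concrete, the following minimal Python sketch (illustrative code; the graph, names, and costs are not from the paper) evaluates a pairwise energy and finds its minimum by exhaustive enumeration. Brute force is of course exponential in the number of nodes and only viable for toy instances:

```python
from itertools import product

def energy(unary, pairwise, labeling):
    """Pairwise energy E(x) = sum_v f_v(x_v) + sum_{uv} f_uv(x_u, x_v).

    unary:    dict  node -> {label: cost}
    pairwise: dict  (u, v) -> {(label_u, label_v): cost}
    labeling: dict  node -> label
    """
    e = sum(unary[v][labeling[v]] for v in unary)
    e += sum(costs[(labeling[u], labeling[v])]
             for (u, v), costs in pairwise.items())
    return e

def brute_force_min(unary, pairwise, labels):
    """Exhaustive minimization: exponential in |V|, illustration only."""
    nodes = sorted(unary)
    best = min(product(labels, repeat=len(nodes)),
               key=lambda x: energy(unary, pairwise, dict(zip(nodes, x))))
    return dict(zip(nodes, best))

# A 2-label (QPBO-style) toy instance on a 3-node chain a - b - c.
unary = {'a': {0: 0, 1: 2}, 'b': {0: 1, 1: 0}, 'c': {0: 3, 1: 0}}
pairwise = {('a', 'b'): {(0, 0): 0, (0, 1): 4, (1, 0): 4, (1, 1): 0},
            ('b', 'c'): {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}}
x_star = brute_force_min(unary, pairwise, labels=(0, 1))
```

Here any assignment of labels is feasible, which is why (as discussed in Section 2) all energy minimization problems at least admit trivial feasible solutions.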
In computer vision practice, energy minimization has found its place in semantic segmentation [51], pose estimation [71], scene understanding [57], depth estimation [44], optical flow estimation [70], image inpainting [59], and image denoising [8]. For example, tree-structured models have been used to estimate pictorial structures such as body skeletons or facial landmarks [71], multi-label Potts models have been used to enforce a smoothing prior for semantic segmentation [51], and general pairwise models have been used for optical flow estimation [70]. However, it may not be appreciated that the energy minimization formulations used to model these vision problems have greatly varied degrees of tractability or computational complexity. For the three examples above, the first allows efficient exact inference, the second admits a constant factor approximation, and the third has no quality guarantee on the approximation of the optimum.
The study of complexity of energy minimization is a broad field. Energy minimization problems are often intractable in practice except for special cases. While many researchers analyze the time complexity of their algorithms (e.g., using big-O notation), it is beneficial to delve deeper and address the difficulty of the underlying problem. The two most commonly known complexity classes are P (polynomial time) and NP (nondeterministic polynomial time: all decision problems whose solutions can be verified in polynomial time). However, these two complexity classes are only defined for decision problems. The analogous complexity classes for optimization problems are PO (P optimization) and NPO (NP optimization: all optimization problems whose solution feasibility can be verified in polynomial time). Optimization problems form a superset of decision problems, since any decision problem can be cast as an optimization over the set {yes, no}, i.e., P ⊆ PO and NP ⊆ NPO. The NP-hardness of an optimization problem means it is at least as hard as (under Turing reduction) the hardest decision problem in the class NP. If a problem is NP-hard, then it is not in PO, assuming P ≠ NP.
Although optimal solutions for problems in NPO, but not in PO, are intractable, it is sometimes possible to guarantee that a good solution (i.e., one that is worse than the optimal by no more than a given factor) can be found in polynomial time. These problems can therefore be further classified into the class APX (constant factor approximation) and the class exp-APX (inapproximable), with increasing complexity (Figure 1). We can arrange energy minimization problems on this more detailed complexity scale, originally established in [4], to provide vision researchers a new viewpoint for complexity classification, with a focus on NP-hard optimization problems.
In this paper, we make three core contributions, as explained in the next three paragraphs. First, we prove the inapproximability result for QPBO and general energy minimization. Second, we show that the same inapproximability result holds when restricting to planar graphs with three or more labels. In the proof, we propose a novel micro-graph structure-based reduction that can be used for algorithm design as well. Finally, we present a unified framework and an overview of vision-related special cases where the energy minimization problem can be solved in polynomial time or approximated with a constant, logarithmic, or polynomial factor.
Binary and multi-label case (Section 3). It is known that QPBO (2-label case) and the general energy minimization problem (multi-label case) are NP-hard [12], because they generalize such classical NP-hard optimization problems on graphs as vertex packing (maximum independent set) and the minimum and maximum cut problems [27]. In this paper, we show a stronger conclusion. We prove that QPBO as well as general energy minimization are complete (being the hardest problems) in the class exp-APX. Assuming P ≠ NP, this implies that a polynomial time method cannot have a guarantee of finding an approximation within a constant factor of the optimal, and in fact, the only possible factor in polynomial time is exponential in the input size. In practice, this means that a solution may be essentially arbitrarily bad.
Planar three or more label case (Section 4). Planar graphs form the underlying graph structure for many computer vision and image processing tasks. It is known that efficient exact algorithms exist for some special cases of planar 2-label energy minimization problems [55]. In this paper, we show that for the case of three or more labels, planar energy minimization is exp-APX-complete, which means these problems are as hard as general energy minimization. It remains unknown whether a constant ratio approximation exists for planar 2-label problems in general.
Subclass problems (Section 5). Some special cases of energy minimization relevant to computer vision are known to be tractable. However, detailed complexity analysis of these subclasses is patchy and spread across numerous papers. In Section 5, we classify the complexity of these subclass problems and illustrate some of their connections. Such an analysis can help computer vision researchers become acquainted with existing complexity results relevant to energy minimization and can aid in selecting an appropriate model for an application or in designing new algorithms.
1.1 Related Work
Much of the work on complexity in computer vision has focused on experimental or empirical comparison of inference methods, including influential studies on choosing the best optimization techniques for specific classes of energy minimization problems [62, 26] and the PASCAL Probabilistic Inference Challenge, which focused on the more general context of inference in graphical models [1]. In contrast, our work focuses on theoretical computational complexity, rather than experimental analysis.
On the theoretical side, the NP-hardness of certain energy minimization problems is well studied. It has been shown that 2-label energy minimization is, in general, NP-hard, but it is in PO if it is submodular [30] or outerplanar [55]. For multi-label problems, the NP-hardness was proven by reduction from the NP-hard multiway cut problem [13]. These results, however, say nothing about the complexity of approximating the global optimum for the intractable cases. The complexity involving approximation has been studied for classical combinatorial problems, such as MAX-CUT and MAX-2-SAT, which are known to be APX-complete [46]. QPBO generalizes such problems and is therefore APX-hard. This leaves open the possibility that QPBO may be in APX, i.e., approximable within a constant factor.
Energy minimization is often used to solve MAP inference for undirected graphical models. In contrast to the scarce results for energy minimization and undirected graphical models, researchers have more extensively studied the computational complexity of approximating the MAP solution for Bayesian networks, also known as directed graphical models [42]. Abdelbar and Hedetniemi first proved the NP-hardness of approximating the MAP assignment of directed graphical models in the value of probability, i.e., finding $\hat{x}$ such that
$$\frac{\Pr(x^*)}{\Pr(\hat{x})} \;\le\; \rho \qquad (2)$$
with a constant or polynomial ratio $\rho$ is NP-hard, and showing that this problem is poly-APX-hard [2]. The probability approximation ratio is closest to the energy ratio used in our work, but other approximation measures have also been studied. Kwisthout showed the NP-hardness of approximating MAP under the measures of additive value-, structure-, and rank-approximation [40, 41, 42]. He also investigated the hardness of expectation-approximation of MAP and found that no randomized algorithm can expectation-approximate MAP in polynomial time with a bounded margin of error unless NP ⊆ BPP, a condition considered highly unlikely to be true [42].
Unfortunately, the complexity results for directed models do not readily transfer to undirected models, and vice versa. In directed and undirected models, the graphs represent different conditional independence relations, so the underlying families of probability distributions encoded by these two models are distinct, as detailed in Appendix 0.B. However, one can ask similar questions about the hardness of undirected models in terms of various approximation measures. In this work, we answer two such questions: how hard is it to approximate MAP inference in the ratio of energy (log-probability), and how hard in the ratio of probability? The complexity of structure-, rank-, and expectation-approximation remains an open question for energy minimization.
2 Definitions and Notation
There are at least two different sets of definitions of what is considered an NP optimization problem [45, 4]. Here, we follow the notation of Ausiello et al. [4] and restate the definitions needed to state and prove our theorems in Sections 3 and 4, with our explanation of their relevance to our proofs.
Definition 2.0 (Optimization Problem, [4] Def. 1.16).
An optimization problem P is characterized by a quadruple (I, SOL, m, goal) where:

- I is the set of instances of P;

- SOL is a function that associates to any input instance x ∈ I the set SOL(x) of feasible solutions of x;

- m is the measure function, defined for pairs (x, y) such that x ∈ I and y ∈ SOL(x); for every such pair (x, y), m(x, y) provides a positive integer;

- goal ∈ {min, max}.
Notice the assumption that the cost is positive, and, in particular, it cannot be zero.
Definition 2.0 (Class NPO, [4] Def 1.17).
An optimization problem P = (I, SOL, m, goal) belongs to the class of NP optimization (NPO) problems if the following hold:

1. The set of instances I is recognizable in polynomial time.

2. There exists a polynomial q such that, given an instance x ∈ I, for any y ∈ SOL(x), |y| ≤ q(|x|) and, besides, for any y such that |y| ≤ q(|x|), it is decidable in polynomial time whether y ∈ SOL(x).

3. The measure function m is computable in polynomial time.
Definition 2.0 (Class PO, [4] Def 1.18).
An optimization problem P belongs to the class PO if it is in NPO and there exists a polynomial-time algorithm that, for any instance x ∈ I, returns an optimal solution y*(x) ∈ SOL*(x), together with its value m*(x).
For intractable problems, it may be acceptable to seek an approximate solution that is sufficiently close to optimal.
Definition 2.0 (Approximation Algorithm, [4] Def. 3.1).
Given an optimization problem P = (I, SOL, m, goal), an algorithm A is an approximation algorithm for P if, for any given instance x ∈ I, it returns an approximate solution, that is, a feasible solution A(x) ∈ SOL(x).
Definition 2.0 (Performance Ratio, [4], Def. 3.6).
Given an optimization problem P, for any instance x of P and for any feasible solution y ∈ SOL(x), the performance ratio, approximation ratio, or approximation factor of y with respect to x is defined as

$$R(x, y) = \max\left\{ \frac{m(x, y)}{m^*(x)}, \frac{m^*(x)}{m(x, y)} \right\} \qquad (3)$$

where m*(x) is the measure of an optimal solution for the instance x.
Since the measure is a positive integer, the performance ratio is well-defined. It is a rational number in [1, ∞). Notice that from this definition it follows that if finding a feasible solution is itself an NP-hard decision problem, then there exists no polynomial-time approximation algorithm for P, irrespective of the kind of performance evaluation that one could possibly mean.
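Definition (3) applies symmetrically to minimization and maximization goals. A minimal Python sketch (illustrative, not from the paper) makes the symmetry explicit:

```python
def performance_ratio(m_y, m_opt):
    """R(x, y) = max{ m(x, y)/m*(x), m*(x)/m(x, y) } (Ausiello et al., Def. 3.6).

    Both measures are positive integers by definition, so the ratio is
    well-defined, lies in [1, inf), and equals 1 iff y is optimal.
    The same formula covers min- and max-problems: one of the two
    fractions is >= 1 regardless of the goal.
    """
    assert m_y > 0 and m_opt > 0
    return max(m_y / m_opt, m_opt / m_y)
```

For example, a solution of measure 6 against an optimum of 3 has ratio 2 whether the goal is min (solution twice too costly) or max (optimum twice the achieved value).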
Definition 2.0 (r(n)-approximation, [4], Def. 8.1).
Given an optimization problem P in NPO, an approximation algorithm A for P, and a function r : N → [1, ∞), we say that A is an r(n)-approximate algorithm for P if, for any instance x of P such that SOL(x) ≠ ∅, the performance ratio of the feasible solution A(x) with respect to x verifies the following inequality:

$$R(x, A(x)) \le r(|x|). \qquad (4)$$
Definition 2.0 (F-APX, [4], Def. 8.2).
Given a class of functions F, F-APX is the class of all NPO problems P such that, for some function r ∈ F, there exists a polynomial-time r(n)-approximate algorithm A for P.
The class of constant functions for F yields the complexity class APX. Together with logarithmic, polynomial, and exponential functions applied in the definition of F-APX above, the following complexity axis is established:

PO ⊆ APX ⊆ log-APX ⊆ poly-APX ⊆ exp-APX ⊆ NPO.
Since the measure needs to be computable in polynomial time for NPO problems, the largest measure, and thus the largest performance ratio, is an exponential function of the input size. But exp-APX is not equal to NPO (assuming P ≠ NP), because NPO contains problems whose feasible solutions cannot be found in polynomial time. For an energy minimization problem, any label assignment is a feasible solution, implying that all energy minimization problems are in exp-APX.
The standard approach for proofs in complexity theory is to perform a reduction from a known NP-complete problem. Unfortunately, the most common polynomial-time reductions ignore the quality of the solution in the approximated case. For example, it is known that any energy minimization problem can be reduced to the factor-2-approximable Potts model [48]; however, the reduction is not approximation-preserving and is unable to show the hardness of general energy minimization in terms of approximation. Therefore, it is necessary to use an approximation-preserving (AP) reduction to classify NPO problems that are not in PO, for which only approximation algorithms are tractable. AP-reductions preserve the approximation ratio in a linear fashion, and thus preserve membership in these complexity classes. Formally,
Definition 2.0 (AP-reduction, [4] Def. 8.3).
Let P1 and P2 be two problems in NPO. P1 is said to be AP-reducible to P2, in symbols P1 ≤_AP P2, if two functions f and g and a positive constant α exist such that (the complete definition passes a rational r as an additional argument to the two mappings f and g; it is omitted here for simplicity):

1. For any instance x ∈ I_{P1}, f(x) ∈ I_{P2}.

2. For any instance x ∈ I_{P1}, if SOL_{P1}(x) ≠ ∅ then SOL_{P2}(f(x)) ≠ ∅.

3. For any instance x ∈ I_{P1} and for any y ∈ SOL_{P2}(f(x)), g(x, y) ∈ SOL_{P1}(x).

4. f and g are computable by algorithms whose running time is polynomial.

5. For any instance x ∈ I_{P1}, for any rational r > 1, and for any y ∈ SOL_{P2}(f(x)),

$$R_{P_2}(f(x), y) \le r \qquad (5)$$ implies $$R_{P_1}(x, g(x, y)) \le 1 + \alpha(r - 1). \qquad (6)$$
AP-reduction is the formal definition of the term ‘as hard as’ used in this paper unless otherwise specified. It defines a partial order among optimization problems. With respect to this relationship, we can formally define the subclass containing the hardest problems in a complexity class:
Definition 2.0 (C-hard and C-complete, [4] Def. 8.5).
Given a class C of NPO problems, a problem P is C-hard if, for any P′ ∈ C, P′ ≤_AP P. A C-hard problem is C-complete if it belongs to C.
Intuitively, a complexity class specifies an upper bound on the hardness of the problems within it, hardness for the class specifies a lower bound, and completeness specifies the hardness exactly.
3 Inapproximability for the General Case
In this section, we show that QPBO and general energy minimization are inapproximable by proving they are exp-APX-complete. As previously mentioned, it is already known that these problems are NP-hard [12], but it was previously unknown whether useful approximation guarantees were possible in the general case. The formal statement of QPBO as an optimization problem is as follows:
Problem 1.
QPBO

- Instances: A pseudo-Boolean function

$$f(x) = \sum_{v \in V} f_v(x_v) + \sum_{uv \in E} f_{uv}(x_u, x_v) \qquad (7)$$

given by the collection of unary terms f_v and pairwise terms f_{uv}.

- Solutions: Assignments of variables x ∈ {0, 1}^V.

- Measure: f(x); goal: min.
Theorem 3.1.
QPBO is exp-APX-complete.
Proof Sketch.
(Full proof in Appendix 0.A).

We observe that W3SAT-triv is known to be exp-APX-complete [4]. W3SAT-triv is a 3-SAT problem with weights on the variables and an artificial, trivial solution.

Each 3-clause in the conjunctive normal form can be represented as a cubic polynomial in three binary variables. Together with representing the weights by the unary terms, we arrive at a cubic pseudo-Boolean minimization problem.

We use the method of [24] to transform the cubic pseudo-Boolean problem into a quadratic one, with polynomially many additional variables, which is an instance of QPBO.

Together with an inverse mapping that we define, the above transformation defines an AP-reduction from W3SAT-triv to QPBO, i.e., W3SAT-triv ≤_AP QPBO. This proves that QPBO is exp-APX-hard.

We observe that all energy minimization problems are in exp-APX and thereby conclude that QPBO is exp-APX-complete.
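Steps 2 and 3 of the sketch can be illustrated in Python. The clause polynomial and the quadratization penalty below are standard constructions used for this kind of reduction; they are a sketch, and the exact gadget of [24] may differ in its constants:

```python
from itertools import product

# Step 2 sketch: a 3-clause is unsatisfied iff all three of its literals are
# false, so its violation indicator is a cubic polynomial in the variables.
# For the clause (x OR y OR NOT z):
def clause_penalty(x, y, z):
    return (1 - x) * (1 - y) * z          # 1 iff the clause is violated

# Step 3 sketch: a product x*y can be replaced by a fresh variable w using the
# quadratic penalty P = xy - 2xw - 2yw + 3w, which is 0 exactly when w == x*y
# and >= 1 otherwise, so adding a large multiple of P to the energy forces the
# substitution in any optimal solution.
def substitution_penalty(x, y, w):
    return x * y - 2 * x * w - 2 * y * w + 3 * w

# Brute-force check of the gadget over all 8 assignments.
for x, y, w in product((0, 1), repeat=3):
    p = substitution_penalty(x, y, w)
    assert (p == 0) == (w == x * y) and p >= 0
```

Applying such a substitution to one variable pair of each cubic term yields a quadratic pseudo-Boolean function, i.e., a QPBO instance, with polynomially many auxiliary variables.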
This inapproximability result can be generalized to more than two labels.
Corollary 3.1.
k-label energy minimization is exp-APX-complete for k ≥ 2.
Proof Sketch.
(Full proof in Appendix 0.A). This corollary is proved by showing QPBO ≤_AP k-label energy minimization for k ≥ 2.
We show in Corollary 0.B.0 that the inapproximability in energy (log-probability) transfers to probability, in the sense of Eq. (2), as well.
Taken together, this theorem and its corollaries form a very strong inapproximability result for general energy minimization. (These results automatically generalize to higher-order cases, as those subsume the pairwise cases discussed here.) They imply not only NP-hardness, but also that there is no algorithm that can approximate general energy minimization with two or more labels with an approximation ratio better than some exponential function in the input size. In other words, any approximation algorithm for the general energy minimization problem can perform arbitrarily badly, and it would be pointless to try to prove a bound on the approximation ratio for existing approximation algorithms in the general case. While this conclusion is disappointing, these results clarify the landscape and offer guidance for model selection and algorithm design. Instead of counting on an oracle that solves the energy minimization problem, researchers should put effort into selecting a proper formulation, trading off expressiveness for tractability.
4 Inapproximability for the Planar Case
Efficient algorithms for energy minimization have been found for special cases of 2-label planar graphs. Examples include planar 2-label problems without unary terms and outerplanar 2-label problems (i.e., problems whose graph remains planar after all nodes are connected to a common node) [55]. Grid structures over image pixels naturally give rise to planar graphs in computer vision. Given their frequency of use in this domain, it is natural to consider the complexity of more general cases involving planar graphs. Figure 2 visualizes the current state of knowledge of the complexity of energy minimization problems on planar graphs. In this section, we prove that for the case of planar graphs with three or more labels, energy minimization is exp-APX-complete. This result is important because it significantly reduces the space of potentially efficient algorithms on planar graphs. The existence of a constant ratio approximation for planar 2-label problems in general remains an open question. (The planar 2-label problem in general is APX-hard, since it subsumes the APX problem planar vertex cover [7].)
Theorem 4.1.
Planar 3-label energy minimization is exp-APX-complete.
Proof Sketch.
(Full proof in Appendix 0.A).

We construct elementary gadgets to reduce any 3-label energy minimization problem to a planar one with polynomially many auxiliary nodes.

Together with an inverse mapping that we define, the above construction defines an AP-reduction, i.e., 3-label energy minimization ≤_AP planar 3-label energy minimization.

Since 3-label energy minimization is exp-APX-complete (Corollary 3.1) and all energy minimization problems are in exp-APX, we conclude that planar 3-label energy minimization is exp-APX-complete.
Corollary 4.1.
Planar k-label energy minimization is exp-APX-complete for k ≥ 3.
Proof Sketch.
(Full proof in Appendix 0.A). This corollary is proved by showing planar 3-label energy minimization ≤_AP planar k-label energy minimization for k ≥ 3.
These theorems show that the restricted case of planar graphs with three or more labels is as hard as the general case for energy minimization, with the same inapproximability implications discussed in Section 3.
The most novel and useful aspect of the proof of Theorem 4.1 is the planar reduction in Step 1. The reduction creates an equivalent planar representation of any non-planar 3-label graph; that is, the two graphs share the same optimal value. The reduction applies elementary constructions, or “gadgets”, to uncross two intersecting edges. This process is repeated until all intersecting edges are uncrossed. Similar elementary constructions were used to study the complexity of the linear programming formulation of energy minimization problems [49, 48]. Our novel gadgets have three key properties at the same time: 1) they are able to uncross intersecting edges; 2) they work on non-relaxed problems, i.e., all indicator variables (or pseudomarginals, to be formal) are integral; and 3) they can be applied repeatedly to build an AP-reduction.
The two gadgets used in our reduction are illustrated in Figure 3. A 3-label node can be encoded as a collection of 3 indicator variables with a one-hot constraint. In the figure, a solid colored circle denotes a 3-label node, and a solid colored rectangle denotes the equivalent node expressed with indicator variables (white circles). For example, in Figure 3, the first indicator being 1 corresponds to the blue node taking the first label value. The pairwise potentials (edges on the left part of the figures) can be viewed as edge costs between the indicator variables (black lines on the right); the cost for a pair of labels is placed onto the edge between the corresponding indicators and is counted into the overall measure if and only if both indicators equal 1. In our gadgets, drawn edges represent zero cost, while omitted edges represent positive infinity. (A very large number serves the same purpose, e.g., the sum of the absolute values of all energy terms plus 1; therefore, we are not expanding the set of allowed energy terms to include infinity.) While the set of feasible solutions remains the same, the gadget encourages certain labeling relationships which, if not satisfied, cause the overall measure to be infinite. Therefore, the encouraged relationships must be satisfied by any optimal solution. The two gadgets serve different purposes:
Split. A 3-label node (blue) is split into two 2-label nodes (green). The shaded circle represents a label with a positive infinite unary cost and thus creates a simulated 2-label node. The encouraged relationships identify each label of the original 3-label node with a distinct joint state of the two 2-label nodes, so that the pair of 2-label nodes jointly encodes the original 3-label node.
UncrossCopy. The values of two 2-label nodes are encouraged to be the same as those of their diagonal counterparts (red to red, green to green) without crossing each other. The orange nodes are intermediate nodes that pass on the values. All types of lines represent the same edge cost, which is 0; the color differences visualize the verification for each of the 4 possible joint states of the two 2-label nodes. For example, the cyan lines verify the case where the top-left (green) node takes the value (1, 0) and the top-right (red) node takes the value (0, 1). It is clear that the encouraged solution is for the bottom-left (red) node and the bottom-right (green) node to take the values (0, 1) and (1, 0), respectively.



Figure 3: The Split (left) and UncrossCopy (right) gadgets.
These two gadgets can be used to uncross the intersecting edges of two pairs of 3-label nodes (Figure 4, left). For a crossing edge, first a new 3-label node is introduced, preserving the same arbitrary interaction (red line) as before (Figure 4, middle). Then, the crossing edges (enclosed in the dotted circle) are uncrossed by applying Split and UncrossCopy four times (Figure 4, right). Without loss of generality, we can assume that no more than two edges intersect at a common point other than their endpoints. This process can be applied repeatedly at each edge crossing until no edge crossings are left in the graph [49].
5 Complexity of Subclass Problems
In this section, we classify some of the special cases of energy minimization according to our complexity axis (Figure 1). This classification can be viewed as a reinterpretation of existing results from the literature into a unified framework.
5.1 Class PO (Global Optimum)
Polynomial time solvability may be achieved by considering two principal kinds of restrictions: those restricting the structure of the problem, i.e., the graph G, and those restricting the type of allowed interactions, i.e., the functions f_{uv}.
Structure Restrictions. When G is a chain, energy minimization reduces to finding a shortest path in the trellis graph, which can be solved using a classical dynamic programming (DP) method known as the Viterbi algorithm [20]. The same DP principle applies to graphs of bounded treewidth. Fixing all variables in a separator set decouples the problem into independent optimization problems. For treewidth 1, the separators are just individual vertices, and the problem is solved by a variant of DP [47, 54]. For larger treewidths, the respective optimization procedure is known as junction tree decomposition [43]. A loop is a simple example of a treewidth-2 problem. However, for a treewidth-k problem, the time complexity is exponential in k [43]. When G is an outerplanar graph, the problem can be solved by the method of [55], which reduces it to a planar Ising model, for which efficient algorithms exist [60].
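The chain DP can be sketched in a few lines of Python (illustrative code with hypothetical data layout, not from the paper); it runs in O(n·L²) time for n nodes and L labels, and its output can be checked against brute force on toy instances:

```python
from itertools import product

def viterbi_min(unary, pairwise):
    """Exact minimization on a chain by dynamic programming (Viterbi).

    unary:    list of lists, unary[i][a]         = f_i(a)
    pairwise: list of matrices, pairwise[i][a][b] = f_{i,i+1}(a, b)
    Returns (optimal energy, optimal labeling).
    """
    n, L = len(unary), len(unary[0])
    cost = list(unary[0])                 # best energy of a prefix ending in label a
    back = []                             # back-pointers for reconstruction
    for i in range(1, n):
        prev = cost
        cost, ptr = [], []
        for b in range(L):
            best_a = min(range(L), key=lambda a: prev[a] + pairwise[i - 1][a][b])
            cost.append(prev[best_a] + pairwise[i - 1][best_a][b] + unary[i][b])
            ptr.append(best_a)
        back.append(ptr)
    last = min(range(L), key=lambda a: cost[a])
    labels = [last]
    for ptr in reversed(back):            # walk back-pointers to recover labels
        labels.append(ptr[labels[-1]])
    return cost[last], labels[::-1]
```

The same recursion, applied to separator sets instead of single vertices, underlies the junction tree method mentioned above.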
Interaction Restrictions. Submodularity is a restriction closely related to problems solvable by minimum cut. A quadratic pseudo-Boolean function is submodular iff its quadratic terms are non-positive; minimizing it is then known to be equivalent to finding a minimum cut in a corresponding network [21]. Another way to state this condition for QPBO is f_{uv}(0,0) + f_{uv}(1,1) ≤ f_{uv}(0,1) + f_{uv}(1,0) for every edge uv ∈ E. However, submodularity is more general: it extends to higher-order and multi-label problems. Submodularity is considered a discrete analog of convexity. Just as convex functions are relatively easy to optimize, general submodular function minimization can be solved in strongly polynomial time [56]. Kolmogorov and Zabih introduced submodularity to computer vision and showed that binary 2nd-order and 3rd-order submodular problems can always be reduced to minimum cut, which is much more efficient than general submodular function minimization [34]. Živný et al. and Ramalingam et al. give more results on functions reducible to minimum cut [68, 50]. For QPBO on an unrestricted graph structure, the following dichotomy result has been proven by Cohen et al. [16]: either the problem is submodular and thus in PO, or it is NP-hard (i.e., submodular problems are the only tractable ones in this case).
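The pairwise submodularity condition for 2-label problems is trivial to check mechanically; a minimal sketch (illustrative data layout, matching the toy instances above rather than any particular library):

```python
def is_submodular_qpbo(pairwise):
    """Check pairwise submodularity of a 2-label energy:
    f_uv(0,0) + f_uv(1,1) <= f_uv(0,1) + f_uv(1,0) for every edge uv.

    pairwise: dict (u, v) -> {(label_u, label_v): cost}
    """
    return all(f[(0, 0)] + f[(1, 1)] <= f[(0, 1)] + f[(1, 0)]
               for f in pairwise.values())
```

An “attractive” edge that rewards equal labels passes the test; flipping it (rewarding disagreement) fails, which is exactly the NP-hard side of the dichotomy of Cohen et al. [16].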
For multi-label problems, Ishikawa proposed a reduction to minimum cut for problems with convex interactions, i.e., where f_{uv}(x_u, x_v) = g_{uv}(x_u − x_v) and g_{uv} is convex and symmetric [23]. It is worth noting that when the unary terms are convex as well, the problem can be solved even more efficiently [22, 31]. The same reduction [23] remains correct for a more general class of submodular multi-label problems. In modern terminology, the componentwise minimum x ∧ y and componentwise maximum x ∨ y of complete labelings x, y are introduced, i.e., (x ∧ y)_v = min(x_v, y_v) and (x ∨ y)_v = max(x_v, y_v) for all nodes v. These operations depend on the order of labels and, in turn, define a lattice on the set of labelings. The function f is called submodular on the lattice if f(x ∨ y) + f(x ∧ y) ≤ f(x) + f(y) for all x, y [65]. In the pairwise case, the condition can be simplified to the form of submodularity common in computer vision [50]: f_{uv}(a, b) + f_{uv}(a+1, b+1) ≤ f_{uv}(a, b+1) + f_{uv}(a+1, b) for all labels a, b. In particular, it is easy to see that a convex g_{uv} satisfies it [23]. Kolmogorov [32] and Arora et al. [3] proposed max-flow-like algorithms for higher-order submodular energy minimization. Schlesinger proposed an algorithm to find a reordering of labels in which the problem is submodular, if one exists [53]. However, unlike in the binary case, solvable multi-label problems are more diverse. A variety of problems are generalizations of submodularity and are in PO, including symmetric tournament pair, submodularity on arbitrary trees, submodularity on arbitrary lattices, skew bisubmodularity, and bisubmodularity on arbitrary domains (see references in [64]). Thapper and Živný [63] and Kolmogorov [33] characterized these tractable classes and proved a similar dichotomy result: a problem of unrestricted structure is either solvable by LP relaxation (and thus in PO) or it is NP-hard. It appears that LP relaxation is the most powerful and general solving technique [72].
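The lattice submodularity condition above can be checked by brute force on tiny instances, which also illustrates why the multi-label Potts model falls outside this tractable class while convex interactions do not. A sketch (exponential enumeration, illustration only):

```python
from itertools import product

def is_lattice_submodular(f, n_nodes, n_labels):
    """Brute-force check of f(x ∨ y) + f(x ∧ y) <= f(x) + f(y) over all
    pairs of labelings.  Exponential in n_nodes; for tiny instances only.

    f: a function from a labeling tuple to a number.
    """
    X = list(product(range(n_labels), repeat=n_nodes))
    meet = lambda x, y: tuple(map(min, x, y))   # componentwise minimum x ∧ y
    join = lambda x, y: tuple(map(max, x, y))   # componentwise maximum x ∨ y
    return all(f(join(x, y)) + f(meet(x, y)) <= f(x) + f(y)
               for x in X for y in X)
```

For two nodes with three labels, the convex interaction f(x) = (x₀ − x₁)² passes, while the Potts interaction f(x) = [x₀ ≠ x₁] fails (e.g., x = (0, 2), y = (1, 1) violates the inequality), even though the same Potts interaction is submodular in the 2-label case.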
Mixed Restrictions. In comparison, results with mixed structure and interaction restrictions are rare. One example is the planar Ising model without unary terms [60]. Since it combines a restriction on structure (planarity) with one on the unary terms, it does not fall into any of the classes described above. Another example is the restriction to supermodular functions on a bipartite graph, solvable by [53] or by LP relaxation, but not falling under the characterization of [64] because of the graph restriction.
Algorithmic Applications. The aforementioned tractable formulations in PO can be used to solve or approximate harder problems. Trees, cycles, and planar problems are used in dual decomposition methods [35, 36, 9]. Binary submodular problems are used for finding an optimized crossover of two candidate multi-label solutions. An example of this technique, the expansion-move algorithm, achieves a constant approximation ratio for the Potts model [13]. Extended dynamic programming can be used to solve restricted segmentation problems [18] and as a move-making subroutine [67]. LP relaxation also provides approximation guarantees for many problems [5, 15, 28, 37], placing them in the APX or poly-APX class.
5.2 Class APX and Class log-APX (Bounded Approximation)
Problems that have bounded approximations in polynomial time usually carry certain restrictions on the interaction type. The Potts model may be the simplest and most common way to enforce smoothness of the labeling: each pairwise interaction depends only on whether the neighboring labels are the same, i.e., f_{uv}(x_u, x_v) = w_{uv}[x_u ≠ x_v]. Boykov et al. showed a reduction to this problem from the NP-hard multiway cut problem [13], which is also known to be APX-complete [4, 17]. They also proved that their α-expansion algorithm is a 2-approximate algorithm. These results prove that the Potts model is in APX but not in PO. However, their reduction from multiway cut is not an AP-reduction, as it violates the third condition of AP-reducibility. Therefore, it is still an open problem whether the Potts model is APX-complete. Boykov et al. also showed that their algorithm can approximate the more general problem of metric labeling [13]. The energy is called metric if, for an arbitrary finite label space L, the pairwise interaction satisfies a) f_{uv}(a, b) = 0 iff a = b, b) f_{uv}(a, b) = f_{uv}(b, a) ≥ 0, and c) f_{uv}(a, c) ≤ f_{uv}(a, b) + f_{uv}(b, c), for any labels a, b, c ∈ L. Although their approximation algorithm has a bound on the performance ratio, the bound depends on the ratio of some pairwise terms, a number that can grow exponentially large. For metric labeling with k labels, Kleinberg et al. proposed an O(log k log log k)-approximation algorithm. This bound was further improved to O(log k) by Chekuri et al. [14], making metric labeling a problem in log-APX. (An approximation in the number of labels implies an approximation in the input size; see Corollary 0.C.0.)
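The three metric conditions are mechanical to verify for a given distance table. A small sketch (illustrative; the distance matrices are toy examples, not from the paper):

```python
def is_metric(d, labels):
    """Check the metric-labeling conditions on a pairwise distance table d:
    a) identity:  d[a][b] == 0  iff  a == b,
    b) symmetry and non-negativity:  d[a][b] == d[b][a] >= 0,
    c) triangle inequality:  d[a][c] <= d[a][b] + d[b][c].
    """
    return (all((d[a][b] == 0) == (a == b) for a in labels for b in labels)
            and all(d[a][b] == d[b][a] >= 0 for a in labels for b in labels)
            and all(d[a][c] <= d[a][b] + d[b][c]
                    for a in labels for b in labels for c in labels))
```

The Potts distance [a ≠ b] and the (truncated) linear distance are metrics, so both fall under the metric-labeling guarantees discussed here; a distance table violating the triangle inequality does not.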
We have seen that a problem with convex pairwise interactions is in PO. An interesting variant is its truncated counterpart, i.e., $f_{uv}(x_u, x_v) = w_{uv}\,\min\{d(x_u - x_v), T\}$, where $w_{uv}$ is a nonnegative weight, $d$ is a convex symmetric function defining the distance between two labels, and $T$ is the truncating constant [66]. This problem is NP-hard [66], but Kumar et al. [39] have proposed an algorithm that yields bounded approximations, with a factor of $2 + \sqrt{2}$ for linear distance functions and a factor of $O(\sqrt{T})$ for quadratic distance functions (in these truncated convex problems, the ratio bound is defined for the pairwise part of the energy Eq. 1; the approximation ratio in accordance with our definition is obtained assuming the unary terms are nonnegative). This bound is analyzed for more general distance functions by Kumar [38].
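A small sketch of why truncation matters (the helper and the concrete label values are our own illustration): the standard submodularity condition for ordered labels, which convex priors satisfy, fails once the distance is truncated, which is what makes the truncated problem hard.

```python
def truncated(w, d, T):
    """Truncated pairwise term: theta(a, b) = w * min(d(a - b), T)."""
    return lambda a, b: w * min(d(a - b), T)

theta = truncated(1, abs, 2)   # truncated linear distance, truncation T = 2

# Submodularity on ordered labels requires, for a < a2 and b < b2,
#   theta(a, b) + theta(a2, b2) <= theta(a, b2) + theta(a2, b).
# Convex (untruncated) distances satisfy it; truncation breaks it:
lhs = theta(0, 2) + theta(1, 3)   # 2 + 2 = 4
rhs = theta(0, 3) + theta(1, 2)   # 2 + 1 = 3
violated = lhs > rhs
```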
Another APX problem with implicit restrictions on the interaction type is logic MRF [6]. It is a powerful higher-order model able to encode arbitrary logical relations of Boolean variables. It has objective $f(\mathbf{x}) = \sum_i w_i C_i(\mathbf{x})$, where each $C_i$ is a disjunctive clause involving a subset of the Boolean variables $\mathbf{x}$, and $C_i(\mathbf{x}) = 1$ if it is satisfied and 0 otherwise. Each clause is assigned a nonnegative weight $w_i$. The goal is to find an assignment of $\mathbf{x}$ that maximizes $f(\mathbf{x})$. As disjunctive clauses can be converted into polynomials, this is essentially a pseudo-Boolean optimization problem. However, it is a special case of general 2-label energy minimization, as its polynomial basis spans a subspace of the basis of the latter. Bach et al. [6] proved that logic MRF is in APX by showing that it is a special case of MAX-SAT with nonnegative weights.
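The clause-to-polynomial conversion mentioned above can be sketched as follows (helper names are ours): a disjunction of plain and negated variables equals the multilinear polynomial $1 - \prod_{i \in \text{pos}} (1 - x_i) \prod_{j \in \text{neg}} x_j$, and the two agree on every Boolean assignment.

```python
from itertools import product

def clause_poly(pos, neg, x):
    """Multilinear polynomial of the disjunction of x_i (i in pos)
    and (not x_j) (j in neg):  1 - prod(1 - x_i) * prod(x_j)."""
    p = 1
    for i in pos:
        p *= 1 - x[i]
    for j in neg:
        p *= x[j]
    return 1 - p

def clause_bool(pos, neg, x):
    """Truth value of the same disjunction, as 0/1."""
    return int(any(x[i] for i in pos) or any(1 - x[j] for j in neg))

# the two agree on every assignment, e.g. for the clause  x0 or x1 or not x2:
agree = all(clause_poly([0, 1], [2], x) == clause_bool([0, 1], [2], x)
            for x in product([0, 1], repeat=3))
```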
6 Discussion
The algorithmic implications of our inapproximability results have been discussed above. Here, we focus on the practical implications. The existence of an approximation guarantee indicates a practically relevant class of problems where one may expect reasonable performance. In structural learning, for example, it is acceptable to have a constant factor approximation for the inference subroutine when efficient exact algorithms are not available. Finley and Joachims proved that this constant factor approximation guarantee yields a multiplicative bound on the learning objective, providing a relative guarantee for the quality of the learned parameters [19]. An optimality guarantee is important, because the inference subroutine is called repeatedly, and even a single poor approximation, which returns a not-so-bad worst violator, will lead to early termination of the structural learning algorithm.
However, despite having no approximation ratio guarantee, algorithms such as the extended roof duality algorithm for QPBO [52] are still widely used. This gap between theory and practice applies not only to our results but to all other complexity results as well. We list several key reasons for the potential lack of correspondence between theoretical complexity guarantees and practical performance.
Complexity results address the worst-case scenario. Our inapproximability result guarantees that for any polynomial time algorithm, there exists an input instance on which the algorithm produces a very poor approximation. However, applications often do not encounter the worst case. Such is the case with the simplex algorithm, whose worst-case complexity is exponential, yet it is widely used in practice.
The objective function is not the final evaluation criterion. In many image processing tasks, the final evaluation criterion is the number of pixels labeled correctly. The relation between the energy value and the accuracy is implicit. In many cases, a local optimum is good enough to produce high labeling accuracy and a visually appealing result.
Other forms of optimality guarantees or indicators exist. Approximation measures in the distance of solutions or in the expectation of the objective value are likely to be prohibitive for energy minimization, as they are for Bayesian networks [40, 41, 42]. On the other hand, a family of energy minimization algorithms has the property of being persistent or partially optimal, meaning that a subset of nodes is guaranteed to be labeled consistently with the global optimum [10, 11]. Rather than being an optimality guarantee, persistency is an optimality indicator. In the worst case, the set of persistent labelings could be empty, yet the percentage of persistent labelings over all the nodes gives a notion of the algorithm's performance on a particular input instance. Persistency is also useful in reducing the size of the search space [29, 58]. Similarly, the per-instance integrality gap of duality-based methods is another form of optimality indicator and can be exponentially large for problems in general [37, 61].
7 Conclusion
In this paper, we have shown inapproximability results for energy minimization in the general case and in the planar 3-label case. In addition, we have presented a unified overview of the complexity of existing energy minimization problems by arranging them on a fine-grained complexity scale. Altogether, these results set up a new viewpoint for interpreting and classifying the complexity of optimization problems for the computer vision community. In the future, it will be interesting to consider the open questions of the complexity of structure-, rank-, and expectation-approximation for energy minimization.
Acknowledgements
This material is based upon work supported by the National Science Foundation under Grant No. IIS-1328930 and by the European Research Council under the Horizon 2020 program, ERC starting grant agreement 640156.
References
 PIC [2011] The Probabilistic Inference Challenge (2011), http://www.cs.huji.ac.il/project/PASCAL/
 Abdelbar and Hedetniemi [1998] Abdelbar, A., Hedetniemi, S.: Approximating MAPs for belief networks is NP-hard and other theorems. Artificial Intelligence 102(1), 21–38 (1998)
 Arora et al. [2012] Arora, C., Banerjee, S., Kalra, P., Maheshwari, S.N.: Generic cuts: An efficient algorithm for optimal inference in higher order MRFMAP. In: ECCV. pp. 17–30 (2012)
 Ausiello et al. [1999] Ausiello, G., Crescenzi, P., Gambosi, G., Kann, V., Marchetti-Spaccamela, A., Protasi, M.: Complexity and approximation: Combinatorial optimization problems and their approximability properties. Springer (1999)
 Bach et al. [2015a] Bach, S.H., Huang, B., Getoor, L.: Unifying local consistency and max sat relaxations for scalable inference with rounding guarantees. In: AISTATS. JMLR Proceedings, vol. 38 (2015a)
 Bach et al. [2015b] Bach, S.H., Huang, B., Getoor, L.: Unifying local consistency and max sat relaxations for scalable inference with rounding guarantees. In: AISTATS. pp. 46–55 (2015b)
 Bar-Yehuda and Even [1982] Bar-Yehuda, R., Even, S.: On approximating a vertex cover for planar graphs. In: Proceedings of the Fourteenth Annual ACM Symposium on Theory of Computing. pp. 303–309 (1982)
 Barbu [2009] Barbu, A.: Learning real-time MRF inference for image denoising. In: CVPR. pp. 1574–1581 (2009)
 Batra et al. [2010] Batra, D., Gallagher, A.C., Parikh, D., Chen, T.: Beyond trees: MRF inference via outer-planar decomposition. In: CVPR. pp. 2496–2503 (2010)
 Boros et al. [1991] Boros, E., Hammer, P.L., Sun, X.: Network flows and minimization of quadratic pseudo-Boolean functions. Tech. Rep. RRR 17-1991, RUTCOR (May 1991)
 Boros and Hammer [2001] Boros, E., Hammer, P.: Pseudo-Boolean optimization. Tech. rep., RUTCOR (October 2001)
 Boros and Hammer [2002] Boros, E., Hammer, P.: Pseudo-Boolean optimization. Discrete Applied Mathematics 123(1–3), 155–225 (2002)
 Boykov et al. [2001] Boykov, Y., Veksler, O., Zabih, R.: Fast approximate energy minimization via graph cuts. PAMI 23, 1222–1239 (November 2001)
 Chekuri et al. [2005] Chekuri, C., Khanna, S., Naor, J., Zosin, L.: A linear programming formulation and approximation algorithms for the metric labeling problem. SIAM Journal on Discrete Mathematics 18(3), 608–625 (2005)
 Chekuri et al. [2001] Chekuri, C., Khanna, S., Naor, J., Zosin, L.: Approximation algorithms for the metric labeling problem via a new linear programming formulation. In: In Symposium on Discrete Algorithms. pp. 109–118 (2001)
 Cohen et al. [2004] Cohen, D., Cooper, M., Jeavons, P.: Principles and Practice of Constraint Programming, chap. A Complete Characterization of Complexity for Boolean Constraint Optimization Problems, pp. 212–226 (2004)
 Dahlhaus et al. [1994] Dahlhaus, E., Johnson, D.S., Papadimitriou, C.H., Seymour, P.D., Yannakakis, M.: The complexity of multiterminal cuts. SIAM Journal on Computing 23(4), 864–894 (1994)
 Felzenszwalb and Veksler [2010] Felzenszwalb, P.F., Veksler, O.: Tiered scene labeling with dynamic programming. In: CVPR. pp. 3097–3104 (2010)
 Finley and Joachims [2008] Finley, T., Joachims, T.: Training structural SVMs when exact inference is intractable. In: ICML. pp. 304–311. ACM (2008)
 Forney Jr [1973] Forney Jr, G.D.: The Viterbi algorithm. Proceedings of the IEEE 61(3), 268–278 (1973)
 Hammer [1965] Hammer, P.L.: Some network flow problems solved with pseudo-Boolean programming. Operations Research 13, 388–399 (1965)
 Hochbaum [2001] Hochbaum, D.S.: An efficient algorithm for image segmentation, Markov random fields and related problems. J. ACM 48(4), 686–701 (Jul 2001)
 Ishikawa [2003] Ishikawa, H.: Exact optimization for Markov random fields with convex priors. PAMI 25(10), 1333–1336 (2003)
 Ishikawa [2011] Ishikawa, H.: Transformation of general binary MRF minimization to the first-order case. PAMI 33(6), 1234–1249 (2011)
 Jeavons et al. [2014] Jeavons, P., Krokhin, A., Živný, S., et al.: The complexity of valued constraint satisfaction. Bulletin of EATCS 2(113) (2014)
 Kappes et al. [2015] Kappes, J.H., Andres, B., Hamprecht, F.A., Schnörr, C., Nowozin, S., Batra, D., Kim, S., Kausler, B.X., Kröger, T., Lellmann, J., et al.: A comparative study of modern inference techniques for structured discrete energy minimization problems. IJCV 115, 155–184 (2015)
 Karp [1972] Karp, R.M.: Reducibility among combinatorial problems. In: Proceedings of a symposium on the Complexity of Computer Computations. pp. 85–103 (1972)
 Kleinberg and Tardos [2002] Kleinberg, J., Tardos, E.: Approximation algorithms for classification problems with pairwise relationships: metric labeling and Markov random fields. J. ACM 49(5), 616–639 (2002)
 Kohli et al. [2008] Kohli, P., Shekhovtsov, A., Rother, C., Kolmogorov, V., Torr, P.: On partial optimality in multi-label MRFs. In: ICML. pp. 480–487 (2008)
 Kolmogorov and Zabih [2004] Kolmogorov, V., Zabih, R.: What energy functions can be minimized via graph cuts? PAMI 26(2), 147–159 (February 2004)
 Kolmogorov [2005] Kolmogorov, V.: Primal-dual algorithm for convex Markov random fields. Tech. Rep. MSR-TR-2005-117, Microsoft Research Cambridge (2005)
 Kolmogorov [2012] Kolmogorov, V.: Minimizing a sum of submodular functions. Discrete Applied Mathematics 160(15), 2246 – 2258 (2012)
 Kolmogorov [2013] Kolmogorov, V.: The power of linear programming for finite-valued CSPs: A constructive characterization. In: Automata, Languages, and Programming, Lecture Notes in Computer Science, vol. 7965, pp. 625–636 (2013)
 Kolmogorov and Zabih [2004] Kolmogorov, V., Zabih, R.: What energy functions can be minimized via graph cuts? PAMI 26(2), 147–159 (2004)
 Komodakis et al. [2007] Komodakis, N., Paragios, N., Tziritas, G.: MRF optimization via dual decomposition: Message-passing revisited. In: ICCV. pp. 1–8 (2007)
 Komodakis and Paragios [2008] Komodakis, N., Paragios, N.: Beyond loose LP-relaxations: Optimizing MRFs by repairing cycles. In: ECCV. pp. 806–820 (2008)
 Komodakis and Tziritas [2007] Komodakis, N., Tziritas, G.: Approximate labeling via graph cuts based on linear programming. PAMI 29(8), 1436–1453 (2007)
 Kumar [2014] Kumar, M.P.: Rounding-based moves for metric labeling. In: NIPS. pp. 109–117 (2014)
 Kumar et al. [2011] Kumar, M.P., Veksler, O., Torr, P.H.: Improved moves for truncated convex models. Journal of Machine Learning Research 12, 31–67 (Feb 2011)
 Kwisthout [2011] Kwisthout, J.: Most probable explanations in Bayesian networks: Complexity and tractability. International Journal of Approximate Reasoning 52(9), 1452 – 1469 (2011)
 Kwisthout [2013] Kwisthout, J.: Symbolic and Quantitative Approaches to Reasoning with Uncertainty: 12th European Conference, ECSQARU 2013, Utrecht, The Netherlands, July 8–10, 2013. Proceedings, chap. Structure Approximation of Most Probable Explanations in Bayesian Networks, pp. 340–351 (2013)
 Kwisthout [2015] Kwisthout, J.: Treewidth and the computational complexity of MAP approximations in Bayesian networks. Journal of Artificial Intelligence Research pp. 699–720 (2015)
 Lauritzen [1998] Lauritzen, S.L.: Graphical Models. No. 17 in Oxford Statistical Science Series, Oxford Science Publications (1998)
 Liu et al. [2010] Liu, B., Gould, S., Koller, D.: Single image depth estimation from predicted semantic labels. In: CVPR. pp. 1253–1260 (2010)
 Orponen and Mannila [1987] Orponen, P., Mannila, H.: On approximation preserving reductions: complete problems and robust measures. Technical Report (1987)
 Papadimitriou and Yannakakis [1991] Papadimitriou, C.H., Yannakakis, M.: Optimization, approximation, and complexity classes. Journal of Computer and System Sciences 43(3), 425 – 440 (1991)
 Pearl [1988] Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers Inc. (1988)
 Prusa and Werner [2015a] Prusa, D., Werner, T.: How hard is the LP relaxation of the Potts min-sum labeling problem? In: Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR). pp. 57–70 (2015a)
 Prusa and Werner [2015b] Prusa, D., Werner, T.: Universality of the local marginal polytope. PAMI 37(4), 898–904 (2015b)
 Ramalingam et al. [2008] Ramalingam, S., Kohli, P., Alahari, K., Torr, P.H.: Exact inference in multi-label CRFs with higher order cliques. In: CVPR. pp. 1–8. IEEE (2008)
 Ren et al. [2012] Ren, X., Bo, L., Fox, D.: RGB-(D) scene labeling: Features and algorithms. In: CVPR. pp. 2759–2766. IEEE (2012)
 Rother et al. [2007] Rother, C., Kolmogorov, V., Lempitsky, V., Szummer, M.: Optimizing binary MRFs via extended roof duality. In: CVPR. pp. 1–8 (2007)
 Schlesinger [2007] Schlesinger, D.: Exact solution of permuted submodular min-sum problems. In: EMMCVPR. vol. 4679, pp. 28–38. Springer (2007)
 Schlesinger and Hlaváč [2002] Schlesinger, M.I., Hlaváč, V.: Ten lectures on statistical and structural pattern recognition, Computational Imaging and Vision, vol. 24. Kluwer Academic Publishers, Dordrecht, The Netherlands (2002)
 Schraudolph [2010] Schraudolph, N.: Polynomial-time exact inference in NP-hard binary MRFs via reweighted perfect matching. In: AISTATS. JMLR Proceedings, vol. 9, pp. 717–724 (2010)
 Schrijver [2000] Schrijver, A.: A combinatorial algorithm minimizing submodular functions in strongly polynomial time. Journal of Combinatorial Theory Series B(80), 346–355 (2000)
 Schwing and Urtasun [2012] Schwing, A.G., Urtasun, R.: Efficient exact inference for 3D indoor scene understanding. In: ECCV, pp. 299–313. Springer (2012)
 Shekhovtsov et al. [2015] Shekhovtsov, A., Swoboda, P., Savchynskyy, B.: Maximum persistency via iterative relaxed inference with graphical models. In: CVPR (2015)
 Shekhovtsov et al. [2012] Shekhovtsov, A., Kohli, P., Rother, C.: Curvature prior for MRF-based segmentation and shape inpainting. In: DAGM/OAGM. pp. 41–51 (2012)
 Shih et al. [1990] Shih, W.K., Wu, S., Kuo, Y.S.: Unifying maximum cut and minimum cut of a planar graph. IEEE Transactions on Computers 39(5), 694–697 (May 1990)
 Sontag et al. [2012] Sontag, D., Choe, D.K., Li, Y.: Efficiently searching for frustrated cycles in MAP inference. In: Uncertainty in Artificial Intelligence (UAI). pp. 795–804 (2012)
 Szeliski et al. [2008] Szeliski, R., Zabih, R., Scharstein, D., Veksler, O., Kolmogorov, V., Agarwala, A., Tappen, M., Rother, C.: A comparative study of energy minimization methods for Markov random fields with smoothness-based priors. PAMI 30(6), 1068–1080 (2008)
 Thapper and Živný [2012] Thapper, J., Živný, S.: The power of linear programming for valued CSPs. In: Symposium on Foundations of Computer Science (FOCS). pp. 669–678 (2012)
 Thapper and Živný [2013] Thapper, J., Živný, S.: The complexity of finitevalued CSPs. In: Symposium on the Theory of Computing (STOC). pp. 695–704 (2013)
 Topkis [1978] Topkis, D.M.: Minimizing a submodular function on a lattice. Operations Research 26(2), 305–321 (1978)
 Veksler [2007] Veksler, O.: Graph cut based optimization for MRFs with truncated convex priors. In: CVPR. pp. 1–8 (2007)
 Vineet et al. [2012] Vineet, V., Warrell, J., Torr, P.H.S.: A tiered move-making algorithm for general pairwise MRFs. In: CVPR. pp. 1632–1639 (2012)
 Živný et al. [2009] Živný, S., Cohen, D.A., Jeavons, P.G.: The expressive power of binary submodular functions. Discrete Applied Mathematics 157(15), 3347 – 3358 (2009)
 Werner [2007] Werner, T.: A linear programming approach to max-sum problem: A review. PAMI 29(7), 1165–1179 (July 2007)
 Xu et al. [2012] Xu, L., Jia, J., Matsushita, Y.: Motion detail preserving optical flow estimation. PAMI 34(9), 1744–1757 (2012)
 Yang and Ramanan [2011] Yang, Y., Ramanan, D.: Articulated pose estimation with flexible mixtures-of-parts. In: CVPR. pp. 1385–1392 (2011)
 Živný et al. [2014] Živný, S., Werner, T., Průša, D.: The Power of LP Relaxation for MAP Inference, pp. 19–42. The MIT Press, Cambridge, USA (December 2014)
Complexity of Discrete Energy Minimization Problems (ECCV’16 Appendix)
Mengtian Li Alexander Shekhovtsov Daniel Huber
Appendix 0.A Formal Proofs
Note that for all proofs in this section, we assign integer values to Boolean functions: 0 for False and 1 for True.
0.a.1 General Case
See 3.1
Proof.
We reduce from the following problem.
Problem 2 ([4], Section 8.3.2).
W3SAT-triv
Instance:
A Boolean CNF formula over variables $x_1, \dots, x_n$, with each clause containing exactly 3 variables, and nonnegative integer weights $w_1, \dots, w_n$.
Solution:
A truth assignment to the variables that either satisfies the formula or is the trivial all-true assignment.
Measure:
$\min \sum_i w_i x_i$.
W3SAT-triv is known to be exp-APX-complete [4]. We use an AP-reduction from W3SAT-triv to prove the same completeness result for QPBO. The optimal value of W3SAT-triv is upper bounded by $\sum_i w_i$ because the all-true assignment is feasible. The objective is represented in QPBO as unary terms $w_i x_i$. For every Boolean clause $c_j$ we construct a triple-wise term
(8) $g_j(\mathbf{x}) = C\,\big(1 - c_j(\mathbf{x})\big)$,
where $C$ is a large constant. This term takes the large value $C$ iff $c_j$ is not satisfied and $0$ otherwise. Further, the Boolean clause can be represented uniquely as a multilinear cubic polynomial. For example, a clause $x_p \vee x_q \vee \neg x_r$ can be represented as
(9) $x_p \vee x_q \vee \neg x_r = 1 - (1 - x_p)(1 - x_q)\,x_r$.
Then we obtain a similar representation for $g_j$, with a single third-order term and a second-order multilinear polynomial:
(10) $g_j(\mathbf{x}) = a\,x_p x_q x_r + q_j(x_p, x_q, x_r)$,
where $a = \pm C$, depending on the sign pattern of the clause, and $q_j$ is a multilinear polynomial of degree at most two. We now apply the quadratization techniques [24] to the cubic term. After introducing an auxiliary variable $z \in \{0, 1\}$, we observe the following identities:
(11) $a\,x_p x_q x_r = \min_{z} a\,z\,(x_p + x_q + x_r - 2)$, for $a < 0$;
(12) $a\,x_p x_q x_r = \min_{z} a\,\big(z\,(x_p + x_q + x_r - 1) + x_p x_q + x_q x_r + x_r x_p - x_p - x_q - x_r + 1\big)$, for $a > 0$.
In either case, substituting the cubic term in $g_j$ with the expression inside the min operator, we obtain a unified quadratic form
(13) $\tilde g_j(\mathbf{x}, z)$
over the original variables and the auxiliary variable $z$. In both cases, the quadratic form takes the same optimal value as its cubic counterpart given the optimal assignment of the auxiliary variable, i.e.,
(14) $\min_{z \in \{0,1\}} \tilde g_j(\mathbf{x}, z) = g_j(\mathbf{x})$,
but the transformation expands the range of values the term can take. The cost of the constructed instance of QPBO remains bounded in absolute value by a polynomial in the original weights and $C$, and the number of added variables is exactly the number of clauses in the formula. Clearly, this construction can be computed in polynomial time. Note that when an approximate solution is used, this transformation is no longer exact ($\tilde g_j(\mathbf{x}, z) \neq g_j(\mathbf{x})$ in general), as the optimality of the auxiliary variable cannot be guaranteed. However, it can be verified that $\tilde g_j(\mathbf{x}, z) \geq g_j(\mathbf{x}) \geq 0$ under all possible assignments (ignoring the min operator) in either case, which is the key for the reduction to be approximation preserving (AP).
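The quadratization step can be checked by brute force. The sketch below verifies the two standard single-monomial identities (the classical forms for negative and positive coefficients; the exact construction in [24] may differ in presentation):

```python
from itertools import product

a_neg, a_pos = -3, 5   # example coefficients of the cubic monomial

def neg_form(x, y, t, z):
    # identity for a < 0:  a*x*y*t = min_z a*z*(x + y + t - 2)
    return a_neg * z * (x + y + t - 2)

def pos_form(x, y, t, z):
    # identity for a > 0:
    # a*x*y*t = min_z a*(z*(x+y+t-1) + xy + yt + tx - x - y - t + 1)
    return a_pos * (z * (x + y + t - 1) + x*y + y*t + t*x - x - y - t + 1)

ok = all(min(neg_form(x, y, t, 0), neg_form(x, y, t, 1)) == a_neg * x * y * t
         and min(pos_form(x, y, t, 0), pos_form(x, y, t, 1)) == a_pos * x * y * t
         for x, y, t in product([0, 1], repeat=3))
```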
The construction above defines a mapping from any instance of W3SAT-triv to an instance of QPBO. The mapping from feasible solutions of the QPBO instance back to feasible solutions of the W3SAT-triv instance is defined as follows: if the QPBO solution violates some clause term (i.e., incurs a penalty $C$), let the mapped solution be the all-true assignment; otherwise, let the mapped solution be the restriction of the QPBO solution to the original variables.
Now, we need to show that this mapping, together with a constant $\alpha = 1$, is an AP-reduction. Let $W$ and $W^*$ denote the objective value of the mapped solution and the optimal value of the W3SAT-triv instance, and let $E$ and $E^*$ denote the energy of the QPBO solution and the optimal energy. First, note that the mapped solution is always feasible for W3SAT-triv: either it satisfies the formula, or it is the all-true assignment. In the first case, since all clause terms and their quadratizations are nonnegative, we have
(15)–(16) $W \leq E$.
In the second case, some clause term incurs the penalty $C$, which dominates the trivial upper bound, so by construction
(17) $W = \sum_i w_i \leq E$.
Therefore, no matter which case, $W \leq E$.
Now consider the optimal solutions. If the formula is satisfiable, then by construction $E^* = W^*$. Recall from Definition 2.0 the condition of AP-reducibility. For any instance, for any rational $r > 1$, and for any feasible solution of the constructed QPBO instance, if
(18) $E / E^* \leq r$,
then
(19)–(20) $W / W^* \leq E / E^* \leq r = 1 + \alpha(r - 1)$, for $\alpha = 1$.
If the formula is not satisfiable, the only feasible solution of the W3SAT-triv instance is the all-true assignment, so $W = W^*$. Thus, for any instance, for any rational $r > 1$, and for any feasible solution,
(21) $W / W^* = 1 \leq 1 + \alpha(r - 1)$.
Therefore the mapping is an AP-reduction. Since W3SAT-triv is exp-APX-complete and QPBO is in exp-APX, we conclude that QPBO is exp-APX-complete.
See 3.1
Proof.
We create an AP-reduction from QPBO to $k$-label energy minimization by setting up the unary and pairwise terms to discourage labelings that use the additional labels.
Denote QPBO as $P_1$ and $k$-label energy minimization as $P_2$. Given an instance of $P_1$ with unary terms $f_u$ and pairwise terms $f_{uv}$, let $M$ be a number larger than the total magnitude of all terms. For example, we can let
(22) $M = 1 + 2\Big(\sum_{u} \max_{x_u} |f_u(x_u)| + \sum_{uv} \max_{x_u, x_v} |f_{uv}(x_u, x_v)|\Big)$.
We define the forward mapping from any instance of $P_1$ to an instance of $P_2$ as follows:
• $g_u(x_u) = f_u(x_u)$, for $x_u \in \{0, 1\}$;
• $g_u(x_u) = M$, for $x_u \notin \{0, 1\}$;
• $g_{uv}(x_u, x_v) = f_{uv}(x_u, x_v)$, for $x_u, x_v \in \{0, 1\}$;
• $g_{uv}(x_u, x_v) = M$, if either $x_u \notin \{0, 1\}$ or $x_v \notin \{0, 1\}$.
This setup has two properties:
• any labeling that includes labels not in $\{0, 1\}$ has strictly larger energy than any labeling that uses only labels in $\{0, 1\}$;
• $E_2(\mathbf{x}) = E_1(\mathbf{x})$, for any labeling $\mathbf{x}$ that uses only labels in $\{0, 1\}$.
Then we define the reverse mapping from any feasible solution of $P_2$ to a feasible solution of $P_1$ to be
• the identity mapping, if the labeling uses only labels in $\{0, 1\}$;
• any fixed feasible solution (e.g., all nodes are labeled with the first label), otherwise.
Observe that in both cases, $E_1(g(\mathbf{x})) \leq E_2(\mathbf{x})$, and the optimal values of the two instances coincide. For any instance, for any rational $r > 1$, and for any feasible solution of $P_2$, if
(23) $E_2(\mathbf{x}) / E_2^* \leq r$,
then
(24)–(25) $E_1(g(\mathbf{x})) / E_1^* \leq E_2(\mathbf{x}) / E_2^* \leq r$.
Therefore the mapping is an AP-reduction (with $\alpha = 1$). As QPBO is exp-APX-complete and all energy minimization problems are in exp-APX, we conclude that $k$-label energy minimization is exp-APX-complete for $k \geq 2$.
The above construction also formally shows that the energy minimization problem can only become harder when the label space grows, irrespective of the graph structure and the interaction type.
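A toy sketch of the padding construction above (names and the instance are ours; costs are kept nonnegative for simplicity): costs on the two original labels are copied, and every configuration touching an extra label is charged a large constant $M$, so the optimum never uses the padded labels.

```python
from itertools import product

def energy(x, unary, pairwise):
    """E(x) = sum of unary terms plus sum of pairwise terms."""
    return (sum(unary[i][x[i]] for i in range(len(x)))
            + sum(f[x[i]][x[j]] for (i, j), f in pairwise.items()))

def pad_to_k_labels(unary2, pairwise2, k, M):
    """Keep the 2-label costs; charge M to anything touching an extra label."""
    unary_k = [[u[a] if a < 2 else M for a in range(k)] for u in unary2]
    pairwise_k = {e: [[f[a][b] if a < 2 and b < 2 else M
                       for b in range(k)] for a in range(k)]
                  for e, f in pairwise2.items()}
    return unary_k, pairwise_k

# toy 2-label instance with nonnegative costs
unary2 = [[0, 1], [2, 0]]
pairwise2 = {(0, 1): [[0, 3], [3, 0]]}
M = 1 + sum(max(u) for u in unary2) + sum(max(map(max, f))
                                          for f in pairwise2.values())

unary3, pairwise3 = pad_to_k_labels(unary2, pairwise2, 3, M)
opt2 = min(energy(x, unary2, pairwise2) for x in product(range(2), repeat=2))
opt3 = min(energy(x, unary3, pairwise3) for x in product(range(3), repeat=2))
```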
0.a.2 Planar Case
See 4.1
Proof.
We create an AP-reduction from 3-label energy minimization to planar 3-label energy minimization by introducing polynomially many auxiliary nodes and edges.
Denote 3-label energy minimization as $P_1$ and planar 3-label energy minimization as $P_2$. Given an instance of $P_1$, we compute a large number $M$ as in Eq. 22 in the proof of Corollary 3.1.
The gadget-based reduction presented in Section 4 defines a forward mapping from any instance of $P_1$ to an instance of $P_2$. The two gadgets, Split and UncrossCopy, are used 4 times each to replace an edge crossing (a point of intersection not at the endpoints) with a planar representation (Figure 4), introducing 22 auxiliary nodes. Since the gadgets can be drawn arbitrarily small, so that they do not intersect any other edges, we can repeatedly replace all edge crossings with this representation. There can be up to $O(|\mathcal{E}|^2)$ edge crossings, so the constructed instance is of polynomial size. Given that the reduction adds only a polynomial number of auxiliary nodes, the forward mapping can be computed by a polynomial time algorithm.
This setup has two properties:
• a labeling of the constructed instance avoids all penalty terms of value $M$ if and only if the labels of the auxiliary nodes consistently copy the labels of the original nodes;
• $E_2(\mathbf{x}) = E_1(\mathbf{x}|_{V})$, for any such consistent labeling $\mathbf{x}$, where $\mathbf{x}|_{V}$ is the restriction of $\mathbf{x}$ to the original nodes.
Then we define the reverse mapping from any feasible solution of $P_2$ to a feasible solution of $P_1$ to be
• the restriction $\mathbf{x}|_{V}$, if the labeling is consistent;
• any fixed feasible solution (e.g., all nodes are labeled with the first label), otherwise.
Observe that in both cases, $E_1(g(\mathbf{x})) \leq E_2(\mathbf{x})$, and the optimal values of the two instances coincide. For any instance, for any rational $r > 1$, and for any feasible solution of $P_2$, if
(26) $E_2(\mathbf{x}) / E_2^* \leq r$,
then
(27)–(28) $E_1(g(\mathbf{x})) / E_1^* \leq E_2(\mathbf{x}) / E_2^* \leq r$.
Therefore the mapping is an AP-reduction. As 3-label energy minimization is exp-APX-complete (Corollary 3.1) and all energy minimization problems are in exp-APX, we conclude that planar 3-label energy minimization is exp-APX-complete.
See 4.1
Proof.
The proof of Corollary 3.1 is independent of the graph structure; therefore, the same proof applies here.
Appendix 0.B Relation to Bayesian Networks
There are substantial differences between the results for Bayesian networks [2] and our result. Bayesian networks have a probability density function that factors according to a directed acyclic graph, e.g., as $p(x_1, x_2, x_3) = p(x_1 \mid x_2, x_3)\,p(x_2)\,p(x_3)$. Finding the MAP assignment (the same as the most probable explanation (MPE)) in a Bayesian network is related to energy minimization (1) by letting $E(\mathbf{x}) = -\log p(\mathbf{x})$. The product is transformed into a sum, and so, e.g., the factor $p(x_1 \mid x_2, x_3)$ corresponds to the term $-\log p(x_1 \mid x_2, x_3)$.
The inapproximability result of Abdelbar and Hedetniemi [2] holds even when restricting to binary variables and factors of order three. However, [2, Section 6.1] counts incoming edges of the network. For a factor $p(x_1 \mid x_2, x_3)$, there are two, but the total number of variables it couples is three, and therefore such a network does not correspond to QPBO. If one restricts a Bayesian network to factors of at most two variables, e.g., $p(x_1 \mid x_2)$, then only tree-structured models can be represented, which are easily solvable.
In the other direction, representing the pairwise energy (1) as a Bayesian network may require factors composed of conditional probabilities with the number of variables depending on the vertex degrees. We see that while the problems in their most general forms are mutually convertible, fixed-parameter classes (such as order and graph restrictions) differ significantly. In addition, an approximation ratio for probabilities translates to an absolute approximation (an additive bound) for energies. The next corollary of our main result illustrates this point.
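The translation between multiplicative guarantees on probabilities and additive guarantees on energies can be sketched in a few lines (the numeric values are illustrative only):

```python
import math

def energy_from_prob(p):
    """E(x) = -log p(x): products of factors become sums of energy terms."""
    return -math.log(p)

# a multiplicative guarantee on probabilities, p_hat >= p_opt / ratio,
# becomes an additive guarantee on energies, E_hat - E_opt <= log(ratio):
p_opt, ratio = 1e-9, 2.0
p_hat = p_opt / ratio
gap = energy_from_prob(p_hat) - energy_from_prob(p_opt)
```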
Corollary 0.B.0.
It is NP-hard to approximate MAP in the value of probability (2) within any exponential ratio $2^{q(n)}$, where $q$ is a polynomial in the input size $n$.
Proof.
Recall that the probability is given by the exponential map of the energy: $p(\mathbf{x}) = e^{-E(\mathbf{x})}$. Assume for contradiction that there is a polynomial time algorithm $A$ that finds a solution $\hat{\mathbf{x}}$ and a polynomial $q$ such that
(29) $p(\mathbf{x}^*) \leq 2^{q(n)}\,p(\hat{\mathbf{x}})$
for all instances of the problem, where $\mathbf{x}^*$ is the optimal solution. Taking the logarithm,
(30) $E(\hat{\mathbf{x}}) - E(\mathbf{x}^*) \leq q(n) \ln 2$,
or,
(31) $E(\hat{\mathbf{x}}) \leq E(\mathbf{x}^*) + q(n) \ln 2$.
Dividing by $E(\mathbf{x}^*)$, which, by the definition of NPO, is positive, we obtain
(32) $\dfrac{E(\hat{\mathbf{x}})}{E(\mathbf{x}^*)} \leq 1 + \dfrac{q(n) \ln 2}{E(\mathbf{x}^*)} \leq 1 + q(n) \ln 2$,
where we have used that $E(\mathbf{x}^*)$ is a positive integer and hence greater than or equal to 1. Inequality (32) provides a polynomial ratio approximation for energy minimization. Since the latter is exp-APX-complete (Corollary 3.1), this contradicts the existence of the polynomial algorithm $A$, unless P = NP.
Note that this corollary provides a stronger inapproximability result for probabilities than the one proven in [2].
Remark 0.B.0.
Abdelbar and Hedetniemi [2] have also shown the following interesting facts. For Bayesian networks, the following problems are also APX-hard (in the value of probability):
• given the optimal solution, approximating the second best solution;
• given the optimal solution, approximating the optimal solution conditioned on changing the assignment of one variable.
Appendix 0.C Miscellaneous
This result is used in Section 5.2.
Corollary 0.C.0.
An $O(\log k)$-approximation implies an $O(\log n)$-approximation for $k$-label energy minimization problems, where $n$ is the input size.
Proof.
Observe that an instance of the energy minimization problem Eq. 1 is completely specified by the set of all unary terms and pairwise terms. This defines a natural encoding scheme that describes an instance of an energy minimization problem over a binary alphabet. Assuming each value of a potential is encoded by at least one digit, the input size satisfies
(33) $n \geq |\mathcal{V}|\,k + |\mathcal{E}|\,k^2 \geq k$.
For an $O(\log k)$-approximation algorithm, the performance ratio satisfies
(34) $R \leq c \log k \leq c \log n$
for some constant $c$, which implies an $O(\log n)$-approximation algorithm.