Solving Constraint Satisfaction Problems throughBelief Propagation-guided decimation

# Solving Constraint Satisfaction Problems through Belief Propagation-guided decimation

Andrea Montanari, Federico Ricci-Tersenghi and Guilhem Semerjian A. Montanari is with Departments of Electrical Engineering and Statistics, Stanford University, montanari@stanford.edu, F. Ricci-Tersenghi is with Dipartimento di Fisica, Università di Roma La Sapienza, G. Semerjian is with Laboratoire de Physique Théorique de l’Ecole Normale, Paris
###### Abstract

Message passing algorithms have proved surprisingly successful in solving hard constraint satisfaction problems on sparse random graphs. In such applications, variables are fixed sequentially to satisfy the constraints. Message passing is run after each step. Its outcome provides an heuristic to make choices at next step. This approach has been referred to as ‘decimation,’ with reference to analogous procedures in statistical physics.

The behavior of decimation procedures is poorly understood. Here we consider a simple randomized decimation algorithm based on belief propagation (BP), and analyze its behavior on random -satisfiability formulae. In particular, we propose a tree model for its analysis and we conjecture that it provides asymptotically exact predictions in the limit of large instances. This conjecture is confirmed by numerical simulations.

## I Introduction

An instance of a constraint satisfaction problem [1] consists of variables and constraints among them. Solving such an instance amounts to finding an assignment of the variables that satisfies all the constraints, or proving that no such assignment exists. A remarkable example in this class is provided by -satisfiability, where variables are binary, , and each constraint requires of the variables to be different from a specific -uple. Explicitly, the -th constraint (clause), is specified by variable indexes , and bits . Clause is satisfied by assignment if and only if .

A constraint satisfaction problem admits a natural factor graph [2] representation, cf. Fig. 1. Given an instance, each variable can be associated to a variable node, and each constraint to a factor node. Edges connect factor node to those variable nodes such that the -th constraint depends in a non-trivial way on variable . For instance, in the case of -satisfiability, clause is connected to variables . If the resulting graph is sparse, fast message passing algorithms can be defined on it.

Although constraint satisfaction problems are generally NP-hard, a large effort has been devoted to the development of efficient heuristics. Recently, considerable progress has been achieved in building efficient ‘incomplete solvers’ [3]. These are algorithms that look for a solution but, if they do not find one, cannot prove that the problem is unsolvable. A particularly interesting class is provided by message passing-guided decimation procedures. These consist in iterating the following steps:

1. Run a message passing algorithm.

2. Use the result to choose a variable index , and a value for the corresponding variable.

3. Replace the constraint satisfaction problem with the one obtained by fixing to .

The iteration may stop for two reasons. In the first case a contradiction is produced: the same variable appears in two constraints whose other arguments have already been fixed, and that are satisfied by distinct values of . If this does not happen, the iteration stops only when all the variables are fixed and a solution is found. Notice that earlier algorithms, such as unit clause propagation (UCP) [4, 5] did not used message passing in step 2, and were not nearly as effective.

Random constraint satisfaction problems are a useful testing ground for new heuristics. For instance, random -satisfiability is the distribution over -SAT formulae defined by picking a formula uniformly at random among all the ones including clauses over variables. Decimation procedures of the type sketched above proved particularly successful in this context. In particular survey propagation-guided decimation [6, 7] outperformed the best previous heuristics based on stochastic local search [3]. More recently belief propagation-guided decimation was shown empirically to have good performances as well [8].

Unfortunately, so far there exists no analysis of message-passing guided decimation. Our understanding almost entirely relies on simulations, even for random instances. Consequently the comparison among different heuristics, as well as the underpinnings of their effectiveness are somewhat unclear. In this paper we define a simple class of randomized message passing-guided decimation algorithms, and present a technique for analyzing them on random instances. The technique is based on the identification of a process on infinite trees that describes the evolution of the decimation algorithm. The tree process is then analyzed through an appropriate generalization of density evolution [14]. Our approach is close in spirit to the one of [9]. While it applies to a large class of random constraint satisfaction problems (including, e.g. coloring of random graphs), for the sake concreteness, we will focus on random -SAT.

We expect the tree process to describe exactly the algorithm behavior in the limit of large instances, . While we could not prove this point, numerical simulations convincingly support this conjecture. Further, non-rigorous predictions based on tree calculations have been repeatedly successful in the analysis of random -satisfiability. This approach goes under the name of ‘cavity method’ in statistical mechanics [6].

The paper is organized as follows. Section II contains some necessary background and notation on random -SAT as well as a synthetic discussion of related work. In Section III we define the decimation procedure that we are going to analyze. We further provide the basic intuition behind the definition of the tree model. The latter is analyzed in Section IV, and the predictions thus derived are compared with numerical simulations in Section V. Finally, some conclusions and suggestions for future work are presented in Section VI. Proofs of several auxiliary lemmas are omitted from this extended abstract and deferred to technical appendices.

## Ii Random k-SAT and message passing: Background and related work

As mentioned above, random -SAT refers to the uniform distribution over -SAT instances with constraints over variables. More explicitly, each constraint is drawn uniformly at random among the possible ones. We are interested here in the limit with fixed.

Consider the factor graph of a random -SAT formula, endowed with the graph-theoretic distance. Namely, the distance of two variable nodes is the length of the shortest path leading from to on . It is well known [10] that, in the large size limit, any finite neighborhood of a random node converges in distribution to a well defined random tree. This observation will be the basis of our tree analysis of the decimation process, and is therefore worth spelling it out in detail. Let be the subgraph induced by all the vertices , such that . Then as , where is the random rooted (factor) tree defined recursively as follows. For , is the graph containing a unique variable node. For any , start by a single variable node (the root) and add clauses, each one including the root and new variables (first generation variables). If , generate an independent copy of for each variable node in the first generation and attach it to them. The values that violate clause are independently chosen in with equal probability. It is easy to see that the limit object is well defined and is an infinite tree with positive probability if .

We let be the largest value of such that random -SAT instances admit with high probability a solution. It is known [11] that . A sharp conjecture on the value of has been put forward in [6] on the basis of statistical physics calculations, implying , , for (respectively) and for large  [12].

Simple heuristics have been analyzed thoroughly [5] and proved to find a solution with probability bounded away from if . Here the proportionality constant depends on the specific heuristic.

To the best of our knowledge, the first application of message passing algorithms to -satisfiability is reported in [13]. In this early study BP was mostly applied in a one-shot fashion (as in iterative decoding of sparse graph codes [14]), without decimation. By this we mean that belief propagation is run, and resulting marginal probabilities are used to guess the values of all variables at once. However the probability of success of the one-shot algorithm is exponentially small: there are isolated constraints, whose variables have non-trivial marginal probabilities, each of them is hence violated with finite probability in the one-shot assignment.

Statistical mechanics methods allowed to derive a very precise picture of the solution set [15, 6, 8]. This inspired a new message passing algorithm dubbed survey propagation [7]. In conjunction with decimation, this algorithm allowed to solve random instances of unprecedentedly large sizes, in difficult regimes of and .

A natural way of introducing belief propagation for -satisfiability is to consider the uniform distribution over solutions (assuming their existence). Let us denote by the set of variable nodes on which the -th constraint effectively depends, for any subset of the variable nodes their partial assignment , and the indicator function of the event ’clause is satisfied.’ The uniform distribution over the solutions can thus be written

 μ(x––)=1Z∏a∈Fwa(x––∂a) . (1)

In [16] it was proved that for , BP computes good approximations of the marginals of , irrespective of its initialization. It is clear from empirical studies [17, 18] that the ‘worst case’ argument used in this estimate (and in other papers on belief propagation [19, 20]) is far too pessimistic.

In Ref. [21] a simple message passing algorithm, warning propagation (see below), was analyzed for a modified (‘planted’) ensemble of random formulae. The algorithm was proved to converge and find solutions for large enough density (see also [22, 23]). Both the ensemble and the algorithm are quite different from the ones treated in this paper.

Further, the definition and analysis of a ‘Maxwell decoder’ in [24, 25], is closely related to the approach in this paper. Let us recall that the Maxwell decoder was a (mostly conceptual) algorithm for implementing maximum likelihood decoding of LDPC codes over the erasure channel. The treatment in [24, 25] applies almost verbatim to a simple constraint satisfaction problem known as XORSAT. The generalization in the present paper is analogous to the one from the erasure to a general binary memoryless symmetric channel.

Finally, let us mention that BP decimation can be an interesting option in engineering applications, as demonstrated empirically in the case of lossy source coding [26, 27].

## Iii A simple decimation procedure

### Iii-a Belief propagation

Let us recall the definition of BP for our specific setup ([2, 28] are general references). BP is a message passing algorithm: at each iteration messages are sent from variable nodes to neighboring clause nodes and vice versa. To describe the message update equations, we need some more notation. As in the case of factor nodes, we shall call the set of factors that depends on the variable . If , say , we denote the value of which does not satisfy the -th clause. For a pair of adjacent variable () and factor () nodes (i.e. ), let us call (resp. ) the set of factor nodes adjacent to , distinct from , that agrees (resp. disagrees) with on the satisfying value of . In formulae, and .

It is convenient to use log-likelihood notations for messages as is done in iterative decoding [14], with two caveats: We introduce a factor to be consistent with physics notation; The message from variable node to factor node corresponds to the log-likelihood for to satisfy/not-satisfy clause (rather than to be ).

Let , denote the messages that are passed at time along the directed edges and , for , and . The update equations read

 h(r+1)i→a = ∑b∈∂+i(a)u(r)b→i−∑b∈∂−i(a)u(r)b→i, (2) u(r)a→i = f({h(r)j→a;j∈∂a∖i}), (3)

where we define the function as

 f(h1,…,hk−1)=−12log{1−1−ϵ2k−1k−1∏i=1(1−tanhhi)}, (4)

with (this parameter is introduced for the discussion in Sec. V).

For , let be the subset of clauses that are satisfied by , and the subset satisfied by . Then the BP estimate for the marginal of under the measure is , where

 ν(r)i(0/1) = 1±tanhh(r)i2 , h(r)i = ∑a∈∂+iu(r)a→i−∑a∈∂−iu(r)a→i. (5)

### Iii-B Unit clause and warning propagation

During the decimation procedure a subset of the variables are fixed to specific values, collectively denoted as . This has some direct implications. By this we mean that for some other variables , , it follows from ‘unit clause propagation’ (UCP) that they take the same value in all of the solutions compatible with the partial assignment . We will say that these variables are directly implied by the condition . Let us recall that unit clause propagation corresponds to the following deduction procedure. For each of the fixed variables , and each of the clauses it belongs to, the value can either satisfy clause , or not. In the first case clause can be eliminated from the factor graph. In the second a smaller clause with one less variable is implied. In both cases variable is removed. It can happen that the size of a clause gets reduced to , through this procedure. In this case the only variable belonging to the clause must take a definite value in order to satisfy it. We say that such a variable is directly implied by the fixed ones. Whenever a variable is directly implied, its value can be substituted in all the clauses it belongs to, thus allowing further reductions.

The process stops for one of two reasons: All the fixed or directly implied variables have been pruned and no unit clause is present in the reduced formula. In this case we refer to all variables that appeared at some point in a unit clause as directly implied variables. Two unit clauses imply different values for the same variable. We will say that a contradiction was revealed in this case: no solution of the formula can verify the condition .

A key element in our analysis is the remark that UCP admits a message passing description. The corresponding algorithm is usually referred to as warning propagation (WP) [29]. The WP messages (to be denoted as , ) take values in . The meaning of (respectively ) is: ‘variable is (resp. is not) directly implied by clause to satisfy it.’ For variable-to-factor messages, the meaning of (respectively ) is: ‘variable is (resp. is not) directly implied, through one of the clauses , not to satisfy clause .’

We want to apply WP to the case in which a part of the variables have been fixed, namely for any . In this case the WP rules read

 h(r+1)i→a = ⎧⎪ ⎪⎨⎪ ⎪⎩Iif ∃b∈∂−i(a) s.t. u(r)b→i=Ior i∈U and x∗i=z(i,a),0 otherwise, (6) u(r)a→i = {I if h(r)j→a=I ∀j∈∂a∖i,0 otherwise. (7)

In the following we shall always assume that WP is initialized with , for each edge . It is then easy to prove that messages are monotone in the iteration number (according to the ordering ). In particular the WP iteration converges in at most iterations. We denote the corresponding fixed point messages, and say that is WP-implied by the fixed variables if there exist such that . Then the equivalence between UCP and WP can be stated in the form below.

###### Lemma 1.

Assume a partial assignment to be given for . Then

1. The fixed point WP messages do not depend on the order of the WP updates (as long as any variable is updated an a priori unlimited number of times).

2. is directly implied iff it is WP-implied.

3. UCP encounters a contradiction iff there exists , , such that .

For the clarity of what follows let us emphasize the terminology of fixed variables (those in ) and of directly implied variables (not in , but implied by though UCP or WP). Finally, we will call frozen variables the union of fixed and directly implied ones, and denote the set of frozen variables by .

### Iii-C Decimation

The BP-guided decimation algorithm is defined by the pseudocode of Table I.

There are still a couple of elements we need to specify. First of all, how the BP equations (2), (3) are modified when a non-empty subset of the variables is fixed. One option would be to eliminate these variables from the factor graph, and reduce the clauses they belong to accordingly. A simpler approach consists in modifying Eq. (2) when . Explicitly, if the chosen value satisfies clause , then we set . If it does not, we set .

Next, let us stress that, while WP is run until convergence, a not-yet defined ‘stopping criterion’ is used for BP. This will be precised in Section V. Here we just say that it includes a maximum iteration number , which is kept of smaller order than .

The algorithm complexity is therefore naively . It requires cycles, each involving: at most BP iterations of complexity and at most WP iterations of complexity . It is easy to reduce the complexity to by updating WP in sequential (instead of parallel) order, as in UCP. Finally, natural choice (corresponding to the assumption that BP converges exponentially fast) is to take , leading to complexity.

In practice WP converges after a small number of iterations, and the BP updates are the most expensive part of the algorithm. This could be reduced further by using the fact that fixing a single variable should produce only a small change in the messages. Ref. [7] uses this argument for a similar algorithm to argue that time is enough.

### Iii-D Intuitive picture

Analyzing the dynamics of BP-decimation seems extremely challenging. The problem is that the procedure is not ‘myopic’ [5], in the sense that the value chosen for variable depends on a large neighborhood of node in the factor graph. By analogy with myopic decimation algorithms one expects the existence of a critical value of the clause density such that the algorithm finds a solution with probability bounded away from for , while it is unsuccessful with high probability for . Notice that, if the algorithm finds a solution with positive probability, restarting it a finite number of times should111A caveat: here we are blurring the distinction between probability with respect to the formula realization and the algorithm realization. yield a solution with probability arbitrarily close to .

We shall argue in favor of this scenario and present an approach to analyze the algorithm evolution for smaller than a spinodal point . More precisely, our analysis allows to compute the asymptotic fraction of ‘directly implied’ variables after any number of iterations. Further, the outcome of this computation provides a strong indication that . Both the analysis, and the conclusion that are confirmed by large scale numerical simulations.

Our argument goes in two steps. In this section we show how to reduce the description of the algorithm to a sequence of ‘static’ problems. The resolution of the latter will be treated in the next section. Both parts rely on some assumptions on the asymptotic behavior of large random -SAT instances, that originate in the statistical mechanics treatment of this problem [6, 8]. We will spell out such assumptions along the way.

As a preliminary remark, notice that the two message passing algorithms play different roles in the BP-decimation procedure of Table I. BP is used to estimate marginals of the uniform measure over solutions, cf. Eq. (1), in the first repetition of the loop. In subsequent repetitions, it is used to compute marginals of the conditional distribution, given the current assignment . These marginals are in turn used to choose the values of variables to be fixed. WP is on the other hand used to check a necessary condition for the current partial assignment to be consistent. Namely it checks if it induces a contradiction on directly implied variables. In fact, it could be replaced by UCP, and, in any case, it does not influence the evolution of the partial assignment .

Let us introduce some notation: is the order in which variables are chosen at step in the algorithm, the set of fixed variables at the beginning of the -th repetition of the loop, and the frozen variables at that time (i.e. the union of and the variables directly implied by ).

We begin the argument by considering an ‘idealized’ version of the algorithm where BP is replaced by a black box, that is able to return the exact marginal distribution of the measure conditioned on the previous choices, namely . Let us point out two simple properties of this idealized algorithm. First, it always finds a solution if the input formula is satisfiable (this will be the case with high probability if we assume ). In fact, assume by contradiction that the algorithm fails. Then, there has been a last time , such that the -SAT instance has at least one solution consistent with the condition , but no solution under the additional constraint for . This cannot happen because it would imply , and if this is the case, we would not have chosen in step of the algorithm.

The second consequence is that the algorithm output configuration is a uniformly random solution. This follows from our assumption since

 P{x––∗|i(⋅)} = n∏t=1νi(t)(x∗i(t))= = n∏t=1μi(t)|U(t−1)(x∗t|x––∗U(t−1))=μ(x––∗).

Therefore, the distribution of the state of the idealized algorithm after any number of decimation steps can be described as follows. Pick a uniformly random solution , and a uniformly random subset of the variable indexes , with . Then fix the variables to take value , and discard the rest of the reference configuration (i.e. the bits for ).

We now put aside the idealized algorithm and consider the effect of fixing the -th variable to . Three cases can in principle arise: was directly implied to be equal to by and a contradiction is generated. We assume that BP is able to detect this direct implication and avoid such a trivial contradiction; was directly implied to by . The set of frozen variables remains the same, , as this step is merely the actuation of a previous logical implication; was not directly implied by . This is the only interesting case that we develop now.

Let us call the set of newly frozen variables after this fixing step. A moment of reflection shows that contains and that it forms a connected subset of in . Consider now the subgraph induced by (i.e. where is the set of factor nodes having at least one adjacent variable in , and is the set of edges between and ). A crucial observation is the following:

###### Lemma 2.

If is a tree, no contradiction can arise during step .

From this lemma, and since the factor graph of a typical random formula is locally tree-like, one is naturally lead to study the size of , i.e. of the cascade of newly implied variables induced by fixing the -th variable. If this size remains bounded as , then will typically be a tree and, consequently, contradictions will arise with vanishingly small probability during one step. If on the other hand the size diverges for infinitely large samples, then will contain loops and opens the possibility for contradictions to appear.

In order to compute the typical size of , we notice that , and consider a of order , namely . If we let denote the fraction of frozen variables when a fraction of variables have been fixed, then under mild regularity conditions we have

 limn→∞E[|Znθ|]=dϕ(θ)dθ. (8)

Of course will be an increasing function of . The argument above implies that, as long as its derivative remain finite for , then the algorithm finds a solution. When the derivative diverges at some point , then the number of direct implications of a single variable diverges as well. The spinodal point is defined as be the smallest value of such that this happens.

The expectation in the definition of is with respect to the choices made by the real BP algorithm in the first steps, including the small mistakes it necessarily makes. Our crucial hypothesis is that the location of does not change (in the limit) if is computed along the execution of the idealized decimation algorithm. In other words we assume that the cumulative effect of BP errors over decimation steps produces only a small bias in the distribution of . For this hypothesis is no longer consistent, as the real BP algorithm fails with high probability.

Under this hypothesis, and recalling the description of the state of the idealized algorithm given above, we can compute as follows. Draw a random formula on variables, a uniformly random ‘reference’ solution , a subset of variable nodes222in the large limit one can equivalently draw by including in it each variable of independently with probability .. Let be the probability that a uniformly random variable node is frozen, i.e. either in or directly implied by . Then . In the next Section this computation will be performed in the random tree model of Sec. II.

## Iv The tree model and its analysis

Let us consider a -satisfiability formula whose factor graph is a finite tree, and the uniform measure over its solutions (which always exist) defined in Eq. (1). It follows from general results [2] that the recursion equations (2,3) have a unique fixed-point, that we shall denote . Further the BP marginals , cf. Eq. (5), are the actual marginals of . Drawing a configuration from the law is most easily done in a recursive, broadcasting fashion. Start from an arbitrary variable node and draw with distribution . Thanks to the Markov property of , conditional on the value of , can be generated independently for each of the branches of the tree rooted at . Namely, for each , one draws from

 μ(x––∂a∖i|xi)=1zwa(xi,x––∂a∖i)∏j∈∂a∖iνj→a(xj) . (9)

Here is a normalization factor and denotes the marginal of the variable in the amputated factor graph where factor node has been removed (this is easily expressed in terms of the message ). Once all variables at distance from have been generated, the process can be iterated to fix variables at distance from , and so on. It is easy to realize that this process indeed samples a solution uniformly at random.

Following the program sketched in the previous Section, we shall study the effect of fixing a subset of the variables to the value they take in one of the solutions. We first state the following lemma.

###### Lemma 3.

Suppose is a subset of the variables of a tree formula, and let be a uniformly random solution. The probability that a variable is directly implied by reads

 νi(0){1−∏a∈∂+i(1−ˆua→i)}+νi(1){1−∏a∈∂−i(1−ˆua→i)} , (10)

where the new messages are solutions of

 ˆhj→a = {1if j∈U1−∏b∈∂−j(a)(1−ˆub→j)otherwise (11) ˆua→l = ∏j∈∂a∖l(1−tanhhj→a2ˆhj→a) . (12)

We consider now a random tree factor graph and a random set of fixed variables .

###### Lemma 4.

Consider a random tree formula obtained from the construction of Section II, and a random subset of its variable nodes defined by letting independently with probability for each . Finally, let be a uniformly random solution of . Then the probability that the root of is frozen (either fixed or directly implied by ) is

 ϕ\footnotesize\rm treeℓ(θ)=Eℓ[(1−tanhh)ˆh] , (13)

where denotes expectation with respect to the distribution of . This is a (vector) random variable defined by recurrence on as

 (h,ˆh)ℓ \lx@stackreld= (l+∑i=1u+i−l−∑i=1u−i,1−ζl−∏i=1(1−ˆu−i)), (14) (u,ˆu)ℓ+1 \lx@stackreld= (f(h1,…,hk−1),k−1∏i=11−tanhhi2ˆhi), (15)

with initial condition with probability 1. In this recursion and are two independent Poisson random variables of parameter , is a random variable equal to (resp. ) with probability (resp. ), the and are independent copies of, respectively, and .

To obtain a numerical estimate of the function we resorted to sampled density evolution (also called ‘population dynamics’ in the statistical physics context [6]), using samples of elements and as a working example, see Fig. 2. For small values of , is smoothly increasing and slightly larger than . Essentially all frozen variables are fixed ones, and very few directly implied variables appear. Moreover the maximal slope of the curve is close to 1, implying that the number of new frozen variables at each step, , remains close to 1. As grows, becomes significantly different from , and the maximal slope encountered in the interval gets larger. At a value the curve acquires a vertical tangent at , signaling the divergence of the size of the graph of newly implied variables. Density evolution gives us , with an associated value of . For the curve has more than one branch, corresponding to the presence of multiple fixed points for . In analogy with [25], we expect the evolution of the algorithm to be described by picking (for each ) the lowest branch of . The resulting curve has a discontinuity at , which is a slowly decreasing function of .

We expect the tree computation to provide the correct prediction for the actual curve (i.e. ) for a large range of the satisfiable regime, including . As a consequence, we expect and BP decimation to be successful up to . Similar tree computations are at the basis of a number of statistical mechanics computations in random -SAT and have been repeatedly confirmed by rigorous studies.

The relation between tree and graph can be formalized in terms of Aldous [30] local weak convergence method. Fix a finite integer and consider the finite neighborhood of radius around an arbitrarily chosen variable node of an uniformly drawn factor graph on variables. Denote by the law of when is a uniformly random solution. We proceed similarly in the random tree ensemble. Draw a random tree with , let its first generations, and the distribution of . Considerations building on the field of statistical mechanics of disordered systems leads to the following hypothesis.

###### Conjecture 1.

There exists a sequence such that for all , i.e. and have the same weak limit. A precise determination of was presented in [8], yielding for, respectively, , and at large .

Local weak limits of combinatorial models on random graphs were recently considered in [31]. For a generalized conjecture in the regime see [32].

A slightly stronger version of this conjecture would imply that . As a consequence (following the discussion in previous section) the tree model would correctly describe the algorithm evolution.

## V Numerical simulations

In order to test the validity of our analysis we performed numerical simulations of the pseudo-code of Table I. Let us give a few further details on its implementation. The BP messages are stored as . Ambiguities in the update rule (3) arises when with and . Because of numerical imprecisions this situation can occur even before a contradiction has been detected by WP; such ambiguities are resolved by recomputing the incoming messages using the regularized version of Eq. (4) with a small positive value of (in practice we used ).

As for the stopping criterion used in step 5, we leave the BP iteration loop if either of the two following criteria is fulfilled: , , i.e. BP has converged to a fixed-point within a given accuracy; A maximal number of iterations fixed beforehand has been reached. In our implementation we took and .

A first numerical check is presented in Fig. 2. The two dashed curves represent the fraction of frozen variables along the execution of the BP guide decimation algorithm, for two formulas of the -sat ensemble, of moderate size (). The first formula had a ratio of constraints per variable . In agreement with the picture obtained from the analytical computation, the algorithm managed to find a solution of the formula (no contradiction encountered) and the measured fraction of frozen variables follows quite accurately the tree model prediction. The second formula was taken in the regime . The algorithm halted because a contradiction was found, after roughly the fraction (computed from the tree model) of variables has been fixed. The portion of the curve before this event exhibits again a rather good agreement between the direct simulation and the model.

Figure 3 shows the probability of success of BP decimation in a neighborhood of for random formulae of size , , . Each data point is obtained by running the algorithm on to formulae. The data strongly suggest that the success probability is bounded away from for , in agreement with our argument.

Finally, in Figure 4 we consider the number of variables fixed by BP decimation before a contradiction is encountered. According to the argument in Section III-D, should concentrate around the location of the discontinuity in . This is in fact the point at which the number of variables directly implied by a fixed one is no longer bounded. The comparison is again encouraging. Notice that for we do not have any prediction, and the estimate of concerns only a small fraction of the runs.

To summarize, our simulations support the claim that, for the success probability is strictly positive and the algorithm evolution follows the tree model. For the main failure mechanism is indeed related to unbounded cascades of directly implied variables, after about steps.

## Vi Conclusions and future directions

Let us conclude by highlighting some features of this work and proposing some directions for future research. It is worth mentioning that, as was also found in [8], random 3-sat has a qualitatively different behavior compared to random -sat with . In particular we did not found any evidence for the existence of a vertical tangent point in the function in the regime we expect to control through the tree computation, namely .

Our analysis suggests that BP guided decimation is successful with positive probability for . Further we argued that this threshold can be computed through a tree model and evaluated via density evolution. Despite these conclusions are based on several assumptions, it is tempting to make a comparison with the best rigorous results on simple decimation algorithms. For the best result was obtained by Frieze and Suen [33] who proved SCB (shortest clause with limited amount of backtracking) to succeed for . This is far from the conjectured threshold of BP decimation that is . For large , an asymptotic expansion suggests that

 α\footnotesize spin(k)=e2kk(1+O(k−1)) , (16)

whereas SCB is known from [33] to reach clause densities of , with as . A rigorous version of our analysis would lead to a constant factor improvement. On the other hand, the quest for an algorithm that provably solves random -SAT in polynomial time beyond , is open.

From a practical point of view the decimation strategy studied in this paper is not the most efficient one. A seemingly slight modification of the pseudo-code of Table I consists in replacing the uniformly random choice of the variable to be fixed, privilegiating the ones with the most strongly biased marginals. The intuition for this choice is that these marginals are the less subject to the ’small errors’ of BP. The numerical results reported in [8] suggest that this modification improves significantly the performances of the decimation algorithm; unfortunately it also makes its analysis much more difficult.

This work was partially supported by EVERGROW, integrated project No. 1935 in the complex systems initiative of the Future and Emerging Technologies directorate of the IST Priority, EU Sixth Framework.

## References

• [1] N. Creignou, S. Khanna and M. Sudan, Complexity classifications of boolean constraint satisfaction problems, SIAM, Philadelphia, 2001
• [2] F. R. Kschischang, B. J. Frey and H-A. Loeliger, “Factor graphs and the sum-product algorithm” (2001), IEEE Trans. Inform. Theory 47, 498-519.
• [3] B. Selman, H. Levesque and D. Mitchell, “A New Method for Solving Hard Satisfiability Problems”, 10th Natl Conf. on Artif. Intell., San Jose, CA, 440-446
• [4] J. Franco, “Results related to threshold phenomena research in satisfiability: lower bounds”, Theoret. Comput. Sci. 265, 147-157 (2001).
• [5] D. Achlioptas and G. B. Sorkin, Proc. of the Annual Symposium on the Foundations of Computer Science, Redondo Beach, CA, November 2000
• [6] M. Mézard, G. Parisi, and R. Zecchina, “Analytic and Algorithmic Solution of Random Satisfiability Problems”, Science 297 (2002), 812-815.
• [7] M. Mézard and R. Zecchina, “The random -satisfiability problem: from an analytic solution to an efficient algorithm”, Phys. Rev. E 66 (2002), 056126.
• [8] F. Krzakala, A. Montanari, F. Ricci-Tersenghi, G. Semerjian and L. Zdeborova, “Gibbs States and the Set of Solutions of Random Constraint Satisfaction Problems,” Proc. Natl. Acad. Sci. 104 (2007) 10318-10323.
• [9] M. Luby, M. Mitzenmacher and A. Shokrollahi, “Analysis of Random Processes via And-Or Tree Evaluation” Proc. of the Symposium on Discrete Algorithms, San Francisco, January 2008.
• [10] S. Janson, T. Luczak and A. Ruciński”, Random graphs, John Wiley, New York, 2000
• [11] D. Achlioptas and Y. Peres, “The threshold for random -SAT is ”, Journal of the AMS, 17 (2004), 947-973.
• [12] S. Mertens, M. Mézard and R. Zecchina, “Threshold values of random K-SAT from the cavity method”, Random Struct. Alg. 28, 340-373 (2006).
• [13] S. J. Pumphrey, “Solving the Satisfiability Problem Using Message Passing Techniques,” Cambridge Physics Project Report.
• [14] T. Richardson and R. Urbanke, Modern Coding Theory, draft available at http://lthcwww.epfl.ch/mct/index.php
• [15] G. Biroli, R. Monasson and M. Weigt, “A variational description of the ground state structure in random satisfiability problems”, Eur. Phys. J. B 14, 551 (2000).
• [16] A. Montanari and D. Shah, “Counting good truth assignments of random k-SAT formulae,” Proc. of the Symposium on Discrete Algorithms, New Orleans, January 2007.
• [17] E. Aurell, U. Gordon and S. Kirkpatrick, Proc. of Neural Information Processing Symposium, Vancouver, 2004.
• [18] E. N. Maneva, E. Mossel and M. J. Wainwright, Proc. of the Symposium on Discrete Algorithms, Vancouver, January 2005.
• [19] S. Tatikonda and M. Jordan, “Loopy Belief Propagation and Gibbs Measures”, Proc. Uncertainty in Artificial Intell. 18 (2002) 493-500.
• [20] D. Gamarnik and A. Bandyopadhyay, Proc. of the Symposium on Discrete Algorithms, Miami, (2006).
• [21] U. Feige, E. Mossel and D. Vilenchik, “Complete convergence of message passing algorithms for some satisfiability problems” Proc. RANDOM, Barcelona, (2006).
• [22] A. Coja-Oghlan, M. Krivelevich and D. Vilenchik, “Why almost all -CNF formulas are easy”, Proceedings of the 13th International Conference on Analysis of Algorithms, to appear, (2007).
• [23] F. Altarelli, R. Monasson and F. Zamponi, “Can rare SAT formulae be easily recognized? On the efficiency of message-passing algorithms for K-SAT at large clause-to-variable ratios”, J. Phys. A: Math. Theor. 40, 867-886 (2007).
• [24] C. Measson, A. Montanari, T. Richardson and R. Urbanke, “Life Above Threshold: From List Decoding to Area Theorem and MSE,” IEEE Inform. Theory Workshop, San Antonio, October 2004
• [25] C. Measson, A. Montanari and R. Urbanke, “Maxwell Construction: The Hidden Bridge between Iterative and Maximum a Posteriori Decoding,” IEEE Trans. Inform. Theory, to be published.
• [26] E. Maneva and M. Wainwright, “Lossy source encoding via message-passing and decimation over generalized codewords of LDGM codes,” IEEE Inform. Theory Symposium, Adelaide, September 2004
• [27] S. Ciliberti, M. Mézard and R. Zecchina, “Lossy data compression with random gates,” Phys. Rev. Lett. 95, 038701 (2005)
• [28] J. Pearl, “Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference”, San Francisco, CA: Morgan Kaufmann, 1988.
• [29] A. Braunstein, M. Mézard and R. Zecchina, “Survey propagation: an algorithm for satisfiability”, Random Structures and Algorithms 27, 201-226 (2005).
• [30] D. Aldous and J.M. Steele, “The objective method”, in Probability on Discrete Structures, H. Kesten ed., 1, Springer (2003).
• [31] A. Gerschenfeld and A. Montanari, Proc. of the Annual Symposium on the Foundations of Computer Science, Providence, RI, October 2007
• [32] G. Semerjian, “On the freezing of variables in random constraint satisfaction problems”, arXiv/0705.2147.
• [33] A. Frieze and S. Suen, “Analysis of Two Simple Heuristics on a Random Instance of k-SAT”, Journal of Algorithms 20, 312-355 (1996).
###### Proof of Lemma 1.

The statement is completely analogous to the equivalence between message passing and peeling versions of erasure decoding for LDPC codes [14]. Since the proof follows the same lines as well, we will limit ourselves to sketch its main points.

1) Let and be two fixed points of WP. Then is a fixed point as well. It follows that the ‘minimal’ fixed point is well defined and that it coincides with the limit of irrespective of the order of WP updates.

2) Consider the ordering according to which variables are declared as directly implied within UCP. For each there is at least one unit clause involving only variable before this was declared. Call this . Then use the same update order for WP, namely update, in sequence message , and all the messages for . It is immediate to show that this leads to a fixed point, and the resulting WP-implied variables coincide with the directly implied variables. The proof is completed by using point .

3) Consider the same ordering of variables used in point 2 above. If there exists , , as in the statement, then UCP must have reduced both clauses and to a unit clause involving and requiring it to take different values. Viceversa if UCP produces such a situation, in the WP updates after some time . ∎

###### Proof of Lemma 2.

The same statement has been proved for the Maxwell decoder [25]. We therefore briefly recall the basic ideas used in that case.

First of all the only WP messages changing from step to step (call these the ‘new’ messages) are the ones on the edges of the tree , and directed outwards. As a consequence, no contradiction can arise because of two contradicting new messages, because no variable node has two incoming new messages.

There could be, in line of principle, a contradiction between a new and an old message. The crucial observation is that indeed any factor node in has at most two adjacent variable nodes in (because otherwise if could not ‘transmit’ an implication). If a variable node already receives some message at time from clause , then it cannot receive any new message at time from a different clause . This because the message must already be , and therefore clause is already effectively ‘reduced’.

An alternative argument consists in considering the equivalent UCP representation. If is a tree, then no variable appears twice in a unit clause, and therefore no contradiction arises. ∎

###### Proof of Lemma 3.

Since we are dealing with a tree graph, equations (11,12) admit a unique solution, determined from the boundary condition (resp. ) if is a leaf in (resp. a leaf outside of ). The newly introduced messages have the following interpretation. Imagine running WP, cf. Eqs.  (6), (7) to find which variables are directly implied by . Then is the probability that when is drawn conditional on satisfying . Further, is the probability that when is drawn conditional on not satisfying clause .

Now, suppose has been fixed to drawn according to its marginal (hence the two terms in Eq. (10)) and a configuration has been generated conditional on , through the broadcast construction. Then the configuration of the variables in is retained, , and the rest of is discarded. The status (directly implied or not) of is read off from the values of the messages it receives. It is easy to convince oneself that cannot be implied to take the value opposite to the one it took at the beginning of the broadcasting: by definition is compatible with it. Equation (10) follows by computing the probability that at least one of the messages is equal to among the ones from clauses that are satisfied by .

Equation (11) is derived by applying the same argument to the branch of the tree rooted at and not including factor node . Finally, to derive Eq. (12) notice that, in order for variable to be directly implied to satisfy clause , each of the variables must be implied by the corresponding subtree not to satisfy . From the above remark, this can happen only if none of the satisfies . The probability of this event is easily found from (9) to be

 ∏j∈∂a∖i1−tanhhj→a2 . (17)

###### Proof of Lemma 4.

Denote by the root of . Conditional on the realization of the tree and of the set , the probability of a direct implication of the root is obtained by solving (2), (3), (11), (12) for the edges directed towards the root, which leads to couples of messages and along the edges of . Since and are random these couples of messages are random variables as well.

We claim that for , the messages sent to the root of by the adjacent constraint nodes are distributed as . Similarly for , has the distribution of the messages sent from the first generation variables to their ancestor constraint node in a random . This claim is a direct consequence of Eqs. (2), (3), (11), (12) and of the definition of and . The random variables have, for instance, the distribution of the cardinalities of for an arbitrary edge of the random tree, as and unsatisfying values of the variables are chosen independently with equal probability.

Finally the expression of is obtained from (10) by noting that the cardinalities of for the root of are distributed as the ones of and using the global symmetry between and , which implies that on average the two terms of (10) yield the same contribution. Note that the dependence on of arises through the distribution of , the bias of the coin used in (14) being . ∎

###### Details on the population dynamics algorithm.

The numerical procedure we followed in order to determine amounts to approximating the distribution of the random variable by the empirical distribution of a large sample of couples . A sample is then generated according to Eq. (14): for each one draws two Poisson random variables and , indexes uniformly in , and a biased coin . The -th element of the sample is thus computed as

 (hj,ˆhj)=(l+∑i=1uj+i−l−∑i=1uj−i,1−ζl−∏i=1(1−ˆuj−i)) .

Subsequently the sample is updated from by a similar interpretation of Eq. (15). After iterations of these two steps, starting from the initial configuration