What’s hard about Boolean Functional Synthesis?
Abstract
Given a relational specification between Boolean inputs and outputs, the goal of Boolean functional synthesis is to synthesize each output as a function of the inputs such that the specification is met. In this paper, we first show that unless some hard conjectures in complexity theory are falsified, Boolean functional synthesis must generate large Skolem functions in the worst case. Given this inherent hardness, what does one do to solve the problem? We present a two-phase algorithm, where the first phase is efficient both in terms of time and size of synthesized functions, and solves a large fraction of benchmarks. To explain this surprisingly good performance, we provide a sufficient condition under which the first phase must produce correct answers. When this condition fails, the second phase builds upon the result of the first phase, possibly requiring exponential time and generating exponential-sized functions in the worst case. Detailed experimental evaluation shows our algorithm to perform better than other techniques for a large number of benchmarks.
Keywords:
Skolem functions, synthesis, SAT solvers, CEGAR-based approach
1 Introduction
The algorithmic synthesis of Boolean functions satisfying relational specifications has long been of interest to logicians and computer scientists. Informally, given a Boolean relation between input and output variables denoting the specification, our goal is to synthesize each output as a function of the inputs such that the relational specification is satisfied. Such functions have also been called Skolem functions in the literature [22, 29]. Boole [7] and Lowenheim [27] studied variants of this problem in the context of finding most general unifiers. While these studies are theoretically elegant, implementations of the underlying techniques have been found to scale poorly beyond small problem instances [28]. More recently, synthesis of Boolean functions has found important applications in a wide range of contexts including reactive strategy synthesis [3, 18, 41], certified QBF-SAT solving [20, 34, 6, 31], automated program synthesis [38, 36], circuit repair and debugging [21], disjunctive decomposition of symbolic transition relations [40] and the like. This has spurred recent interest in developing practically efficient Boolean function synthesis algorithms. The resulting new generation of tools [29, 22, 2, 16, 39, 34, 33] has enabled synthesis of Boolean functions from much larger and more complex relational specifications than those that could be handled by earlier techniques, viz. [19, 20, 28].
In this paper, we re-examine the Boolean functional synthesis problem from both theoretical and practical perspectives. Our investigation shows that unless some hard conjectures in complexity theory are falsified, Boolean functional synthesis must necessarily generate super-polynomial or even exponential-sized Skolem functions, thereby requiring super-polynomial or exponential time, in the worst case. Therefore, it is unlikely that an efficient algorithm exists for solving all instances of Boolean functional synthesis. There are two ways to address this hardness in practice: (i) design algorithms that are provably efficient but may give “approximate” Skolem functions that are correct only on a fraction of all possible input assignments, or (ii) design a phased algorithm, wherein the initial phases are provably efficient and solve a subset of problem instances, and the subsequent phases have worst-case exponential behaviour and solve all remaining problem instances. In this paper, we combine the two approaches while giving heavy emphasis on efficient instances. We also provide a sufficient condition for our algorithm to be efficient, which indeed is borne out by our experiments.
The primary contributions of this paper can be summarized as follows.

We start by showing that unless $\mathsf{P} = \mathsf{NP}$, there exist problem instances where Boolean functional synthesis must take super-polynomial time. Also, unless the Polynomial Hierarchy collapses to the second level, there must exist problem instances where Boolean functional synthesis must generate super-polynomial-sized Skolem functions. Moreover, if the non-uniform exponential-time hypothesis [13] holds, there exist problem instances where Boolean functional synthesis must generate exponential-sized Skolem functions, thereby also requiring at least exponential time.

We present a new two-phase algorithm for Boolean functional synthesis.

Phase 1 of our algorithm generates candidate Skolem functions of size polynomial in the input specification. This phase makes polynomially many calls to an $\mathsf{NP}$ oracle (a SAT solver in practice). Hence it directly benefits from the progress made by the SAT solving community, and is efficient in practice. Our experiments indicate that Phase 1 suffices to solve a large majority of publicly available benchmarks.

However, there are indeed cases where the first phase is not enough (our theoretical results imply that such cases likely exist). In such cases, the first phase provides good candidate Skolem functions as starting points for the second phase. Phase 2 of our algorithm starts from these candidate Skolem functions, and uses a CEGAR-based approach to produce correct Skolem functions whose size may indeed be exponential in the input specification.


We analyze the surprisingly good performance of the first phase (especially in light of the theoretical hardness results) and show a sufficient condition on the structure of the input representation that guarantees correctness of the first phase. Interestingly, popular representations like ROBDDs [10] give rise to input structures that satisfy this condition. The goodness of Skolem functions generated in this phase of the algorithm can also be quantified with high confidence by invoking an approximate model counter [12], whose complexity lies in $\mathsf{BPP}^{\mathsf{NP}}$.

We conduct an extensive set of experiments over a variety of benchmarks, and show that our algorithm performs favourably vis-à-vis state-of-the-art algorithms for Boolean functional synthesis.
Related work
The literature contains several early theoretical studies on variants of Boolean functional synthesis [7, 27, 15, 8, 30, 5]. More recently, researchers have tried to build practically efficient synthesis tools that scale to medium or large problem instances. In [29], Skolem functions for $X$ are extracted from a proof of validity of $\forall Y \exists X\, F(X, Y)$. Unfortunately, this doesn't work when $\forall Y \exists X\, F(X, Y)$ is not valid, despite this class of problems being important, as discussed in [16, 2]. Inspired by the spectacular effectiveness of CDCL-based SAT solvers, an incremental determinization technique for Skolem function synthesis was proposed in [33]. In [19, 40], a synthesis approach based on iterated compositions was proposed. Unfortunately, as has been noted in [22, 16], this does not scale to large benchmarks. A recent work [16] adapts the composition-based approach to work with ROBDDs. For factored specifications, ideas from symbolic model checking using implicitly conjoined ROBDDs have been used to enhance the scalability of the technique further in [39]. In the genre of CEGAR-based techniques, [22] showed how CEGAR can be used to synthesize Skolem functions from factored specifications. Subsequently, a compositional and parallel technique for Skolem function synthesis from arbitrary specifications represented using AIGs was presented in [2]. The second phase of our algorithm builds on some of this work. In addition to the above techniques, template-based [38] or sketch-based [37] approaches have been found to be effective for synthesis when we have information about the set of candidate solutions. A framework for functional synthesis that reasons about some unbounded domains, such as integer arithmetic, was proposed in [25].
2 Notations and Problem Statement
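A Boolean formula $F(p_1, \ldots, p_n)$ on $n$ variables is a mapping $F : \{0,1\}^n \rightarrow \{0,1\}$. The set of variables $\{p_1, \ldots, p_n\}$ is called the support of the formula, and denoted $\mathsf{sup}(F)$. A literal is either a variable or its complement. We use $F|_{p_i=1}$ (resp. $F|_{p_i=0}$) to denote the positive (resp. negative) cofactor of $F$ with respect to $p_i$. A satisfying assignment or model of $F$ is a mapping of variables in $\mathsf{sup}(F)$ to $\{0,1\}$ such that $F$ evaluates to $1$ under this assignment. If $\pi$ is a model of $F$, we write $\pi \models F$ and use $\pi(p_i)$ to denote the value assigned to $p_i \in \mathsf{sup}(F)$ by $\pi$. Let $Z = (z_{i_1}, z_{i_2}, \ldots, z_{i_j})$ be a sequence of variables in $\mathsf{sup}(F)$. We use $\pi \downarrow Z$ to denote the projection of $\pi$ on $Z$, i.e. the sequence $(\pi(z_{i_1}), \pi(z_{i_2}), \ldots, \pi(z_{i_j}))$.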
A Boolean formula is in negation normal form (NNF) if (i) the only operators used in the formula are conjunction ($\wedge$), disjunction ($\vee$) and negation ($\neg$), and (ii) negation is applied only to variables. Every Boolean formula can be converted to a semantically equivalent formula in NNF. We assume an NNF formula is represented by a rooted directed acyclic graph (DAG), where internal nodes are labeled by $\wedge$ and $\vee$, and leaves are labeled by literals. In this paper, we use AIGs [24] as the initial representation of specifications. Given an AIG with $t$ nodes, an equivalent NNF formula of size $O(t)$ can be constructed in $O(t)$ time. We use $|F|$ to denote the number of nodes in a DAG representation of $F$.
Let $\alpha$ be the subformula represented by an internal node $N$ (labeled by $\wedge$ or $\vee$) in a DAG representation of an NNF formula. We use $\mathsf{lits}(\alpha)$ to denote the set of literals labeling leaves that have a path to the node $N$ representing $\alpha$ in the AIG. A formula is said to be in weak decomposable NNF, or wDNNF, if it is in NNF and if for every $\wedge$-labeled internal node in the AIG, the following holds: let $\alpha = \alpha_1 \wedge \cdots \wedge \alpha_k$ be the subformula represented by the internal node. Then, there is no literal $\ell$ and distinct indices $i, j \in \{1, \ldots, k\}$ such that $\ell \in \mathsf{lits}(\alpha_i)$ and $\neg\ell \in \mathsf{lits}(\alpha_j)$. Note that wDNNF is a weaker structural requirement on the NNF representation vis-à-vis the well-studied DNNF representation, which has elegant properties [14]. Specifically, every DNNF formula is also a wDNNF formula.
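Because the wDNNF condition is purely structural, it can be checked in one pass over the DAG. The following Python sketch is illustrative only: the nested-tuple node encoding (`("lit", var, polarity)`, `("and", children)`, `("or", children)`) and the function names `lits` and `is_wdnnf` are assumptions of this example, not the representation used in the paper's implementation.

```python
from functools import lru_cache

# Hypothetical NNF DAG encoding used only for this sketch:
#   ("lit", var, polarity)     leaf: var if polarity is True, else the complement of var
#   ("and", (child, ...))      AND-labeled internal node
#   ("or",  (child, ...))      OR-labeled internal node


@lru_cache(maxsize=None)
def lits(node):
    """Set of literals (var, polarity) labeling leaves that have a path to node."""
    if node[0] == "lit":
        return frozenset([(node[1], node[2])])
    result = frozenset()
    for child in node[1]:
        result |= lits(child)
    return result


def is_wdnnf(node, seen=None):
    """True iff no AND-node has a literal in one conjunct and its complement in another."""
    if seen is None:
        seen = set()
    if node[0] == "lit" or node in seen:
        return True
    seen.add(node)
    if node[0] == "and":
        children = node[1]
        for i, a in enumerate(children):
            for j, b in enumerate(children):
                if i != j and any((v, not p) in lits(b) for (v, p) in lits(a)):
                    return False
    return all(is_wdnnf(child, seen) for child in node[1])
```

For example, the NNF $(x \wedge y) \vee (\neg x \wedge \neg y)$ passes the check, whereas $(x \vee y) \wedge (\neg x \vee y)$ fails because its root conjunction has $x$ in one conjunct and $\neg x$ in the other. Memoizing `lits` avoids recomputing literal sets for shared sub-DAGs.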
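We say a literal $\ell$ is pure in $F$ iff the NNF representation of $F$ has a leaf labeled $\ell$, but no leaf labeled $\neg\ell$. $F$ is said to be positive unate in $p_i \in \mathsf{sup}(F)$ iff $F|_{p_i=0} \Rightarrow F|_{p_i=1}$. Similarly, $F$ is said to be negative unate in $p_i$ iff $F|_{p_i=1} \Rightarrow F|_{p_i=0}$. Finally, $F$ is unate in $p_i$ if $F$ is either positive unate or negative unate in $p_i$. A function that is not unate in $p_i \in \mathsf{sup}(F)$ is said to be binate in $p_i$.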
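We also use $X = (x_1, \ldots, x_n)$ to denote a sequence of Boolean outputs, and $Y = (y_1, \ldots, y_m)$ to denote a sequence of Boolean inputs. The Boolean functional synthesis problem, henceforth denoted $\mathsf{BFnS}$, asks: given a Boolean formula $F(X, Y)$ specifying a relation between inputs $Y$ and outputs $X$, determine functions $\Psi = (\psi_1(Y), \ldots, \psi_n(Y))$ such that $F(\Psi, Y)$ holds whenever $\exists X\, F(X, Y)$ holds. Thus, $\forall Y\, (\exists X\, F(X, Y) \Leftrightarrow F(\Psi, Y))$ must be rendered valid. The function $\psi_i$ is called a Skolem function for $x_i$ in $F$, and $\Psi$ is called a Skolem function vector for $X$ in $F$.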
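For $1 \le i \le j \le n$, let $X_i^j$ denote the subsequence $(x_i, x_{i+1}, \ldots, x_j)$ and let $F^{(i-1)}(X_i^n, Y)$ denote $\exists X_1^{i-1}\, F(X_1^{i-1}, X_i^n, Y)$. It has been argued in [22, 16, 2, 19] that given a relational specification $F(X, Y)$, the $\mathsf{BFnS}$ problem can be solved by first ordering the outputs, say as $x_1 \prec x_2 \prec \cdots \prec x_n$, and then synthesizing a function $G_i(X_{i+1}^n, Y)$ for each $x_i$ such that $F^{(i-1)}(G_i, X_{i+1}^n, Y) \Leftrightarrow \exists x_i\, F^{(i-1)}(x_i, X_{i+1}^n, Y)$. Once all such $G_i$ are obtained, one can substitute $G_{i+1}$ through $G_n$ for $x_{i+1}$ through $x_n$ respectively, in $G_i$ to obtain a Skolem function for $x_i$ as a function of only $Y$. We adopt this approach, and therefore focus on obtaining $G_i$ in terms of $X_{i+1}^n$ and $Y$. Furthermore, we know from [22, 19] that a function $G_i(X_{i+1}^n, Y)$ is a Skolem function for $x_i$ iff it satisfies $(\Delta_i^F \wedge \neg\Gamma_i^F) \Rightarrow G_i \Rightarrow (\neg\Gamma_i^F \vee \Delta_i^F)$, where $\Delta_i^F \equiv \neg\exists X_1^{i-1}\, F(X_1^{i-1}, 0, X_{i+1}^n, Y)$ and $\Gamma_i^F \equiv \neg\exists X_1^{i-1}\, F(X_1^{i-1}, 1, X_{i+1}^n, Y)$. When $F$ is clear from the context, we often omit it and write $\Delta_i$ and $\Gamma_i$. It is easy to see that both $\Delta_i$ and $\neg\Gamma_i$ serve as Skolem functions for $x_i$ in $F$.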
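The characterization of Skolem functions through $\Delta_i$ and $\Gamma_i$ can be checked exhaustively on small specifications. The Python sketch below is an illustration under stated assumptions, not the paper's method: a specification is a Python predicate `F(xs, ys)` over 0/1 tuples, existential quantification is done by enumeration, and `exists_prefix`, `Delta`, `Gamma` and `is_skolem_for` are hypothetical helper names.

```python
from itertools import product

# Sketch: a specification is a predicate F(xs, ys) over 0/1 tuples, with
# xs = (x_1, ..., x_n) the outputs and ys = (y_1, ..., y_m) the inputs.
# Outputs are indexed from 1 as in the text; `suffix` holds X_{i+1}^n.


def exists_prefix(F, i, xi, suffix, ys):
    """F^{(i-1)}(x_i, X_{i+1}^n, Y): does some choice of x_1..x_{i-1} satisfy F?"""
    return any(F(pre + (xi,) + suffix, ys) for pre in product((0, 1), repeat=i - 1))


def Delta(F, i, suffix, ys):
    # Delta_i = not exists X_1^{i-1} . F(X_1^{i-1}, 0, X_{i+1}^n, Y)
    return not exists_prefix(F, i, 0, suffix, ys)


def Gamma(F, i, suffix, ys):
    # Gamma_i = not exists X_1^{i-1} . F(X_1^{i-1}, 1, X_{i+1}^n, Y)
    return not exists_prefix(F, i, 1, suffix, ys)


def is_skolem_for(F, n, m, i, G):
    """Check F^{(i-1)}(G, X_{i+1}^n, Y) <=> exists x_i . F^{(i-1)}(x_i, X_{i+1}^n, Y) everywhere."""
    for suffix in product((0, 1), repeat=n - i):
        for ys in product((0, 1), repeat=m):
            lhs = exists_prefix(F, i, int(G(suffix, ys)), suffix, ys)
            rhs = exists_prefix(F, i, 0, suffix, ys) or exists_prefix(F, i, 1, suffix, ys)
            if lhs != rhs:
                return False
    return True
```

On the specification $\bigwedge_i (x_i \Leftrightarrow y_i)$ with $n = m = 2$, both $\Delta_i$ and $\neg\Gamma_i$ pass `is_skolem_for` for $i = 1, 2$, whereas the constant function $0$ fails for $x_1$. The enumeration is exponential and only meant for sanity checks.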
3 Complexity-theoretical limits
In this section, we investigate the computational complexity of $\mathsf{BFnS}$. It is easy to see that $\mathsf{BFnS}$ can be solved in EXPTIME. Indeed, a naive solution would be to enumerate all possible values of the inputs $Y$ and invoke a SAT solver to find values of $X$ corresponding to each valuation of $Y$ that makes $F(X, Y)$ true. This requires worst-case time exponential in the number of inputs and outputs, and may produce an exponential-sized circuit. Given this, one can ask whether there is a better algorithm that works faster and synthesizes “small” Skolem functions in all cases. Our first result shows that the existence of such small Skolem functions would violate hard complexity-theoretic conjectures.
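The naive EXPTIME procedure described above can be sketched in a few lines of Python. In this sketch, which is not an implementation from the paper, a brute-force search over output valuations stands in for each SAT-solver call, and the returned lookup table (one entry per input valuation) corresponds to the potentially exponential-sized circuit; the name `naive_bfns` is hypothetical.

```python
from itertools import product


def naive_bfns(F, n, m):
    """Exponential-time BFnS by exhaustive enumeration (a sketch).

    F(xs, ys) is the specification over n outputs and m inputs. For every input
    valuation ys we search for a satisfying output valuation; the brute-force
    search stands in for the SAT-solver call, and the returned lookup table
    (one row per input valuation) plays the role of the exponential-sized circuit.
    """
    table = {}
    for ys in product((0, 1), repeat=m):
        witness = next((xs for xs in product((0, 1), repeat=n) if F(xs, ys)), None)
        # If no output valuation satisfies F for this ys, any value is acceptable.
        table[ys] = witness if witness is not None else (0,) * n
    return lambda ys: table[tuple(ys)]
```

Both loops are exponential, which is exactly the behaviour that Theorem 3.1 suggests cannot be avoided in general.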
Theorem 3.1

Unless $\mathsf{P} = \mathsf{NP}$, there exist problem instances where any algorithm for $\mathsf{BFnS}$ must take super-polynomial time.

Unless $\Sigma_2^{\mathsf{P}} = \Pi_2^{\mathsf{P}}$, there exist problem instances where $\mathsf{BFnS}$ must generate Skolem functions of size super-polynomial in the input size.

Unless the non-uniform exponential-time hypothesis (non-uniform ETH) fails, there exist problem instances where any algorithm for $\mathsf{BFnS}$ must generate Skolem functions of size exponential in the input size.
The assumption in the first statement implies that the Polynomial Hierarchy ($\mathsf{PH}$) collapses completely (to level 1), while the second implies that $\mathsf{PH}$ collapses to level 2. A consequence of the third statement is that, under this hypothesis, there must exist an instance of $\mathsf{BFnS}$ for which any algorithm must take exponential time.
The exponential-time hypothesis ETH and its strengthened version, the non-uniform exponential-time hypothesis (non-uniform ETH), are unproven computational hardness assumptions (see [17], [13]), which have been used to show that several classical decision, functional and parametrized NP-complete problems (such as clique) are unlikely to have sub-exponential algorithms. Non-uniform ETH states that there is no family of algorithms (one for each family of inputs of size $n$) that can solve 3-SAT in sub-exponential time. In [13] it is shown that if non-uniform ETH holds, then p-Clique, the parametrized clique problem, cannot be solved in sub-exponential time, i.e., for all $d \in \mathbb{N}$ and sufficiently large fixed $k$, determining whether a graph $G$ has a clique of size $k$ cannot be done in $\mathsf{DTIME}(n^d)$.
Proof
We describe a reduction from Clique to $\mathsf{BFnS}$. Given an undirected graph $G = (V, E)$ on $n$ vertices and a number $k$ (encoded in binary), we want to check if $G$ has a clique of size $k$. We encode the graph as follows: each vertex $v \in V$ is identified by a unique number in $\{1, \ldots, n\}$, and for every pair $1 \le i < j \le n$, we introduce an input variable $y_{i,j}$ that is set to $1$ iff $(i, j) \in E$. We call the resulting vector of input variables $Y$. We also have additional input variables $Z = (z_1, \ldots, z_t)$, which represent the binary encoding of $k$ (since $k \le n$, $O(\log n)$ such variables suffice). Finally, we introduce output variables $x_v$ for each $v \in V$, whose values determine which vertices are present in the clique. Let $X$ denote the vector of $x_v$ variables.
Given inputs $Y, Z$ and outputs $X$, our specification is represented by a circuit $F$ over $X, Y, Z$ that verifies whether the vertices encoded by $X$ indeed form a clique of size $k$ in the graph $G$. The circuit is constructed as follows:

For every $i, j$ such that $1 \le i < j \le n$, we construct a sub-circuit implementing $(x_i \wedge x_j) \Rightarrow y_{i,j}$. The outputs of all such sub-circuits are conjoined to give an intermediate output, say $\mathit{EdgesOK}$. Clearly, all the sub-circuits taken together have size $O(n^2)$.

We have a tree of binary adders implementing $x_1 + x_2 + \cdots + x_n$. Let the $O(\log n)$-bit output of the adder be denoted $C$. The size of this adder is clearly polynomial in $n$.

We have an equality checker that checks if $C = Z$. Clearly, this sub-circuit has size $O(\log n)$. Let the output of this equality checker be called $\mathit{SizeOK}$.

The output of the specification circuit is $\mathit{EdgesOK} \wedge \mathit{SizeOK}$.
Given an instance $(G, k)$ of Clique, we now consider the specification $F(X, Y, Z)$ as constructed above and feed it as input to any algorithm $\mathcal{A}$ for solving $\mathsf{BFnS}$. Let $\Psi$ be the Skolem function vector output by $\mathcal{A}$. For each $i \in \{1, \ldots, n\}$, we now feed $\psi_i$ to the input $x_i$ of the circuit $F$. This effectively constructs a circuit for $F(\Psi, Y, Z)$. It is easy to see from the definition of Skolem functions that for every valuation of $Y, Z$, the function $F(\Psi, Y, Z)$ evaluates to $1$ iff the graph encoded by $Y$ contains a clique of size $k$.
Using this reduction, we can complete the proofs of our statements:

If the circuits for the Skolem functions $\Psi$ are super-polynomial sized, then of course any algorithm generating $\Psi$ must take super-polynomial time. On the other hand, if the circuits for the Skolem functions are always poly-sized, then $F(\Psi, Y, Z)$ is polynomial-sized, and evaluating it takes time that is polynomial in the input size. Thus, if $\mathcal{A}$ is a polynomial-time algorithm, we also get an algorithm for solving Clique in polynomial time, which implies that $\mathsf{P} = \mathsf{NP}$.

If the circuits for the Skolem functions produced by algorithm $\mathcal{A}$ are always polynomial-sized, then $F(\Psi, Y, Z)$ is polynomial-sized. Thus, with polynomial-sized circuits we are able to solve Clique. Recall that problems that can be solved using polynomial-sized circuits are said to be in the class $\mathsf{PSIZE}$ (equivalently called $\mathsf{P/poly}$). But since Clique is an $\mathsf{NP}$-complete problem, we obtain that $\mathsf{NP} \subseteq \mathsf{P/poly}$. By the Karp-Lipton Theorem [23], this implies that $\Sigma_2^{\mathsf{P}} = \Pi_2^{\mathsf{P}}$, which implies that $\mathsf{PH}$ collapses to level 2.

If the circuits for the Skolem functions are sub-exponential sized in the input size, then $F(\Psi, Y, Z)$ is also sub-exponential sized and can be evaluated in sub-exponential time. It then follows that we can solve any instance of p-Clique of input length $n$ in sub-exponential time, a violation of non-uniform ETH. Note that since our circuits can change for different input lengths, we may have different algorithms for different $n$. Hence we have to appeal to the non-uniform variant of ETH.∎
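To make the reduction concrete, the following Python sketch evaluates the specification $F(X, Y, Z)$ directly rather than building a circuit. Assumptions of the sketch: the edge inputs $y_{i,j}$ are given as a set of vertex pairs, $k$ is passed as an integer instead of the bit vector $Z$, vertices are numbered from 0, and a brute-force search plays the role of the arbitrary algorithm $\mathcal{A}$. All function names are hypothetical.

```python
from itertools import combinations, product


def clique_spec(xs, edges, k):
    """Specification F(X, Y, Z) of the reduction, evaluated directly (sketch).

    xs    -- output bits x_v, one per vertex (1 means "v is in the clique")
    edges -- set of frozenset({u, v}) pairs, standing in for the inputs y_{u,v}
    k     -- target clique size, standing in for the binary-encoded inputs Z
    """
    # EdgesOK: for every pair u < v, (x_u and x_v) implies y_{u,v}.
    edges_ok = all(not (xs[u] and xs[v]) or frozenset((u, v)) in edges
                   for u, v in combinations(range(len(xs)), 2))
    # SizeOK: the adder tree's sum of the x_v equals k.
    size_ok = sum(xs) == k
    return edges_ok and size_ok


def brute_force_skolem(n, edges, k):
    """Stand-in for an arbitrary BFnS algorithm: some X satisfying F, if one exists."""
    return next((xs for xs in product((0, 1), repeat=n) if clique_spec(xs, edges, k)),
                (0,) * n)


def has_clique(n, edges, k):
    # F(Psi(Y, Z), Y, Z) is 1 iff the graph has a clique of size k.
    return clique_spec(brute_force_skolem(n, edges, k), edges, k)
```

On a triangle with one pendant vertex, `has_clique` reports a clique of size 3 but none of size 4, matching the claim that $F(\Psi, Y, Z)$ decides Clique. Any faster or more compact Skolem function generator could replace `brute_force_skolem` without changing `has_clique`.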
Theorem 3.1 implies that efficient algorithms for $\mathsf{BFnS}$ are unlikely. We therefore propose a two-phase algorithm to solve $\mathsf{BFnS}$ in practice. The first phase runs in polynomial time relative to an $\mathsf{NP}$-oracle and generates polynomial-sized “approximate” Skolem functions. We show that under certain structural restrictions on the NNF representation of $F$, the first phase always returns exact Skolem functions. However, these structural restrictions may not always be met. An $\mathsf{NP}$-oracle can be used to check if the functions computed by the first phase are indeed exact Skolem functions. In case they aren't, we proceed to the second phase of our algorithm, which runs in worst-case exponential time. Below, we discuss the first phase in detail. The second phase is an adaptation of an existing CEGAR-based technique and is described briefly later.
4 Phase 1: Efficient polynomial-sized synthesis
An easy consequence of the definition of unateness is the following.
Proposition 1
If $F$ is positive (resp. negative) unate in $x_i$, then $\psi_i = 1$ (resp. $\psi_i = 0$) is a correct Skolem function for $x_i$.
Proof
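Recall that $F$ is positive unate in $x_i$ means that $F|_{x_i=0} \Rightarrow F|_{x_i=1}$. We start by observing that $\exists x_i\, F \Leftrightarrow F|_{x_i=0} \vee F|_{x_i=1}$, which, by positive unateness, implies $F|_{x_i=1}$. Conversely, $F|_{x_i=1} \Rightarrow \exists x_i\, F$. Hence, we conclude that $1$ is indeed a correct Skolem function for $x_i$ in $F$. The proof for negative unateness follows on the same lines.∎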
The above result gives us a way to identify outputs for which a Skolem function can be easily computed. Note that if $x_i$ (resp. $\neg x_i$) is a pure literal in $F$, then $F$ is positive (resp. negative) unate in $x_i$. However, the converse is not necessarily true. In general, a semantic check is necessary for unateness. In fact, it follows from the definition of unateness that $F$ is positive (resp. negative) unate in $x_i$ iff the formula $\eta_i^+$ (resp. $\eta_i^-$) defined below is unsatisfiable.
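$$\eta_i^+(X, Y) \equiv F(X_1^{i-1}, 0, X_{i+1}^n, Y) \wedge \neg F(X_1^{i-1}, 1, X_{i+1}^n, Y) \qquad (1)$$
$$\eta_i^-(X, Y) \equiv F(X_1^{i-1}, 1, X_{i+1}^n, Y) \wedge \neg F(X_1^{i-1}, 0, X_{i+1}^n, Y) \qquad (2)$$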
Note that each such check involves a single invocation of an $\mathsf{NP}$-oracle, and a variant of this method is described in [4].
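The semantic unateness test of Eqns (1) and (2) can be prototyped by brute force, with exhaustive enumeration standing in for the SAT solver. The helper names below (`eta_plus_sat`, `eta_minus_sat`, `unate_skolem`) are hypothetical, the specification is assumed to be a Python predicate `F(xs, ys)` over 0/1 tuples, and outputs are indexed from 1 as in the text.

```python
from itertools import product


def _set(xs, i, b):
    """Copy of the output tuple xs with x_i (1-based) replaced by b."""
    return xs[:i - 1] + (b,) + xs[i:]


def eta_plus_sat(F, n, m, i):
    # Eqn (1): F(X_1^{i-1}, 0, X_{i+1}^n, Y) and not F(X_1^{i-1}, 1, X_{i+1}^n, Y)
    return any(F(_set(xs, i, 0), ys) and not F(_set(xs, i, 1), ys)
               for xs in product((0, 1), repeat=n)
               for ys in product((0, 1), repeat=m))


def eta_minus_sat(F, n, m, i):
    # Eqn (2): F(X_1^{i-1}, 1, X_{i+1}^n, Y) and not F(X_1^{i-1}, 0, X_{i+1}^n, Y)
    return any(F(_set(xs, i, 1), ys) and not F(_set(xs, i, 0), ys)
               for xs in product((0, 1), repeat=n)
               for ys in product((0, 1), repeat=m))


def unate_skolem(F, n, m, i):
    """Constant Skolem function given by Proposition 1, or None if x_i is binate."""
    if not eta_plus_sat(F, n, m, i):
        return 1  # positive unate: psi_i = 1
    if not eta_minus_sat(F, n, m, i):
        return 0  # negative unate: psi_i = 0
    return None
```

For $F = (x_1 \vee y_1) \wedge (\neg x_2 \vee y_2) \wedge (x_3 \Leftrightarrow y_1)$, the sketch returns $1$ for $x_1$ (positive unate), $0$ for $x_2$ (negative unate) and `None` for the binate $x_3$. If $F$ does not depend on $x_i$ at all, both checks are unsatisfiable and the sketch returns $1$; either constant would be correct in that case.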
If $F$ is binate in an output $x_i$, Proposition 1 doesn't help in synthesizing $\psi_i$. Towards synthesizing Skolem functions for such outputs, recall the definitions of $\Delta_i$ and $\Gamma_i$ from Section 2. Clearly, if we can compute these functions, we can solve $\mathsf{BFnS}$. While computing $\Delta_i$ and $\Gamma_i$ exactly for all $x_i$ is unlikely to be efficient in general (in light of Theorem 3.1), we show that polynomial-sized “good” approximations of $\Delta_i$ and $\Gamma_i$ can be computed efficiently. As our experiments show, these approximations are good enough to solve $\mathsf{BFnS}$ for several benchmarks. Further, with access to an $\mathsf{NP}$-oracle, we can also check when these approximations are indeed good enough.
Given a relational specification $F(X, Y)$, we use $\widehat{F}(X, \overline{X}, Y)$ to denote the formula obtained by first converting $F$ to NNF, and then replacing every occurrence of $\neg x_i$ ($x_i \in X$) in the NNF formula with a fresh variable $\overline{x_i}$. Then, we have:
Proposition 2

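(a) $\widehat{F}(X, \overline{X}, Y)$ is positive unate in both $X$ and $\overline{X}$.

(b) Let $\neg X$ denote $(\neg x_1, \ldots, \neg x_n)$. Then $F(X, Y) \Leftrightarrow \widehat{F}(X, \neg X, Y)$.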
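The construction of $\widehat{F}$ is a single traversal of the NNF. The sketch below uses a hypothetical nested-tuple NNF encoding, names each fresh variable $\overline{x_i}$ as the string `"bar_" + name`, and is written as a plain recursion, so, unlike an AIG-based implementation, it does not preserve sharing of sub-DAGs. The helper `agrees_with_hat` checks Proposition 2(b) by enumeration.

```python
from itertools import product

# Hypothetical NNF encoding used only for this sketch:
#   ("lit", var, polarity) | ("and", children) | ("or", children)


def evaluate(node, env):
    """Evaluate an NNF node under env, a dict mapping variable names to 0/1."""
    kind = node[0]
    if kind == "lit":
        value = env[node[1]]
        return value if node[2] else 1 - value
    values = [evaluate(child, env) for child in node[1]]
    return int(all(values)) if kind == "and" else int(any(values))


def hat(node, outputs):
    """Build F-hat: replace every negated output literal by a fresh positive variable "bar_x"."""
    kind = node[0]
    if kind == "lit":
        var, positive = node[1], node[2]
        if var in outputs and not positive:
            return ("lit", "bar_" + var, True)
        return node
    return (kind, tuple(hat(child, outputs) for child in node[1]))


def agrees_with_hat(F, outputs, variables):
    """Check Proposition 2(b) exhaustively: F(X, Y) equals F-hat(X, not X, Y) everywhere."""
    F_hat = hat(F, outputs)
    for bits in product((0, 1), repeat=len(variables)):
        env = dict(zip(variables, bits))
        env_hat = {**env, **{"bar_" + x: 1 - env[x] for x in outputs}}
        if evaluate(F, env) != evaluate(F_hat, env_hat):
            return False
    return True
```

Evaluating the result with $\overline{x_i}$ set to $\neg x_i$ reproduces $F$, as stated in part (b), and raising any $x_i$ or $\overline{x_i}$ from 0 to 1 never turns $\widehat{F}$ from true to false, as stated in part (a). Literals over inputs, such as $\neg y_1$, are left untouched.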
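For every $i \in \{1, \ldots, n\}$, we can split $X$ into two parts, $X_1^i$ and $X_{i+1}^n$, and represent $F(X, Y)$ as $F(X_1^i, X_{i+1}^n, Y)$. We use these representations of $F$ interchangeably, depending on the context. For $0 \le i \le n$, let $1^i$ (resp. $0^i$) denote a vector of $i$ $1$'s (resp. $0$'s). For notational convenience, we use $\widehat{F}(X_1^i, X_{i+1}^n, \overline{X}_1^i, \overline{X}_{i+1}^n, Y)$ to denote $\widehat{F}(X, \overline{X}, Y)$ in the subsequent discussion. The following is an easy consequence of Proposition 2.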
Proposition 3
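For every $i \in \{1, \ldots, n\}$, the following holds:
$$\exists X_1^i\, F(X, Y) \Rightarrow \widehat{F}(1^i, X_{i+1}^n, 1^i, \neg X_{i+1}^n, Y)$$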
Proposition 3 allows us to bound $\Delta_i$ and $\Gamma_i$ as follows.
Lemma 1
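For every $i \in \{1, \ldots, n\}$, we have:
$$\neg\widehat{F}(1^{i-1}, 0, X_{i+1}^n, 1^{i-1}, 1, \neg X_{i+1}^n, Y) \Rightarrow \Delta_i$$
$$\neg\widehat{F}(1^{i-1}, 1, X_{i+1}^n, 1^{i-1}, 0, \neg X_{i+1}^n, Y) \Rightarrow \Gamma_i$$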
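In the remainder of the paper, we only use under-approximations of $\Delta_i$ and $\Gamma_i$, and use $\delta_i$ and $\gamma_i$ respectively, to denote them. Recall from Section 2 that both $\Delta_i$ and $\neg\Gamma_i$ suffice as Skolem functions for $x_i$. Therefore, we propose to use either $\delta_i$ or $\neg\gamma_i$ (depending on which has a smaller AIG) obtained from Lemma 1 as our approximation of $\psi_i$. Specifically,
$$\delta_i = \neg\widehat{F}(1^{i-1}, 0, X_{i+1}^n, 1^{i-1}, 1, \neg X_{i+1}^n, Y), \quad \gamma_i = \neg\widehat{F}(1^{i-1}, 1, X_{i+1}^n, 1^{i-1}, 0, \neg X_{i+1}^n, Y), \quad \psi_i = \delta_i \text{ or } \neg\gamma_i \qquad (3)$$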
Example 1
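Consider the specification $\bigwedge_{i=1}^n (x_i \Leftrightarrow y_i)$, expressed in NNF as $F(X, Y) \equiv \bigwedge_{i=1}^n ((x_i \wedge y_i) \vee (\neg x_i \wedge \neg y_i))$. As noted in [33], this is a difficult example for CEGAR-based QBF solvers when $n$ is large.
From Eqn 3, $\delta_1 = \neg(\neg y_1 \wedge \bigwedge_{i=2}^n (x_i \Leftrightarrow y_i))$ and $\gamma_1 = \neg(y_1 \wedge \bigwedge_{i=2}^n (x_i \Leftrightarrow y_i))$. With $\delta_1$ as the choice of $\psi_1$, we obtain $\psi_1 = y_1 \vee \bigvee_{i=2}^n (x_i \oplus y_i)$. Clearly, $\psi_n = y_n$. On reverse-substituting, we get $\psi_{n-1} = y_{n-1} \vee (\psi_n \oplus y_n) = y_{n-1}$. Continuing in this way, we get $\psi_i = y_i$ for all $i \in \{1, \ldots, n\}$. The same result is obtained regardless of whether we choose $\delta_i$ or $\neg\gamma_i$ for each $i$. Thus, our approximation is good enough to solve this problem. In fact, it can be shown that $\delta_i = \Delta_i$ and $\gamma_i = \Gamma_i$ for all $i \in \{1, \ldots, n\}$ in this example. ∎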
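Example 1 can be replayed mechanically for a small $n$. In the Python sketch below, $\widehat{F}$ is hard-coded for this particular specification, $\delta_i$ and $\gamma_i$ are computed as in Eqn (3), reverse substitution is done pointwise for one input valuation at a time, and the exact $\Delta_i$ is computed by enumeration for comparison. The function names are hypothetical, and this is a sanity check rather than the paper's AIG-based implementation.

```python
from itertools import product

# Example 1 with n outputs x_1..x_n and n inputs y_1..y_n (0/1 tuples, 1-based indices).


def F(xs, ys):
    # Specification: AND over i of (x_i <=> y_i)
    return all(x == y for x, y in zip(xs, ys))


def F_hat(xs, xbars, ys):
    # NNF AND_i ((x_i and y_i) or (not x_i and not y_i)), with each "not x_i" replaced by xbar_i
    return all((x and y) or (xb and not y) for x, xb, y in zip(xs, xbars, ys))


def delta_hat(i, suffix, ys):
    # delta_i = not F-hat(1^{i-1}, 0, X_{i+1}^n, 1^{i-1}, 1, not X_{i+1}^n, Y)
    ones, neg = (1,) * (i - 1), tuple(1 - s for s in suffix)
    return not F_hat(ones + (0,) + suffix, ones + (1,) + neg, ys)


def gamma_hat(i, suffix, ys):
    # gamma_i = not F-hat(1^{i-1}, 1, X_{i+1}^n, 1^{i-1}, 0, not X_{i+1}^n, Y)
    ones, neg = (1,) * (i - 1), tuple(1 - s for s in suffix)
    return not F_hat(ones + (1,) + suffix, ones + (0,) + neg, ys)


def Delta(i, suffix, ys):
    # Exact Delta_i = not exists X_1^{i-1} . F(X_1^{i-1}, 0, X_{i+1}^n, Y), by enumeration
    return not any(F(pre + (0,) + suffix, ys) for pre in product((0, 1), repeat=i - 1))


def skolem_by_reverse_substitution(ys, n, use_delta=True):
    """Evaluate psi_n, psi_{n-1}, ..., psi_1 at one input valuation ys."""
    xs = [0] * n
    for i in range(n, 0, -1):
        suffix = tuple(xs[i:])  # already-computed values of x_{i+1}..x_n
        value = delta_hat(i, suffix, ys) if use_delta else not gamma_hat(i, suffix, ys)
        xs[i - 1] = int(value)
    return tuple(xs)
```

For $n = 3$, every input valuation yields $\psi_i = y_i$ whether $\delta_i$ or $\neg\gamma_i$ is chosen, and $\delta_i$ agrees with $\Delta_i$ on all points, matching the claims in the example.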
Note that the approximations of Skolem functions, as given in Eqn (3), are efficiently computable for all $i \in \{1, \ldots, n\}$, as they involve evaluating $\widehat{F}$ with a subset of inputs set to constants. This takes no more than $O(|F|)$ time and space. As illustrated by Example 1, these approximations also often suffice to solve $\mathsf{BFnS}$. The following theorem partially explains this.
Theorem 4.1

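(a) For $i \in \{1, \ldots, n\}$, suppose the following holds for every $j \in \{1, \ldots, i\}$:
$$\widehat{F}(1^j, X_{j+1}^n, 1^j, \neg X_{j+1}^n, Y) \Rightarrow \widehat{F}(1^{j-1}, 0, X_{j+1}^n, 1^{j-1}, 1, \neg X_{j+1}^n, Y) \vee \widehat{F}(1^{j-1}, 1, X_{j+1}^n, 1^{j-1}, 0, \neg X_{j+1}^n, Y)$$
Then $\exists X_1^i\, F(X, Y) \Leftrightarrow \widehat{F}(1^i, X_{i+1}^n, 1^i, \neg X_{i+1}^n, Y)$.

(b) If $F$ is in wDNNF, then $\delta_i = \Delta_i$ and $\gamma_i = \Gamma_i$ for every $i \in \{1, \ldots, n\}$.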
Proof
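To prove part (a), we use induction on $i$. The base case corresponds to $i = 1$. Recall that $\exists X_1^1\, F(X, Y) \Leftrightarrow F(0, X_2^n, Y) \vee F(1, X_2^n, Y)$ by definition. Proposition 3 already asserts that $\exists X_1^1\, F(X, Y) \Rightarrow \widehat{F}(1, X_2^n, 1, \neg X_2^n, Y)$. Therefore, if the condition in Theorem 4.1(a) holds for $i = 1$, we then have $\widehat{F}(1, X_2^n, 1, \neg X_2^n, Y) \Rightarrow \widehat{F}(0, X_2^n, 1, \neg X_2^n, Y) \vee \widehat{F}(1, X_2^n, 0, \neg X_2^n, Y)$, and by Proposition 2(b) the right-hand side is equivalent to $F(0, X_2^n, Y) \vee F(1, X_2^n, Y)$, i.e. to $\exists X_1^1\, F(X, Y)$. Combined with Proposition 3, this proves the base case.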
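Let us now assume (inductive hypothesis) that the statement of Theorem 4.1(a) holds for $i$. We prove below that the same statement holds for $i + 1$ as well. Clearly, $\exists X_1^{i+1}\, F(X, Y) \Leftrightarrow \exists x_{i+1}\, \exists X_1^i\, F(X, Y)$. By the inductive hypothesis, this is equivalent to $\exists x_{i+1}\, \widehat{F}(1^i, x_{i+1}, X_{i+2}^n, 1^i, \neg x_{i+1}, \neg X_{i+2}^n, Y)$. By definition of existential quantification, this is equivalent to $\widehat{F}(1^i, 0, X_{i+2}^n, 1^i, 1, \neg X_{i+2}^n, Y) \vee \widehat{F}(1^i, 1, X_{i+2}^n, 1^i, 0, \neg X_{i+2}^n, Y)$. From the condition in Theorem 4.1(a), we also have
$$\widehat{F}(1^{i+1}, X_{i+2}^n, 1^{i+1}, \neg X_{i+2}^n, Y) \Rightarrow \widehat{F}(1^i, 0, X_{i+2}^n, 1^i, 1, \neg X_{i+2}^n, Y) \vee \widehat{F}(1^i, 1, X_{i+2}^n, 1^i, 0, \neg X_{i+2}^n, Y)$$
The implication in the reverse direction follows from Proposition 2(a). Thus we have a bi-implication above, which we have already seen is equivalent to $\exists X_1^{i+1}\, F(X, Y)$. This proves the inductive case.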
To prove part (b), we first show that if $F$ is in wDNNF, then the condition in Theorem 4.1(a) must hold for all $i \in \{1, \ldots, n\}$. Theorem 4.1(b) then follows from the definitions of $\Delta_i$ and $\Gamma_i$ (see Section 2), from the statement of Theorem 4.1(a) and from the definitions of $\delta_i$ and $\gamma_i$ (see Eqn 3).
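For $i \in \{1, \ldots, n\}$, let $\zeta_i$ denote $\widehat{F}(1^i, X_{i+1}^n, 1^i, \neg X_{i+1}^n, Y) \wedge \neg\widehat{F}(1^{i-1}, 0, X_{i+1}^n, 1^{i-1}, 1, \neg X_{i+1}^n, Y) \wedge \neg\widehat{F}(1^{i-1}, 1, X_{i+1}^n, 1^{i-1}, 0, \neg X_{i+1}^n, Y)$. To prove by contradiction, suppose $F$ is in wDNNF but there exists $i \in \{1, \ldots, n\}$ such that $\zeta_i$ is satisfiable. Let $X_{i+1}^n = \sigma$ and $Y = \pi$ be a satisfying assignment of $\zeta_i$. We now consider the simplified circuit obtained by substituting $1$ for $X_1^{i-1}$ as well as for $\overline{X}_1^{i-1}$, $\sigma$ for $X_{i+1}^n$, $\neg\sigma$ for $\overline{X}_{i+1}^n$ and $\pi$ for $Y$ in the AIG for $\widehat{F}$. This simplification replaces the output of every internal node with a constant ($0$ or $1$), if the node evaluates to a constant under the above assignment. Note that the resulting circuit can have only $x_i$ and $\overline{x_i}$ as its inputs. Furthermore, since the assignment satisfies $\zeta_i$, it follows that the simplified circuit evaluates to $1$ if both $x_i$ and $\overline{x_i}$ are set to $1$, and it evaluates to $0$ if any one of $x_i$ or $\overline{x_i}$ is set to $0$. This can only happen if there is a node labeled $\wedge$ in the AIG representing $\widehat{F}$ with a path leading from the leaf labeled $x_i$, and another path leading from the leaf labeled $\overline{x_i}$. This is a contradiction, since $F$ is in wDNNF. Therefore, there is no $i \in \{1, \ldots, n\}$ such that the condition of Theorem 4.1(a) is violated.∎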
In general, the candidate Skolem functions generated from the approximations discussed above may not always be correct. Indeed, the conditions discussed above are only sufficient, but not necessary, for the approximations to be exact. Hence, we need a separate check to see if our candidate Skolem functions are correct. To do this, we use an error formula $\varepsilon_\Psi(X', X, Y)$, as described in [22], and check its satisfiability. The correctness of this check depends on the following result from [22].
Theorem 4.2 ([22])
$\varepsilon_\Psi$ is unsatisfiable iff $\Psi$ is a correct Skolem function vector.
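Theorem 4.2 suggests a simple exhaustive check for small instances. The sketch below assumes that the error formula has the form $F(X', Y) \wedge \bigwedge_{i=1}^n (x_i \Leftrightarrow \psi_i) \wedge \neg F(X, Y)$, with each $\psi_i$ given as a Python function of $X_{i+1}^n$ and $Y$. Because the constraints $x_i \Leftrightarrow \psi_i$ determine $X$ once $Y$ is fixed, it suffices to enumerate $Y$. The name `error_formula_sat` and the encoding are hypothetical; a practical implementation would hand $\varepsilon_\Psi$ to a SAT solver instead.

```python
from itertools import product


def error_formula_sat(F, psis, n, m):
    """Exhaustively look for a model of the (assumed) error formula
        F(X', Y) and AND_i (x_i <=> psi_i(X_{i+1}^n, Y)) and not F(X, Y).

    psis[i - 1](suffix, ys) evaluates psi_i, where suffix = (x_{i+1}, ..., x_n).
    Returns a counterexample (Y, X), or None if the formula is unsatisfiable.
    """
    for ys in product((0, 1), repeat=m):
        # Conjunct F(X', Y): skip inputs for which no output valuation works.
        if not any(F(xp, ys) for xp in product((0, 1), repeat=n)):
            continue
        # Conjuncts x_i <=> psi_i fix X given Y; evaluate from x_n down to x_1.
        xs = [0] * n
        for i in range(n, 0, -1):
            xs[i - 1] = int(psis[i - 1](tuple(xs[i:]), ys))
        # Conjunct not F(X, Y): the candidate vector fails on this input.
        if not F(tuple(xs), ys):
            return ys, tuple(xs)
    return None
```

With $F = \bigwedge_i (x_i \Leftrightarrow y_i)$ and $n = m = 2$, the vector $(y_1, y_2)$ yields `None`, while replacing $\psi_1$ by the constant $0$ yields a counterexample with $y_1 = 1$.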
We now combine all the above ingredients to come up with algorithm bfss (for Blazingly Fast Skolem Synthesis), as shown in Algorithm 1. The algorithm can be divided into three parts. In the first part (lines 2-11), unateness is checked. This is done in two ways: (i) we identify pure literals in $F$ by simply examining the labels of leaves in the DAG representation of $F$ in NNF, and (ii) we check the satisfiability of the formulas $\eta_i^+$ and $\eta_i^-$, as defined in Eqn 1 and Eqn 2. This requires invoking a SAT solver in the worst case, and is repeated at most $|X|$ times until there are no more unate variables. Hence this requires $O(|X|^2)$ calls to a SAT solver. Once we have done this, by Proposition 1, the constants $1$ or $0$ (for positive or negative unate variables respectively) are correct Skolem functions for these variables.
In the second part, we fix an ordering of the remaining output variables according to an experimentally sound heuristic, as described in Section 6, and compute candidate Skolem functions for these variables according to Equation 3. We then check the satisfiability of the error formula $\varepsilon_\Psi$ to determine if the candidate Skolem functions are indeed correct. If the error formula is found to be unsatisfiable, we know from Theorem 4.2 that we have the correct Skolem functions, which can therefore be output. This concludes phase 1 of algorithm bfss. If the error formula is found to be satisfiable, we move to phase 2 of algorithm bfss, an adaptation of the CEGAR-based technique described in [22], and discussed briefly in Section 5. It is not difficult to see that the running time of phase 1 is polynomial in the size of the input, relative to an $\mathsf{NP}$-oracle (a SAT solver in practice). This also implies that the Skolem functions generated can be of at most polynomial size. Finally, from Theorem 4.1(b) we also obtain that if $F$ is in wDNNF, the Skolem functions generated in phase 1 are correct. From the above reasoning, we obtain the following properties of phase 1 of bfss:
Theorem 4.3

For all unate variables, phase 1 of bfss computes correct Skolem functions.

If $F$ is in wDNNF, phase 1 of bfss computes all Skolem functions correctly.

The running time of phase 1 of bfss is polynomial in input size, relative to an $\mathsf{NP}$-oracle. Specifically, the algorithm makes $O(|X|^2)$ calls to an $\mathsf{NP}$-oracle.

The candidate Skolem functions output by phase 1 of bfss have size at most polynomial in the size of the input.
Discussion:
We make two crucial and related observations. First, by our hardness results in Section 3, we know that the above algorithm cannot solve $\mathsf{BFnS}$ for all inputs, unless some well-regarded complexity-theoretic conjectures fail. As a result, we must go to phase 2 on at least some inputs. Surprisingly, our experiments show that this is not necessary in the majority of benchmarks.
The second observation tries to understand why phase 1 works in most cases in practice. While a conclusive explanation isn't easy, we believe Theorem 4.1 explains the success of phase 1 in several cases. By [14], we know that all Boolean functions have a DNNF (and hence wDNNF) representation, although it may take exponential time to compute this representation. This allows us to define two pre-processing procedures. In the first, we identify cases where we can directly convert $F$ to wDNNF and use the Phase 1 algorithm above. In the second, we use several optimization scripts available in the ABC [26] library to optimize the AIG representation of $F$. For a majority of benchmarks, this appears to yield a representation of $F$ that allows the proof of Theorem 4.1(a) to go through. For the rest, we apply the Phase 2 algorithm as described below.
Quantitative guarantees of “goodness”
Given our theoretical and practical insights into the applicability of phase 1 of bfss, it would be interesting to measure how much progress we have made in phase 1, even if it does not give the correct Skolem functions. One way to measure this “goodness” is to estimate the number of counterexamples as a fraction of the size of the input space. Specifically, given the error formula, we get an approximate count of the number of models for this formula projected on the inputs $Y$. This can be obtained efficiently in practice with high confidence using state-of-the-art approximate model counters, viz. [12], with complexity in $\mathsf{BPP}^{\mathsf{NP}}$. The approximate count thus obtained, when divided by $2^{|Y|}$, gives the fraction of input combinations for which the candidate Skolem functions output by phase 1 do not work correctly. We call this the goodness ratio of our approximation.
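For very small specifications, the goodness ratio can be computed exactly instead of approximately. The sketch below enumerates all $2^{|Y|}$ input valuations and counts those that are realizable but mishandled by the candidate vector $\Psi$. This exact count replaces the approximate model counter used in the paper and is feasible only for a handful of inputs; the encoding of each $\psi_i$ as a Python function of $(X_{i+1}^n, Y)$ and the name `goodness_ratio` are assumptions of the sketch.

```python
from itertools import product


def goodness_ratio(F, psis, n, m):
    """Exact fraction of input valuations Y on which the candidate vector fails.

    An input counts as a failure if some X satisfies F(X, Y) but the X obtained
    from psis (evaluated from x_n down to x_1) does not. psis[i - 1](suffix, ys)
    evaluates psi_i with suffix = (x_{i+1}, ..., x_n).
    """
    failures = 0
    for ys in product((0, 1), repeat=m):
        xs = [0] * n
        for i in range(n, 0, -1):
            xs[i - 1] = int(psis[i - 1](tuple(xs[i:]), ys))
        realizable = any(F(xp, ys) for xp in product((0, 1), repeat=n))
        if realizable and not F(tuple(xs), ys):
            failures += 1
    return failures / 2 ** m
```

For $F = \bigwedge_i (x_i \Leftrightarrow y_i)$ with $n = m = 2$, the candidate $(0, y_2)$ fails exactly on the two inputs with $y_1 = 1$, giving a ratio of $0.5$, while $(y_1, y_2)$ gives $0$.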
5 Phase 2: Counterexample-guided refinement
For phase 2, we can use any off-the-shelf worst-case exponential-time Skolem function generator. However, given that we already have candidate Skolem functions with guarantees on their “goodness”, it is natural to use them as starting points for phase 2. Hence, we start off with candidate Skolem functions for all $x_i$ as computed in phase 1, and then update (or refine) them in a counterexample-driven manner. Intuitively, a counterexample is a value of the inputs $Y$ for which there exists a value of $X$ that renders $F(X, Y)$ true, but for which $F(\Psi, Y)$ evaluates to false. As shown in [22], given a candidate Skolem function vector, every satisfying assignment of the error formula $\varepsilon_\Psi$ gives a counterexample. The refinement step uses this satisfying assignment to update an appropriate subset of the approximate $\delta_i$ and $\gamma_i$ functions computed in phase 1. The entire process is then repeated until no counterexamples can be found. The final updated vector of Skolem functions then gives a solution of the $\mathsf{BFnS}$ problem. Note that this idea is not new [22, 2]. The only significant enhancement we make over the algorithm in [22] is to use an almost-uniform sampler [11] to efficiently sample the space of counterexamples almost uniformly. This allows us to do refinement with a diverse set of counterexamples, instead of using counterexamples in a corner of the solution space of $\varepsilon_\Psi$ that the SAT solver heuristics zoom down on.
6 Experimental results
Experimental methodology. Our implementation consists of two parallel pipelines that accept the same input specification but represent it in two different ways. The first pipeline takes the input formula as an AIG and builds an NNF (not necessarily wDNNF) DAG, while the second pipeline builds an ROBDD from the input AIG using dynamic variable reordering (no restrictions on variable order), and then obtains a wDNNF representation from it using the linear-time algorithm described in [14]. Once the NNF/wDNNF representation is built, we use Algorithm 1 in Phase 1 and CEGAR-based synthesis using UniGen [11] to sample counterexamples in Phase 2. We call this ensemble of two pipelines bfss. We compare bfss with the following algorithms/tools: parSyn [2], Cadet [35], RSynth [39], and AbsSynthe-Skolem (based on the $\mathsf{BFnS}$ step of AbsSynthe [9]).
Our implementation of bfss uses the ABC [26] library to represent and manipulate Boolean functions. Two different SAT solvers can be used with bfss: ABC's default SAT solver, or UniGen [11] (to give almost-uniformly distributed counterexamples). All our experiments use UniGen.
We consider a total of 504 benchmarks, taken from four different domains:

forty-eight Arithmetic benchmarks from [16], with varying bit-widths of arithmetic operators,

sixty-eight Disjunctive Decomposition benchmarks from [2], generated by considering some of the larger sequential circuits in the HWMCC10 benchmark suite,

five Factorization benchmarks, also from [2], representing factorization of numbers of different bit-widths, and

three hundred and eighty-three QBFEval benchmarks, taken from the Prenex 2QBF track of QBFEval 2017 [32] (we were unable to convert one benchmark of the track to some of the formats required by the various tools).
Since different tools accept benchmarks in different formats, each benchmark was converted to both qdimacs and verilog/aiger formats. All benchmarks and the procedure by which we generated (and converted) them are detailed in [1]. Recall that we use two pipelines for bfss. We use “balance; rewrite -l; refactor -l; balance; rewrite -l; rewrite -lz; balance; refactor -lz; rewrite -lz; balance” as the ABC script for optimizing the AIG representation of the input specification. We observed that while this results in only benchmarks being in wDNNF in the first pipeline, benchmarks were solved in Phase 1 using this pipeline. This is attributable to specifications being unate in several output variables, and also satisfying the condition of Theorem 4.1(a) (while not being in wDNNF). In the second pipeline, however, we could represent benchmarks in wDNNF, and all of these were solved in Phase 1.
For each benchmark, the order (ref. step 12 of Algorithm 1) in which Skolem functions are generated is such that the variable which occurs in the transitive fan-in of the least number of nodes in the AIG representation of the specification is ordered before other variables. This order is used for both bfss and parSyn. Note that this order is completely independent of the dynamic variable order used to construct an ROBDD of the input specification in the second pipeline, prior to obtaining the wDNNF representation.
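One natural reading of this ordering heuristic is: for each output variable, count the AIG nodes whose transitive fan-in contains it, and generate Skolem functions in increasing order of that count. A minimal sketch, assuming the AIG is given as a dictionary from each AND node to its fan-in names (edge complementation is ignored, since it does not change the fan-in structure); the encoding, the function name `order_outputs` and the stable tie-breaking are our assumptions.

```python
from typing import Dict, List, Set


def order_outputs(fanins: Dict[str, List[str]], outputs: List[str]) -> List[str]:
    """Order outputs so that the variable occurring in the transitive fan-in
    of the fewest AIG nodes comes first."""
    support_cache: Dict[str, Set[str]] = {}

    def support(node: str) -> Set[str]:
        """Leaf variables in the transitive fan-in of `node`."""
        if node not in support_cache:
            kids = fanins.get(node, [])
            if not kids:  # a primary input or output variable
                support_cache[node] = {node}
            else:
                support_cache[node] = set().union(*(support(k) for k in kids))
        return support_cache[node]

    output_set = set(outputs)
    counts = {y: 0 for y in outputs}
    for node, kids in fanins.items():
        if kids:  # internal (AND) nodes only
            for y in support(node) & output_set:
                counts[y] += 1
    # sorted() is stable, so ties keep the original output order (an assumption).
    return sorted(outputs, key=lambda y: counts[y])


# Usage: n1 = AND(x, y1), n2 = AND(n1, y2), n3 = AND(n2, y3)
example_aig = {"n1": ["x", "y1"], "n2": ["n1", "y2"], "n3": ["n2", "y3"]}
```

In the example, `y1` lies in the fan-in of three nodes, `y2` of two and `y3` of one, so the order is `y3`, `y2`, `y1`.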
All experiments were performed on a message-passing cluster, with 20 cores and GB memory per node, each core being a GHz Intel Xeon processor. The operating system was CentOS 6.5. Twenty cores were assigned to each run of parSyn. For RSynth and Cadet, a single core on the cluster was used, since these tools do not exploit parallel processing. Each pipeline of bfss was executed on a single node; the computation of candidate functions, building of the error formula and refinement of the counterexamples were performed sequentially on a single thread, and UniGen had threads at its disposal (idle during Phase 1).
The maximum time given for execution of any run was seconds. The total amount of main memory for any run was restricted to GB. The metric used to compare the algorithms was time taken to synthesize Boolean functions. The time reported for bfss is the better of the two times obtained from the alternative pipelines described above. Detailed results from the individual pipelines are available in Appendix A.
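Since the two pipelines run independently, the reported time is simply the minimum over the pipelines that finished, and a benchmark counts as solved by bfss if either pipeline solves it. A minimal sketch; the function name and the use of `None` for a timeout or failure are assumptions made for illustration.

```python
from typing import Optional, Sequence


def reported_time(pipeline_times: Sequence[Optional[float]]) -> Optional[float]:
    """Time reported for the ensemble: the best (smallest) time among the
    pipelines that finished; None stands for a timeout or failure."""
    finished = [t for t in pipeline_times if t is not None]
    return min(finished) if finished else None
```

For example, if the first pipeline finishes in 12.5 seconds and the second times out, the reported time is 12.5 seconds; only when both pipelines fail is the benchmark unsolved.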
Results. Of the 504 benchmarks, benchmarks were not solved by any tool – of these being from arithmetic benchmarks and from QBFEval.
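Table 1: Performance summary of bfss (combined pipelines)
Benchmark Domain | Total Benchmarks | # Benchmarks Solved | Phase 1 Solved | Phase 2 Started | Solved By Phase 2
QBFEval | 383 | 170 | 159 | 73 | 11
Arithmetic | 48 | 35 | 35 | 8 | 0
Disjunctive Decomposition | 68 | 68 | 66 | 2 | 2
Factorization | 5 | 5 | 5 | 0 | 0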
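The columns of Table 1 are related by simple arithmetic: every solved benchmark is solved either in Phase 1 or by Phase 2, so # Benchmarks Solved equals Phase 1 Solved plus Solved By Phase 2, and the per-domain Phase 1 percentages follow from the same rows. The snippet below transcribes the table and checks this. Note that the 62% quoted for QBFEval is relative to the QBFEval benchmarks for which AIGs could be built, not to all 383, so it is not recomputed here.

```python
# Rows of Table 1: (total, solved, phase1_solved, phase2_started, solved_by_phase2)
TABLE1 = {
    "QBFEval": (383, 170, 159, 73, 11),
    "Arithmetic": (48, 35, 35, 8, 0),
    "Disjunctive Decomposition": (68, 68, 66, 2, 2),
    "Factorization": (5, 5, 5, 0, 0),
}

for domain, (total, solved, p1, started, p2) in TABLE1.items():
    # Every solved benchmark is solved either by Phase 1 or by Phase 2.
    assert solved == p1 + p2, domain

overall_total = sum(row[0] for row in TABLE1.values())
overall_solved = sum(row[1] for row in TABLE1.values())
phase2_started = sum(row[3] for row in TABLE1.values())
# Share of each domain solved in Phase 1, relative to all benchmarks in the domain.
share_phase1 = {d: round(100 * r[2] / r[0]) for d, r in TABLE1.items()}
```

The sums give 504 benchmarks in total, 278 solved by the combined pipelines and 83 on which Phase 2 was started; the rounded Phase 1 shares are 73% for Arithmetic and 97% for Disjunctive Decomposition.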
Table 1 gives a summary of the performance of bfss (considering the combined pipelines) over the different benchmark suites. Of the 504 benchmarks, bfss was successful on 278 benchmarks; of these, 170 are from QBFEval, 68 from Disjunctive Decomposition, 35 from Arithmetic and 5 from Factorization.
Of the 383 benchmarks in the QBFEval suite, we ran bfss only on a subset, since we could not build succinct AIGs for the remaining benchmarks. Of these, 159 benchmarks were solved by Phase 1 (i.e., 62% of built QBFEval benchmarks) and 73 proceeded to Phase 2, of which 11 reached completion. On some other QBFEval benchmarks, Phase 1 timed out. Of the 48 Arithmetic benchmarks, Phase 1 successfully solved 35 (i.e., 73%) and Phase 2 was started for 8 benchmarks; Phase 1 timed out on the remaining Arithmetic benchmarks. Of the 68 Disjunctive Decomposition benchmarks, Phase 1 successfully solved 66 (i.e., 97%), and Phase 2 was started and reached completion for the remaining 2. For the 5 Factorization benchmarks, Phase 1 was successful on all of them.
Recall that the goodness ratio is the ratio of the number of counterexamples remaining to the total size of the input space after Phase 1. For all benchmarks solved by Phase 1, the goodness ratio is 0. We analyzed the goodness ratio at the beginning of Phase 2 for the 83 benchmarks for which Phase 2 started. For benchmarks this ratio was small, and Phase 2 reached completion for these. Of the remaining benchmarks, also had a small goodness ratio, indicating that we were close to the solution at the time of timeout. However, benchmarks in QBFEval had goodness ratio greater than , indicating that most of the counterexamples were not eliminated by the timeout.
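To make the definition concrete, the goodness ratio counts the inputs X for which some output assignment satisfies the specification F but the candidate functions do not, divided by the 2^|X| possible inputs; a ratio of 0 means the candidates are already correct. The brute-force sketch below assumes each candidate depends on the inputs only and enumerates every assignment, so it works only for toy specifications; the helper names are hypothetical and this is not how bfss obtains the ratio.

```python
from fractions import Fraction
from itertools import product
from typing import Callable, Dict, List

Assignment = Dict[str, bool]


def goodness_ratio(spec: Callable[[Assignment], bool], inputs: List[str],
                   outputs: List[str],
                   candidates: Dict[str, Callable[[Assignment], bool]]) -> Fraction:
    """Fraction of input assignments that are counterexamples for the candidates."""
    counterexamples = 0
    total = 0
    for xbits in product([False, True], repeat=len(inputs)):
        x = dict(zip(inputs, xbits))
        total += 1
        # Is there some output assignment satisfying the specification for x?
        realizable = any(
            spec({**x, **dict(zip(outputs, ybits))})
            for ybits in product([False, True], repeat=len(outputs))
        )
        y = {name: f(x) for name, f in candidates.items()}
        if realizable and not spec({**x, **y}):
            counterexamples += 1
    return Fraction(counterexamples, total)


# Usage: F = (y <-> x1 AND x2)
def spec_and(v: Assignment) -> bool:
    return v["y"] == (v["x1"] and v["x2"])
```

For `spec_and` and the candidate `y = x1`, the only counterexample is x1 = 1, x2 = 0, so the ratio is 1/4; the exact candidate `y = x1 AND x2` gives 0.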
We next compare the performance of bfss with other state-of-the-art tools. For clarity, since the number of benchmarks in the QBFEval suite is considerably greater, we plot the QBFEval benchmarks separately.
bfss vs Cadet: Of the 504 benchmarks, Cadet was successful on benchmarks, of which belonged to Disjunctive Decomposition, to Arithmetic, to Factorization and to QBFEval. Figure 1(a) gives the performance of the two algorithms with respect to time on the QBFEval suite. Here, Cadet solved benchmarks that bfss could not solve, whereas bfss solved benchmarks that could not be solved by Cadet. Figure 1(b) gives the performance of the two algorithms with respect to time on the Arithmetic, Factorization and Disjunctive Decomposition benchmarks. In these categories, there were a total of benchmarks that bfss solved that Cadet could not solve, and there was benchmark that Cadet solved but bfss did not solve. While Cadet takes less time on Arithmetic benchmarks and many QBFEval benchmarks, on Disjunctive Decomposition and Factorization, bfss takes less time.
bfss vs parSyn: Figure 2 shows the comparison of time taken by bfss and parSyn. parSyn was successful on a total of benchmarks, and could solve benchmark which bfss could not solve. On the other hand, bfss solved benchmarks that parSyn could not solve. From Figure 2, we can see that on most of the Arithmetic, Disjunctive Decomposition and Factorization benchmarks, bfss takes less time than parSyn.
bfss vs RSynth: We next compare the performance of bfss with RSynth. As shown in Figure 3, RSynth was successful on benchmarks, with benchmarks that could be solved by RSynth but not by bfss. In contrast, bfss could solve benchmarks that RSynth could not solve! Of the benchmarks that were solved by both solvers, we can see that bfss took less time on most of them.
bfss vs AbsSynthe-Skolem: AbsSynthe-Skolem was successful on benchmarks, and could solve benchmarks that bfss could not solve. In contrast, bfss solved a total of benchmarks that AbsSynthe-Skolem could not. Figure 4 shows a comparison of running times of bfss and AbsSynthe-Skolem.
7 Conclusion
In this paper, we showed some complexity-theoretic hardness results for the Boolean functional synthesis problem. We then developed a two-phase approach to solve this problem, where the first phase, an efficient algorithm generating poly-sized functions, surprisingly succeeds in solving a large number of benchmarks. To explain this, we identified sufficient conditions under which Phase 1 gives the correct answer. For the remaining benchmarks, we employed the second phase of the algorithm, which uses a CEGAR-based approach and builds Skolem functions by exploiting recent advances in SAT solvers/approximate counters. As future work, we wish to explore further improvements in Phase 2, and other structural restrictions on the input that ensure completeness of Phase 1.
Acknowledgements:
We are thankful to Ajith John, Kuldeep Meel, Mate Soos, Ocan Sankur, Lucas Martinelli Tabajara and Markus Rabe for useful discussions and for providing us with various software tools used in the experimental comparisons. We also thank the anonymous reviewers for insightful comments.
References
 [1] Website for CAV 2018 Experiments. https://drive.google.com/drive/folders/0B74xgF9hCly5QXctNFpYR0VnQUU?usp=sharing (2018)
 [2] Akshay, S., Chakraborty, S., John, A.K., Shah, S.: Towards parallel boolean functional synthesis. In: TACAS 2017 Proceedings, Part I. pp. 337–353 (2017), https://doi.org/10.1007/978-3-662-54577-5_19
 [3] Alur, R., Madhusudan, P., Nam, W.: Symbolic computational techniques for solving games. STTT 7(2), 118–128 (2005)
 [4] Andersson, G., Bjesse, P., Cook, B., Hanna, Z.: A proof engine approach to solving combinational design automation problems. In: Proceedings of the 39th Annual Design Automation Conference. pp. 725–730. DAC ’02, ACM, New York, NY, USA (2002), http://doi.acm.org/10.1145/513918.514101
 [5] Baader, F.: On the complexity of boolean unification. Tech. rep. (1999)
 [6] Balabanov, V., Jiang, J.-H.R.: Unified QBF certification and its applications. Form. Methods Syst. Des. 41(1), 45–65 (Aug 2012), http://dx.doi.org/10.1007/s10703-012-0152-6
 [7] Boole, G.: The Mathematical Analysis of Logic. Philosophical Library (1847), https://books.google.co.in/books?id=zv4YAQAAIAAJ
 [8] Boudet, A., Jouannaud, J.-P., Schmidt-Schauss, M.: Unification in boolean rings and abelian groups. J. Symb. Comput. 8(5), 449–477 (Nov 1989), http://dx.doi.org/10.1016/S0747-7171(89)80054-9
 [9] Brenguier, R., Pérez, G.A., Raskin, J.-F., Sankur, O.: AbsSynthe: abstract synthesis from succinct safety specifications. In: Proceedings 3rd Workshop on Synthesis (SYNT’14). Electronic Proceedings in Theoretical Computer Science, vol. 157, pp. 100–116. Open Publishing Association (2014), http://arxiv.org/abs/1407.5961v1
 [10] Bryant, R.E.: Graph-based algorithms for boolean function manipulation. IEEE Trans. Comput. 35(8), 677–691 (Aug 1986), http://dx.doi.org/10.1109/TC.1986.1676819
 [11] Chakraborty, S., Fremont, D.J., Meel, K.S., Seshia, S.A., Vardi, M.Y.: On parallel scalable uniform SAT witness generation. In: Tools and Algorithms for the Construction and Analysis of Systems - 21st International Conference, TACAS 2015, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2015, London, UK, April 11–18, 2015. Proceedings. pp. 304–319 (2015)
 [12] Chakraborty, S., Meel, K.S., Vardi, M.Y.: Algorithmic improvements in approximate counting for probabilistic inference: From linear to logarithmic SAT calls. In: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI 2016, New York, NY, USA, 9–15 July 2016. pp. 3569–3576 (2016)
 [13] Chen, Y., Eickmeyer, K., Flum, J.: The exponential time hypothesis and the parameterized clique problem. In: Proceedings of the 7th International Conference on Parameterized and Exact Computation. pp. 13–24. IPEC’12, Springer-Verlag, Berlin, Heidelberg (2012)
 [14] Darwiche, A.: Decomposable negation normal form. J. ACM 48(4), 608–647 (2001)
 [15] Deschamps, J.-P.: Parametric solutions of boolean equations. Discrete Math. 3(4), 333–342 (Jan 1972), http://dx.doi.org/10.1016/0012-365X(72)90090-8
 [16] Fried, D., Tabajara, L.M., Vardi, M.Y.: BDD-based boolean functional synthesis. In: Computer Aided Verification - 28th International Conference, CAV 2016, Toronto, ON, Canada, July 17–23, 2016, Proceedings, Part II. pp. 402–421 (2016)
 [17] Impagliazzo, R., Paturi, R.: On the complexity of k-SAT. J. Comput. Syst. Sci. 62(2), 367–375 (2001)
 [18] Jacobs, S., Bloem, R., Brenguier, R., Könighofer, R., Pérez, G.A., Raskin, J., Ryzhyk, L., Sankur, O., Seidl, M., Tentrup, L., Walker, A.: The second reactive synthesis competition (SYNTCOMP 2015). In: Proceedings Fourth Workshop on Synthesis, SYNT 2015, San Francisco, CA, USA, 18th July 2015. pp. 27–57 (2015)
 [19] Jiang, J.-H.R.: Quantifier elimination via functional composition. In: Proc. of CAV. pp. 383–397. Springer (2009)
 [20] Jiang, J.-H.R., Balabanov, V.: Resolution proofs and Skolem functions in QBF evaluation and applications. In: Proc. of CAV. pp. 149–164. Springer (2011)
 [21] Jo, S., Matsumoto, T., Fujita, M.: SAT-based automatic rectification and debugging of combinational circuits with LUT insertions. In: Proceedings of the 2012 IEEE 21st Asian Test Symposium. pp. 19–24. ATS ’12, IEEE Computer Society (2012)
 [22] John, A., Shah, S., Chakraborty, S., Trivedi, A., Akshay, S.: Skolem functions for factored formulas. In: FMCAD. pp. 73–80 (2015)
 [23] Karp, R., Lipton, R.: Turing machines that take advice. L’Enseignement Mathématique 28(2), 191–209 (1982)
 [24] Kuehlmann, A., Paruthi, V., Krohm, F., Ganai, M.K.: Robust boolean reasoning for equivalence checking and functional property verification. IEEE Trans. on CAD of Integrated Circuits and Systems 21(12), 1377–1394 (2002), http://dblp.uni-trier.de/db/journals/tcad/tcad21.html#KuehlmannPKG02
 [25] Kuncak, V., Mayer, M., Piskac, R., Suter, P.: Complete functional synthesis. SIGPLAN Not. 45(6), 316–329 (Jun 2010)
 [26] Berkeley Logic Synthesis and Verification Group: ABC: A System for Sequential Synthesis and Verification. http://www.eecs.berkeley.edu/~alanmi/abc/
 [27] Lowenheim, L.: Über die Auflösung von Gleichungen in Logischen Gebietkalkul. Math. Ann. 68, 169–207 (1910)
 [28] Macii, E., Odasso, G., Poncino, M.: Comparing different boolean unification algorithms. In: Proc. of 32nd Asilomar Conference on Signals, Systems and Computers. pp. 17–29 (2006)
 [29] Heule, M.J.H., Seidl, M., Biere, A.: Efficient Extraction of Skolem Functions from QRAT Proofs. In: Formal Methods in Computer-Aided Design, FMCAD 2014, Lausanne, Switzerland, October 21–24, 2014. pp. 107–114 (2014)
 [30] Martin, U., Nipkow, T.: Boolean unification - the story so far. J. Symb. Comput. 7(3–4), 275–293 (Mar 1989), http://dx.doi.org/10.1016/S0747-7171(89)80013-6
 [31] Niemetz, A., Preiner, M., Lonsing, F., Seidl, M., Biere, A.: Resolution-based certificate extraction for QBF (tool presentation). In: Theory and Applications of Satisfiability Testing - SAT 2012 - 15th International Conference, Trento, Italy, June 17–20, 2012. Proceedings. pp. 430–435 (2012)
 [32] QBFLib: QBFEval 2017. http://www.qbflib.org/event_page.php?year=2017
 [33] Rabe, M.N., Seshia, S.A.: Incremental determinization. In: Theory and Applications of Satisfiability Testing - SAT 2016 - 19th International Conference, Bordeaux, France, July 5–8, 2016, Proceedings. pp. 375–392 (2016), https://doi.org/10.1007/978-3-319-40970-2_23
 [34] Rabe, M.N., Tentrup, L.: CAQE: A certifying QBF solver. In: Formal Methods in Computer-Aided Design, FMCAD 2015, Austin, Texas, USA, September 27–30, 2015. pp. 136–143 (2015)
 [35] Rabe, M.N., Seshia, S.A.: Incremental determinization. In: Theory and Applications of Satisfiability Testing - SAT 2016 - 19th International Conference, Bordeaux, France, July 5–8, 2016, Proceedings. pp. 375–392 (2016), https://doi.org/10.1007/978-3-319-40970-2_23
 [36] Solar-Lezama, A.: Program sketching. STTT 15(5–6), 475–495 (2013)
 [37] Solar-Lezama, A., Rabbah, R.M., Bodík, R., Ebcioglu, K.: Programming by sketching for bit-streaming programs. In: Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation, Chicago, IL, USA, June 12–15, 2005. pp. 281–294 (2005)
 [38] Srivastava, S., Gulwani, S., Foster, J.S.: Template-based program verification and program synthesis. STTT 15(5–6), 497–518 (2013)
 [39] Tabajara, L.M., Vardi, M.Y.: Factored boolean functional synthesis. In: 2017 Formal Methods in Computer Aided Design, FMCAD 2017, Vienna, Austria, October 2–6, 2017. pp. 124–131 (2017)
 [40] Trivedi, A.: Techniques in Symbolic Model Checking. Master’s thesis, Indian Institute of Technology Bombay, Mumbai, India (2003)
 [41] Zhu, S., Tabajara, L.M., Li, J., Pu, G., Vardi, M.Y.: Symbolic LTLf synthesis. In: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19–25, 2017. pp. 1362–1369 (2017)
Appendix A Detailed Results for individual pipelines of BFSS
As mentioned in Section 6, bfss is an ensemble of two pipelines, an AIG-NNF pipeline and a BDD-wDNNF pipeline. These two pipelines accept the same input specification but represent it in two different ways. The first pipeline takes the input formula as an AIG and builds an NNF (not necessarily a wDNNF) DAG, while the second pipeline first builds an ROBDD from the input AIG using dynamic variable reordering, and then obtains a wDNNF representation from the ROBDD using the linear-time algorithm described in [14]. Once the NNF/wDNNF representation is built, the same algorithm is used to generate Skolem functions: Algorithm 1 in Phase 1, and CEGAR-based synthesis, with UniGen [11] sampling counterexamples, in Phase 2. In this section, we give the individual results of the two pipelines.
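The Phase 2 loop (compute candidate functions, build an error formula, obtain a counterexample, refine) can be sketched on toy specifications as follows. This is a deliberately simplified, hypothetical CEGAR loop: exhaustive enumeration stands in for the SAT-based error-formula check, a uniform random choice stands in for UniGen's almost-uniform sampling, and the refinement merely patches the output at the sampled counterexample input (a lookup-table repair), which is not the refinement bfss performs. All names are ours.

```python
import random
from itertools import product
from typing import Callable, Dict, List, Tuple

Assignment = Dict[str, bool]
Spec = Callable[[Assignment], bool]


def cegar_repair(spec: Spec, inputs: List[str], outputs: List[str],
                 candidates: Dict[str, Callable[[Assignment], bool]],
                 rng: random.Random,
                 max_iters: int = 1000) -> Callable[[Assignment], Assignment]:
    """Toy CEGAR loop that repairs candidate Skolem functions by brute force."""
    # Hypothetical, deliberately naive refinement: remember a correct output
    # vector for each counterexample input (a lookup-table patch).
    patches: Dict[Tuple[bool, ...], Assignment] = {}

    def current(x: Assignment) -> Assignment:
        key = tuple(x[i] for i in inputs)
        if key in patches:
            return patches[key]
        return {name: f(x) for name, f in candidates.items()}

    for _ in range(max_iters):
        # Enumerate models of the "error formula": inputs x for which some
        # output vector satisfies spec but the current functions do not.
        cexs = []
        for xbits in product([False, True], repeat=len(inputs)):
            x = dict(zip(inputs, xbits))
            if spec({**x, **current(x)}):
                continue
            for ybits in product([False, True], repeat=len(outputs)):
                y = dict(zip(outputs, ybits))
                if spec({**x, **y}):
                    cexs.append((x, y))
                    break
        if not cexs:
            return current  # no counterexample left: the functions are correct
        # Uniform choice among counterexamples, standing in for UniGen's
        # almost-uniform sampling.
        x, y = rng.choice(cexs)
        patches[tuple(x[i] for i in inputs)] = y
    raise RuntimeError("CEGAR loop did not converge within max_iters")
```

Each iteration makes at least one more input correct, so this toy loop terminates after at most 2^|X| refinements; that blow-up is why it serves only as an illustration of the loop structure.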
A.1 Performance of the AIG-NNF pipeline
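Table 2: Performance summary of the AIG-NNF pipeline of bfss
Benchmark Domain | Total Benchmarks | # Benchmarks Solved | Phase 1 Solved | Phase 2 Started | Solved By Phase 2
QBFEval | 383 | 133 | 122 | 110 | 11
Arithmetic | 48 | 31 | 31 | 12 | 0
Disjunctive Decomposition | 68 | 68 | 66 | 2 | 2
Factorization | 5 | 4 | 0 | 5 | 4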
In the AIG-NNF pipeline, bfss solves a total of 236 benchmarks, with 133 benchmarks in QBFEval, 31 in Arithmetic, all 68 benchmarks of Disjunctive Decomposition and 4 benchmarks in Factorization. Of the QBFEval benchmarks on which we ran the tool (as mentioned in Section 6, we could not build succinct AIGs for the remaining benchmarks and did not run our tool on them), Phase 1 solved 122 benchmarks and Phase 2 was started on 110 benchmarks, of which 11 reached completion. Of the 48 benchmarks in Arithmetic, Phase 1 solved 31 and Phase 2 was started on 12. On the remaining Arithmetic benchmarks, Phase 1 did not reach completion. Of the 68 Disjunctive Decomposition benchmarks, 66 were successfully solved by Phase 1 and the remaining 2 by Phase 2. Phase 2 was started on all 5 Factorization benchmarks and reached completion on 4 of them.
A.1.1 Plots for the AIG-NNF pipeline
Figure 5 shows the performance of bfss (AIG-NNF pipeline) versus Cadet for all four benchmark domains. Amongst the four domains, Cadet solved benchmarks that bfss could not solve. Of these, belonged to QBFEval and belonged to Arithmetic. On the other hand, bfss solved benchmarks that Cadet could not solve. Of these, belonged to QBFEval, to Arithmetic, to Factorization and to Disjunctive Decomposition. From Figure 5, we can see that while Cadet takes less time than bfss on many Arithmetic and QBFEval benchmarks, on Disjunctive Decomposition and Factorization, the AIG-NNF pipeline of bfss takes less time.
Figure 6 shows the performance of bfss (AIG-NNF pipeline) versus parSyn. Amongst the four domains, parSyn solved benchmarks that bfss could not solve; of these, benchmark belonged to the Arithmetic domain and benchmarks belonged to QBFEval. On the other hand, bfss solved benchmarks that parSyn could not solve. Of these, belonged to QBFEval, to Arithmetic and to Disjunctive Decomposition. From Figure 6, we can see that while the behaviour of parSyn and bfss is comparable for many QBFEval benchmarks, on most of the Arithmetic, Disjunctive Decomposition and Factorization benchmarks, the AIG-NNF pipeline of bfss takes less time.
Figure 7 gives the comparison of the AIG-NNF pipeline of bfss and RSynth. While RSynth solves benchmarks that bfss does not solve, bfss solves benchmarks that RSynth could not solve. Of these, belonged to QBFEval, to Arithmetic, to Disjunctive Decomposition and to Factorization. Moreover, on most of the benchmarks that both tools solved, bfss takes less time.
Figure 8 compares the performance of the AIG-NNF pipeline of bfss and AbsSynthe-Skolem. While AbsSynthe-Skolem solves benchmarks that bfss could not solve, bfss solved benchmarks that AbsSynthe-Skolem could not solve. Of these, belonged to QBFEval, to Arithmetic and to Disjunctive Decomposition.
A.2 Performance of the BDD-wDNNF pipeline
In this section, we discuss the performance of the BDD-wDNNF pipeline of bfss. Recall that in this pipeline the tool builds an ROBDD from the input AIG using dynamic variable reordering and then converts the ROBDD into a wDNNF representation. In this section, by bfss we mean the BDD-wDNNF pipeline of the tool.
Table 3 gives the performance summary of the BDD-wDNNF pipeline. Using this pipeline, the tool solved a total of 230 benchmarks, of which 143 belonged to QBFEval, 23 to Arithmetic, 59 to Disjunctive Decomposition and 5 to Factorization. As expected, since the representation is already in wDNNF, the Skolem functions generated at the end of Phase 1 were indeed exact (see Theorem 4.1(b)), and we did not need to start Phase 2 on any benchmark. We also found that the memory requirements of this pipeline were higher, and for some benchmarks the tool failed because the ROBDDs (and hence the resulting wDNNF representations) were large, resulting in out-of-memory errors or assertion failures in the underlying AIG library.
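Table 3: Performance summary of the BDD-wDNNF pipeline of bfss
Benchmark Domain | Total Benchmarks | # Benchmarks Solved | Phase 1 Solved | Phase 2 Started | Solved By Phase 2
QBFEval | 383 | 143 | 143 | 0 | 0
Arithmetic | 48 | 23 | 23 | 0 | 0
Disjunctive Decomposition | 68 | 59 | 59 | 0 | 0
Factorization | 5 | 5 | 5 | 0 | 0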
A.2.1 Plots for the BDD-wDNNF pipeline
Figure 9 gives the performance of bfss versus Cadet. The performance of Cadet and bfss is comparable, with Cadet solving benchmarks across all domains that bfss could not and bfss solving benchmarks that Cadet could not. While Cadet takes less time on many QBFEval benchmarks, on many Arithmetic, Disjunctive Decomposition and Factorization benchmarks, the BDD-wDNNF pipeline of bfss takes less time.
Figure 10 gives the performance of bfss versus parSyn. While parSyn could solve benchmarks across all domains that bfss could not, the BDD-wDNNF pipeline of bfss solved benchmarks that parSyn could not.
Figure 11 gives the performance of bfss versus RSynth. While RSynth could solve benchmarks across all domains that bfss could not, the BDD-wDNNF pipeline of bfss solved benchmarks that RSynth could not. Furthermore, from Figure 11, we can see that on most benchmarks that both tools could solve, bfss takes less time.
Figure 12 gives the performance of bfss versus AbsSynthe-Skolem. While AbsSynthe-Skolem could solve benchmarks across all domains that bfss could not, the BDD-wDNNF pipeline of bfss solved benchmarks that could not be solved by AbsSynthe-Skolem.
A.3 Comparison of the two pipelines
Figure 13 compares the performance of the two pipelines. We can see that while there were some benchmarks that only one of the pipelines could solve, apart from the Factorization benchmarks, for most of the QBFEval, Arithmetic and Disjunctive Decomposition benchmarks, the time taken by the AIG-NNF pipeline was less than the time taken by the BDD-wDNNF pipeline.