What’s Decidable about
Syntax-Guided Synthesis?
Abstract
Syntax-guided synthesis (SyGuS) is a recently proposed framework for program synthesis problems. The SyGuS problem is to find an expression or program generated by a given grammar that meets a correctness specification. Correctness specifications are given as formulas in suitable logical theories, typically amongst those studied in satisfiability modulo theories (SMT).
In this work, we analyze the decidability of the SyGuS problem for different classes of grammars and correctness specifications. We prove that the SyGuS problem is undecidable for the theory of equality with uninterpreted functions (EUF). We identify a fragment of EUF, which we call regular-EUF, for which the SyGuS problem is decidable. We prove that this restricted problem is EXPTIME-complete and that the sets of solution expressions are precisely the regular tree languages. For theories that admit a unique, finite domain, we give a general algorithm to solve the SyGuS problem on tree grammars. Finite-domain theories include the bit-vector theory without concatenation. We prove SyGuS undecidable for a very simple bit-vector theory with concatenation, both for context-free grammars and for tree grammars. Finally, we give some additional results for linear arithmetic and bit-vector arithmetic along with a discussion of the implication of these results.
1 Introduction
Program synthesis is an area concerned with the automatic generation of a program from a highlevel specification of correctness. The specification may either be total, e.g., in the form of a simple but unoptimized program, or partial, e.g., in the form of a logical formula or even a collection of test cases. Regardless, one can typically come up with a suitable logic in which to formally capture the class of specifications. Traditionally, program synthesis has been viewed as a deductive process, wherein a program is derived from the constructive proof of the theorem that states that for all inputs, there exists an output, such that the desired correctness specification holds [20], with no assumptions made about the syntactic form of the program. However, over the past decade, there has been a successful trend in synthesis in which, in addition to the correctness specification, one also supplies a hypothesis about the syntactic form of the desired program. Such a hypothesis can take many forms: partial programs with “holes” [21, 22], component libraries [16, 14], protocol scenarios [23, 1], etc. Moreover, the synthesis of verification artifacts, such as invariants [7], also makes use of “templates” constraining their syntactic structure. The intuition is that such syntactic restrictions on the form of the program reduce the search space for the synthesis algorithms, and thus speed up the overall synthesis or verification process.
Syntax-guided synthesis (SyGuS) [2] is a recently proposed formalism that captures this trend as a new class of problems. More precisely, a SyGuS problem comprises a logical specification φ in a suitable logical theory T that references one or more typed function symbols f that must be synthesized, along with a formal language L of expressions of the same type as f, with the goal of finding an expression e ∈ L such that when f is replaced by e in φ, the resulting formula is valid in T. The formal language L is typically given in the form of a grammar G. Since the SyGuS definition was proposed about three years ago, it has been adopted by several groups as a unifying formalism for a class of synthesis efforts, with a standardized language (SynthLIB) and an associated annual competition. However, the theoretical study of SyGuS is still in its infancy. Specifically, to our knowledge, there are no published results about the decidability or complexity of syntax-guided synthesis for specific logics and grammars.
Table 1. Decidability of the SyGuS problem by theory and grammar class (D = decidable, U = undecidable, ? = open).

  Theory         | Regular Tree | Context-free
  ---------------+--------------+-------------
  Finite-Domain  | D            | U
  Bit-Vectors    | U            | U
  Arrays         | U            | U
  EUF            | U            | U
  Regular-EUF    | D            | ?
In this paper, we present a theoretical analysis of the syntax-guided synthesis problem. We analyze the decidability of the SyGuS problem for different classes of grammars and logics. For grammars, we consider arbitrary context-free grammars, tree grammars, and grammars specific to linear real arithmetic and linear integer arithmetic. For logics, we consider the major theories studied in satisfiability modulo theories (SMT) [5], including equality and uninterpreted functions (EUF), finite-precision bit-vectors (BV), and arrays – extensional or otherwise (AR), as well as theories with finite domains (FD). Our major results are as follows:

For EUF, we show that the SyGuS problem is undecidable over tree grammars. These results extend straightforwardly to the theory of arrays. (See Section 3.)

We present a fragment of EUF, called regular-EUF, for which the SyGuS problem is EXPTIME-complete given regular tree grammars. We prove that the sets of solutions to regular-EUF problems are in one-to-one correspondence with the regular tree languages. (See Section 4.)

For arbitrary theories with finite domains (FD) defined in Section 5, we show that the SyGuS problem is decidable for tree grammars, but undecidable for arbitrary context-free grammars.

For BV, we show (perhaps surprisingly) that the SyGuS problem is undecidable for the classes of context-free grammars and tree grammars. (See Section 6.)
See Table 1 for a summary of our main results. In addition, we also consider certain restricted grammars specific to the theory of linear arithmetic over the reals and integers (LRA and LIA), as well as bit-vectors (BV), where the grammars generate arbitrary but well-formed expressions in those theories, and discuss the decidability of the problem in Section 7. The paper concludes in Section 8 with a discussion of the results, their implications, and directions for future work.
2 Preliminaries
We review some key definitions and results used in the rest of the paper.
Terms and Substitutions
We follow the book by Baader and Nipkow [3]. A signature (or ranked alphabet) Σ consists of a set of function symbols, each with an associated arity, a non-negative number indicating its number of arguments. For example, Σ = {f, a, b} consists of a binary function symbol f and constants a and b. For any arity n, we let Σ(n) denote the set of function symbols with arity n (the n-ary symbols). We will refer to the 0-ary function symbols as constants.
For any signature Σ and set of variables V such that Σ ∩ V = ∅, we define the set of terms T(Σ, V) over Σ and V inductively as the smallest set satisfying:

- Σ(0) ∪ V ⊆ T(Σ, V);
- for all n ≥ 1, all g ∈ Σ(n), and all t1, …, tn ∈ T(Σ, V), we have g(t1, …, tn) ∈ T(Σ, V).
We define the set of ground terms over Σ to be the set T(Σ, ∅) (or T(Σ) for short). We define the subterms of a term recursively as subterms(g(t1, …, tn)) = {g(t1, …, tn)} ∪ ⋃i subterms(ti), which we lift to sets of terms, subterms(T) = ⋃t∈T subterms(t). We say that a set of terms T is subterm-closed if subterms(T) = T.
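The subterm and subterm-closure definitions above can be sketched directly. The following is a minimal illustration (not from the paper): we assume a nested-tuple encoding of terms, where f(a, g(b)) is written ("f", ("a",), ("g", ("b",))) and 0-ary symbols are 1-tuples.

```python
# Terms as nested tuples: f(a, g(b)) is ("f", ("a",), ("g", ("b",))).

def subterms(t):
    """Return the set of all subterms of term t, including t itself."""
    result = {t}
    for child in t[1:]:
        result |= subterms(child)
    return result

def is_subterm_closed(terms):
    """A set T of terms is subterm-closed iff subterms(T) == T."""
    closure = set()
    for t in terms:
        closure |= subterms(t)
    return closure == set(terms)

t = ("f", ("a",), ("g", ("b",)))
assert subterms(("a",)) == {("a",)}
assert is_subterm_closed(subterms(t))       # a closure is always closed
assert not is_subterm_closed({t})           # {t} alone misses a, g(b), b
```

This encoding is reused informally in later sketches; it is one convenient choice, not the paper's notation.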
For a set of variables (or constants) x1, …, xn and terms s1, …, sn, the term t[x1 ↦ s1, …, xn ↦ sn] is formed by replacing each instance of each xi in t with si. We call [x1 ↦ s1, …, xn ↦ sn] a substitution. Substitutions extend in the natural way to formulas, by applying the substitution to each term in the formula.
We extend substitution to function symbols with arity n ≥ 1, where it is also called second-order substitution. For a function symbol f of arity n, a signature Σ, and a fresh set of variables {v1, …, vn}, a substitution for f in Σ is a term e ∈ T(Σ, {v1, …, vn}). Given a term t, the term t[f ↦ e] is formed by replacing each occurrence of any term f(t1, …, tn) in t with e[v1 ↦ t1, …, vn ↦ tn] (sometimes written e(t1, …, tn)). We say that v1, …, vn are the bound variables of e. Intuitively, second-order substitution not only replaces f by e, but also replaces the bound variables of e by the arguments of each function application.
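Second-order substitution is the core operation that turns a grammar-generated expression into a replacement for the synthesized function. A minimal sketch (our own encoding, not the paper's): terms are nested tuples as in f(a) = ("f", ("a",)), and bound variables are named "v1", "v2", ….

```python
# Terms as nested tuples; variables "v1".."vn" are the bound variables
# of the replacement term e.

def subst_vars(t, env):
    """First-order substitution: replace variables of t per env."""
    if len(t) == 1 and t[0] in env:
        return env[t[0]]
    return (t[0],) + tuple(subst_vars(c, env) for c in t[1:])

def subst_fun(t, f, e, arity):
    """Second-order substitution t[f -> e]: each application
    f(t1,...,tn) becomes e with vi replaced by the (rewritten) ti."""
    args = tuple(subst_fun(c, f, e, arity) for c in t[1:])
    if t[0] == f and len(args) == arity:
        env = {f"v{i + 1}": args[i] for i in range(arity)}
        return subst_vars(e, env)
    return (t[0],) + args

# Replace unary f by e = g(v1, v1) inside h(f(a)):
e = ("g", ("v1",), ("v1",))
out = subst_fun(("h", ("f", ("a",))), "f", e, 1)
assert out == ("h", ("g", ("a",), ("a",)))
```

Note that the arguments are rewritten before the bound variables are filled in, so nested applications of f are handled bottom-up.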
A context C is a term in T(Σ, {x}) with a single occurrence of the variable x. For t ∈ T(Σ), we write C[t] for C[x ↦ t].
Logical Theories
A first-order model in Σ, also called a Σ-model, is a pair M = (D, σ) consisting of a set D, called its domain, and a mapping σ. The mapping σ assigns to each function symbol f with arity n a total function σ(f) : D^n → D, and to each relation R of arity n a set σ(R) ⊆ D^n.
A formula is a boolean combination of relations over terms. The mapping σ induced by a model M defines a natural mapping of formulas to truth values, written M ⊨ φ (we also say M satisfies φ). For a set F of first-order formulas, we say M ⊨ F if M ⊨ φ for each φ ∈ F. A theory T is a set of formulas. We say M is a model of T if M ⊨ T, and use Mod(T) to denote the set of models of T. A first-order formula φ is valid in T if M ⊨ φ for all M ∈ Mod(T). A theory T is complete if for all formulas φ, either φ or ¬φ is valid.
Given a set of ground equations E and terms s, t ∈ T(Σ), we say that s →E t if there exists an equation l ≈ r in E and a context C such that s = C[l] and t = C[r]. For example, if E = {a ≈ b}, then f(a) →E f(b). Let ≈E be the reflexive, symmetric, and transitive closure of →E. We will sometimes write E ⊨ s ≈ t instead of s ≈E t. We will use [t]E to represent the equivalence class {s | s ≈E t}. Birkhoff's Theorem states that for any ranked alphabet Σ, set of ground equations E, and s, t ∈ T(Σ), we have s ≈E t if and only if M ⊨ s ≈ t for every model M such that M ⊨ E [3].
In this work, we consider the common quantifier-free background theories of SMT solving: propositional logic (SAT), bit-vectors (BV), difference logic (DL), linear real arithmetic (LRA), linear integer (Presburger) arithmetic (LIA), the theory of arrays (AR), and the theory of uninterpreted functions with equality (EUF). For detailed definitions of these theories, see [5, 4].
For the theory of EUF it is common to introduce the If-Then-Else operator (ITE) as syntactic sugar [6, 5, 4]. We follow this tradition and allow EUF formulas to contain terms of the form ITE(φ, s, t), where φ is a formula and s and t are terms. To desugar EUF formulas, we introduce an additional fresh constant c and add the two constraints φ → c ≈ s and ¬φ → c ≈ t for each ITE term. As we will see in Section 3, the presence of syntactic sugar such as the ITE operator in the grammar of SyGuS problems may have a surprising effect on the decidability of the SyGuS problem.
Grammars and Automata
A context-free grammar (CFG) is a tuple G = (N, S, T, R) consisting of a finite set N of nonterminal symbols with a distinguished start symbol S ∈ N, a finite set T of terminal symbols, and a finite set R of production rules, which are tuples of the form (A, w) with A ∈ N and w ∈ (N ∪ T)*. Production rules indicate the allowed replacements of nonterminals by sequences over nonterminals and terminals. The language L(G) generated by a context-free grammar G is the set of all sequences containing only terminal symbols that can be derived from the start symbol using the production rules.
Tree grammars are a more restrictive class of grammars. They are defined relative to a ranked alphabet Σ. A regular tree grammar G = (N, S, Σ, R) consists of a set N of nonterminals, a start symbol S ∈ N, a ranked alphabet Σ, and a set R of production rules. Production rules are of the form A → g(A1, …, An), where A ∈ N, g is in Σ and has arity n, and each Ai is in N. For a given tree grammar G we write L(G) for the set of trees produced by G. The regular tree languages are the languages produced by some regular tree grammar. Any regular tree grammar can be converted to a CFG by simply treating the right-hand side of any production as a string, rather than a tree. Thus, the undecidability results for SyGuS given regular tree grammars extend to undecidability results for SyGuS given CFGs.
Let ΣT be the signature of a background theory T. We define a tree grammar G = (N, S, Σ, R) to be T-compatible (or simply compatible) if Σ ⊆ ΣT and the arities of all symbols in Σ match those in ΣT.
A (deterministic) bottom-up (or rational) tree automaton is a tuple A = (Q, Qf, Σ, δ). Here, Q is a set of states, Qf ⊆ Q is the set of accepting states, and Σ is a ranked alphabet. The partial function δ maps a symbol g ∈ Σ(n) and states q1, …, qn to a new state δ(g, q1, …, qn) ∈ Q; if no such state exists, δ is undefined there. We can inductively extend δ to terms, where for all g ∈ Σ(n) and all t1, …, tn ∈ T(Σ), we set δ(g(t1, …, tn)) = δ(g, δ(t1), …, δ(tn)). The language accepted by A is the set L(A) = {t ∈ T(Σ) | δ(t) ∈ Qf}. There exist transformations between regular tree grammars and rational tree automata [8], and we will sometimes define SyGuS problems in terms of rational tree automata rather than regular tree grammars.
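The inductive extension of δ to terms can be made concrete in a few lines. The following sketch (our own encoding, not the paper's) again represents terms as nested tuples and δ as a dictionary from (symbol, child states…) to a state; the example automaton is an assumed toy instance.

```python
# delta maps (symbol, child_states...) -> state; accepting is Q_f.

def run(delta, t):
    """Inductive extension of delta to terms; None where undefined."""
    child_states = tuple(run(delta, c) for c in t[1:])
    if None in child_states:
        return None
    return delta.get((t[0],) + child_states)

def accepts(delta, accepting, t):
    return run(delta, t) in accepting

# Toy automaton over {f/1, a/0} accepting terms with an even number
# of f's (a hypothetical example, not from the paper):
delta = {("a",): "even",
         ("f", "even"): "odd",
         ("f", "odd"): "even"}
assert accepts(delta, {"even"}, ("f", ("f", ("a",))))
assert not accepts(delta, {"even"}, ("f", ("a",)))
```

Determinism is built in: each (symbol, child states) key maps to at most one state, and missing keys model the partiality of δ.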
SyntaxGuided Synthesis
We follow the definition of SyGuS given by Alur et al. [2], but we focus on the case of finding a replacement for a single designated function symbol f with a candidate expression (the program), which is generated by a given grammar G. Let T be a background theory over signature Σ. Given a function symbol f with arity n, a formula φ over the signature Σ ∪ {f}, and a grammar G of terms in T(Σ, {v1, …, vn}), the SyGuS problem is to find a term e ∈ L(G) such that the formula φ[f ↦ e] is valid in T, or to determine the absence of such a term. We represent the SyGuS problem as the tuple (T, f, φ, G).
The variables v1, …, vn that may occur in the generated term e stand for the arguments of f. For each function application f(t1, …, tn) in φ, the second-order substitution then replaces the bound variables v1, …, vn by the arguments t1, …, tn of the function application.
Note that the original definition of SyGuS allows for universally quantified variables, while our definition above admits no variables. This is equivalent, as universally quantified variables can be replaced with fresh constants without affecting validity.
Example 1
Consider the following example of a SyGuS problem in linear integer arithmetic. Let the type of the function f to synthesize be ℤ × ℤ → ℤ and let the specification be given by the logical formula

  f(v1, v2) ≥ v1 ∧ f(v1, v2) ≥ v2 ∧ (f(v1, v2) ≈ v1 ∨ f(v1, v2) ≈ v2).

We can restrict the set of expressions to be expressions generated by the grammar below:

  Term → v1 | v2 | 0 | 1 | Term + Term | ITE(Cond, Term, Term)
  Cond → Term ≤ Term | Cond ∧ Cond | ¬Cond

It is easy to see that a function computing the maximum of v1 and v2, such as ITE(v1 ≤ v2, v2, v1), is a solution to the SyGuS problem. There are, however, other solutions, such as ITE(v1 ≤ v2, v2, v1 + 0). The function computing the sum of v1 and v2, on the other hand, can be constructed in the grammar but does not satisfy the specification.
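Candidate expressions for Example 1 can be screened cheaply by testing: evaluation on sample inputs cannot prove validity in LIA, but it refutes wrong candidates quickly. A small sketch (our own code; the sample points are arbitrary), assuming the maximum specification discussed above:

```python
# Check the Example-1 style maximum specification on sample inputs.
# Testing is only a necessary condition for validity in LIA.

def spec(f, x, y):
    v = f(x, y)
    return v >= x and v >= y and (v == x or v == y)

SAMPLES = [(0, 0), (1, 2), (2, 1), (-3, 5), (-3, -7)]

def holds_on_samples(f):
    return all(spec(f, x, y) for x, y in SAMPLES)

assert holds_on_samples(lambda x, y: y if x <= y else x)       # ITE(x<=y, y, x)
assert holds_on_samples(lambda x, y: (y if x <= y else x) + 0)
assert not holds_on_samples(lambda x, y: x + y)                # sum is refuted
```

In practice such sample-based filtering is the counterexample-guided half of many SyGuS solvers; a final validity check (e.g. with an SMT solver) is still required.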
3 SyGuS-EUF is Undecidable
We use SyGuS-EUF to denote the class of SyGuS problems (EUF, f, φ, G) where G is a grammar generating expressions that are syntactically well-formed in EUF for f. In this section, we prove that SyGuS-EUF is undecidable. The proof of undecidability is a reduction from the simultaneous rigid E-unification problem (SREU) [11]. A set E of equations between terms in T(Σ, V) together with an equation s ≈ t between terms in T(Σ, V) forms a rigid equation, denoted E ⊢∀ s ≈ t. A solution to E ⊢∀ s ≈ t is a substitution θ such that θ(E) ⊨ θ(s) ≈ θ(t), where θ(e), θ(s), and θ(t) are ground for each e ∈ E. Given a set R of rigid equations, the SREU problem is to find a substitution θ that is a solution to each rigid equation in R; this problem is known to be undecidable [11].
Reducing SREU to SyGuS-EUF. We start the reduction by constructing a boolean expression φ for a given set R = {R1, …, Rm} of rigid equations over alphabet Σ and variables x1, …, xn. Let each Ri be Ei ⊢∀ si ≈ ti, where Ei, si, and ti are (sets of) equations between terms in T(Σ, {x1, …, xn}). We associate with each rigid equation Ri a boolean expression φi = σ(∧e∈Ei e) → σ(si ≈ ti), where σ is the substitution [x1 ↦ f(c1), …, xn ↦ f(cn)]. The symbol f is a unary function symbol to be synthesized and c1, …, cn are fresh constants (ci ∉ Σ for all i). We set φ = φ1 ∧ ⋯ ∧ φm.
Next we give the grammar G, which generates the terms that may replace f in φ. We define G to have the starting nonterminal S and the following rules:

  S → ITE(v ≈ c1, T, ITE(v ≈ c2, T, … ITE(v ≈ cn, T, d) … ))
  T → d

where d is a fresh constant (d ∉ Σ and d ≠ ci for all i). Additionally, for each g ∈ Σ we add a rule T → g(T, …, T), where the number of argument terms of g matches its arity.
Lemma 1
The SREU problem R has a solution if and only if the SyGuS-EUF problem (EUF, f, φ, G) has a solution over the ranked alphabet Σ ∪ {c1, …, cn, d}.
Proof
The main idea behind this proof is that each term f(ci) in φ represents the variable xi in R. Any replacement for f found in L(G) corresponds to a substitution on all variables in R that grounds the equations of the SREU problem.

(⇒) Let θ be a solution to R, where each θ(xi) is a ground term in T(Σ). We consider the term e = ITE(v ≈ c1, θ(x1), ITE(v ≈ c2, θ(x2), … ITE(v ≈ cn, θ(xn), d) …)), which is in the language of the grammar G. To show that φ[f ↦ e] is valid, it suffices to show that for each model M of the desugared ITE constraints and for each i we have M ⊨ φi[f ↦ e]. If M does not satisfy the premise σ(∧e′∈Ei e′)[f ↦ e], then φi[f ↦ e] holds trivially. Otherwise, observe that e(cj) ≈ θ(xj) holds in M for each j, so M satisfies θ(Ei). Since θ is a solution to the SREU problem, θ(Ei) ⊨ θ(si) ≈ θ(ti), and by Birkhoff's Theorem M ⊨ θ(si) ≈ θ(ti). Together with e(cj) ≈ θ(xj), this yields M ⊨ σ(si ≈ ti)[f ↦ e]. Therefore M ⊨ φ[f ↦ e], and we get that e is a solution to the SyGuS problem (EUF, f, φ, G).
(⇐) Let φ and G be defined as before and assume that e is a solution to the SyGuS problem (EUF, f, φ, G). Each term e(ci) is ground, since the nonterminal T in G can only produce ground terms; we define θ(xi) = e(ci). Choose any Ri ∈ R. We will show for every model M over Σ that if M ⊨ θ(Ei) then M ⊨ θ(si) ≈ θ(ti); by Birkhoff's theorem, this implies θ(Ei) ⊨ θ(si) ≈ θ(ti). So let M be a model over Σ such that M ⊨ θ(Ei), and extend M to a model M′ over Σ ∪ {c1, …, cn, d} by assigning each fresh constant to a distinct new element not in the domain of M. By construction of e, M′ satisfies e(cj) ≈ θ(xj) for each j, and hence M′ ⊨ σ(∧e′∈Ei e′)[f ↦ e]. Since e is a SyGuS solution, M′ ⊨ φi[f ↦ e], and therefore M′ ⊨ σ(si ≈ ti)[f ↦ e], that is, M′ ⊨ θ(si) ≈ θ(ti). As this equation only mentions symbols of Σ, it also holds in M. Thus θ(Ei) ⊨ θ(si) ≈ θ(ti) and θ is a solution to R. ∎
Theorem 3.1
The SyGuS-EUF problem is undecidable.
Remark on EUF without ITE. A key step in the proof of Lemma 1 is the use of ITE terms to allow a single expression to encode instantiations of multiple different variables. As discussed in Section 2, ITE terms are commonly part of EUF, but some definitions of EUF do not allow for them [19]. While this syntactic sugar has no effect on the complexity of deciding validity of EUF formulas, the undecidability of SyGuS-EUF may depend on the availability of ITE operators. It remains open whether there exist alternative proofs of undecidability that do not rely on ITE terms.
We use SyGuS-Arrays to denote the class of SyGuS problems (Arrays, f, φ, G), where Arrays is the theory of arrays [5], and G is a grammar generating expressions that are syntactically well-formed in Arrays for f. There is a standard construction for representing uninterpreted functions as read-only arrays [5]. Therefore, the undecidability of SyGuS-Arrays follows from the undecidability of SyGuS-EUF, as we state below.
Corollary 1
The SyGuS-Arrays problem is undecidable.
4 Regular SyGuS-EUF
This section describes a fragment of EUF, which we call regular-EUF, for which the SyGuS problem is decidable.
Definition 1
We call (EUF, f, φ, G) a regular SyGuS-EUF problem if G contains no ITE expressions and φ is a regular-EUF formula as defined below.

A regular-EUF formula is a formula φ = φ1 ∧ ⋯ ∧ φm over some ranked alphabet Σ ∪ {f}, where each φi satisfies the following conditions:

- It is a disjunction of equations or negations of equations.
- It does not contain any ITE expressions.
- It contains at most one occurrence of f per equation.
- It satisfies one of the following cases:
  - Case 1: The symbol f only occurs in positive equations.
  - Case 2: The symbol f occurs in exactly one negative equation, and nowhere else.

We define any disjunction that satisfies the above conditions as regular. We will refer to a regular disjunction as case-1 or case-2, depending on which of the above cases is satisfied. Note that every regular-EUF formula is in conjunctive normal form.
We will show that for every regular disjunction φi, we can construct a regular tree automaton Ai accepting precisely the solutions to the SyGuS-EUF problem on φi. The set of solutions to φ then becomes L(A1) ∩ ⋯ ∩ L(Am) ∩ L(G), where G is the grammar of possible replacements. The grammar G can be represented as a deterministic bottom-up tree automaton whose size is at most exponential in |G| [8]. The product-automaton construction can be used to determine whether this intersection is nonempty, which would imply that a solution exists to the corresponding SyGuS problem. This construction takes time and space proportional to the product of the sizes of the automata. Note that this is at most exponential in |φ| and |G|, even when some of the automata have size exponential in |φ| or |G|.
The connection between sets of ground equations and regular tree languages was first observed by Kozen [17], who showed that a tree language is regular if and only if it is a finite union of equivalence classes [t1]E ∪ ⋯ ∪ [tk]E for some set of ground equations E and collection of ground terms t1, …, tk. The following, very similar theorem shows that a certain set of equivalence classes of a ground equational theory can be represented by a regular tree automaton.
Theorem 4.1
Let E be a set of ground equations over the alphabet Σ, and let T be a subterm-closed set of terms such that every term occurring in E is in T. There exists a regular tree automaton AE,T without accepting states such that each state in AE,T represents an equivalence class [t]E of a term t ∈ T. More formally, this means that for all terms s, s′ such that there exist terms t, t′ ∈ T with s ≈E t and s′ ≈E t′, it holds that δ(s) = δ(s′) if and only if s ≈E s′.
Proof
Let AE,T have one state qt for each term t ∈ T, with δ(g, qt1, …, qtn) = qg(t1,…,tn) whenever g(t1, …, tn) ∈ T.

We define a merge operation on states as follows: merge(q, q′) first removes q′ and replaces it by q everywhere in δ. Then, for every pair of transitions over the same symbol and states that now map to different states p and p′, it calls merge(p, p′); this restores congruence.

Now, for each equation s ≈ t in E, call merge(δ(s), δ(t)). A simple inductive argument will show that the resulting automaton is AE,T. ∎
Let φ be a regular disjunction over Σ ∪ {f}. Let Neg be the set of equations appearing negated in φ and Pos the set of positive equations. We can rewrite φ to the normal form (∧e∈Neg e) → (∨e∈Pos e). Solving the SyGuS problem for φ then becomes the problem of finding a replacement e for f such that Neg[f ↦ e] ⊨ (s ≈ t)[f ↦ e] for some s ≈ t ∈ Pos. The technique to form the automaton that represents the solutions to φ depends on whether φ is case-1 or case-2.
Assume that φ is case-1 and choose some s ≈ t ∈ Pos. First assume f does not occur in s ≈ t. If Neg ⊨ s ≈ t, then φ is trivially solvable; if not, s ≈ t can be removed from φ to yield an equally solvable formula. Now assume f occurs in s ≈ t. Without loss of generality, there is a context C and ground terms t1, …, tn such that s = C[f(t1, …, tn)]. Let T be the subterm closure of the terms occurring in φ and let AE,T be the automaton defined in the proof of Theorem 4.1, built from E = Neg. For each ti, δ(ti) is defined. Let QF be the set of states q such that setting δ(f, δ(t1), …, δ(tn)) = q yields δ(s) = δ(t). By Theorem 4.1, (s ≈ t)[f ↦ e] follows from Neg if and only if δ(e[v1 ↦ t1, …, vn ↦ tn]) ∈ QF. Therefore, for any replacement e of f, φ[f ↦ e] is valid if and only if δ(e[v1 ↦ t1, …, vn ↦ tn]) ∈ QF.
Let A′ be the tree automaton obtained from AE,T by taking QF as its set of accepting states (completing δ where necessary). A simple inductive argument will show that for any replacement e of f, e ∈ L(A′) if and only if δ(e[v1 ↦ t1, …, vn ↦ tn]) ∈ QF. Thus, A′ defines the precise set of terms e such that φ[f ↦ e] is valid.
The set of solutions to φ can then be given by the automaton whose language is the intersection of the languages of the per-clause automata with L(G). This can be found in time and space exponential in |φ| and |G| using the product construction for tree automata [8].
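The product construction and the emptiness check can be combined: instead of materializing the product automaton, one computes its reachable state pairs bottom-up. A sketch (our own code; the two example automata are assumed toy instances):

```python
from itertools import product

def nonempty_intersection(d1, f1, d2, f2, symbols):
    """Decide whether L(A1) ∩ L(A2) is nonempty by computing the
    reachable states of the (implicit) product automaton."""
    reached = set()                              # reachable pairs (q1, q2)
    changed = True
    while changed:
        changed = False
        for sym, arity in symbols.items():
            for kids in product(sorted(reached), repeat=arity):
                k1 = (sym,) + tuple(p for p, _ in kids)
                k2 = (sym,) + tuple(q for _, q in kids)
                if k1 in d1 and k2 in d2:
                    pair = (d1[k1], d2[k2])
                    if pair not in reached:
                        reached.add(pair)
                        changed = True
    return any(p in f1 and q in f2 for p, q in reached)

# A1: terms over {a/0, f/1} with an even number of f's.
d1 = {("a",): "even", ("f", "even"): "odd", ("f", "odd"): "even"}
# A2: accepts every term.
d2 = {("a",): "q", ("f", "q"): "q"}
assert nonempty_intersection(d1, {"even"}, d2, {"q"}, {"a": 0, "f": 1})
assert not nonempty_intersection(d1, {"odd"}, d2, set(), {"a": 0, "f": 1})
```

For deterministic automata the product has |Q1|·|Q2| states, which is where the exponential bound in the text comes from once one of the factors is itself exponential in |φ| or |G|.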
Example 2
Note that this is a case-1 regular-EUF clause. If we set E and T as above, then AE,T is the automaton from Figure 1 (excluding the accepting state and transition). Since the argument of f in φ parses to state 1, a transition from f to state 1 is added to the automaton. Since the right-hand side of the equation parses to state 2 in AE,T, state 2 is set as an accepting state. The resulting automaton accepts exactly the replacements e for f such that φ[f ↦ e] is valid.
Assume φ is case-2 and let s ≉ t be the negative equation in φ that contains f. Without loss of generality, there is a context C and ground terms t1, …, tn such that s = C[f(t1, …, tn)]. Let T be the subterm closure of the terms occurring in φ, and let AE,T be the corresponding automaton. If the remaining disjuncts already make φ valid, then every replacement for f is a solution. So assume otherwise, and let e be a replacement for f; consider the term e[v1 ↦ t1, …, vn ↦ tn]. Assume it is not equivalent to any term in T, and add it to the automaton as a fresh state q. We know q has no outgoing edges: if it did, the term would be equivalent to some term in T. By construction, incorporating the term amounts to calling merge on the corresponding states. Since q has no outgoing edges, this call cannot induce any further merges. Therefore, since s and t are not equal before the merge, they are not equal after the merge. So all solution replacements are equivalent to some terms in T.
For each ti, there is a ground term equivalent to it under ≈E, so δ(ti) is defined. Let QF be the set of states q for which the negative equation s ≉ t holds when the argument of C parses to q. Then for each replacement e, φ[f ↦ e] is valid if and only if δ(e[v1 ↦ t1, …, vn ↦ tn]) ∈ QF.
Let A′ be the tree automaton obtained from AE,T by taking QF as its set of accepting states. A simple inductive argument will show that the terms accepted by A′ are precisely the solutions to φ.
Example 3
Note that this is a case-2 regular-EUF clause. If we set E and T as above, then AE,T is the automaton on the left side of Figure 2 (excluding the accepting state and transition). Since the argument of f in φ parses to state 2, a transition from f to state 2 is added to the automaton. If we choose a replacement e that parses to state 3 in AE,T, then applying the equation merges state 3 with state 1. This, in turn, forces a merge between the new state and state 2, yielding the automaton on the right side of Figure 2. This automaton parses both sides of the negative equation to the same state, so state 3 is an accepting state. This does not occur if e parses to state 1 or state 2 in AE,T, so those are not accepting states. So the automaton accepts exactly the replacements e for f such that φ[f ↦ e] is valid.
We can summarize the above construction in the following lemma.
Lemma 2
The regular SyGuS-EUF problem is in EXPTIME.
The relationship between regular tree languages and the regular SyGuS-EUF problem is quite deep. Using the following lemma and the above constructions, we can see that a tree language is regular if and only if it is the set of solutions to a regular SyGuS-EUF problem.
Lemma 3
Let A be a tree automaton. There exists a regular disjunctive formula φ such that L(A) is the set of solutions to the regular SyGuS-EUF problem for φ.
Proof
Let T be a subterm-closed set of terms such that for each state q of A, there is a term tq ∈ T with δ(tq) = q. Without loss of generality, assume that each tq is a subterm of some term in T. Introduce a fresh constant for each accepting state, let E be the set of ground equations induced by the transitions of A together with equations relating each fresh constant to its term tq, and let φ be the corresponding regular disjunction for the synthesized symbol f. Using the construction from Theorem 4.1, it is easy to check that the set of solutions to φ is precisely L(A). ∎
We can also use the above lemma to show that regular SyGuS-EUF is EXPTIME-complete, as we will see below.
Lemma 4
The regular SyGuS-EUF problem is EXPTIME-hard.
Proof
We reduce from the EXPTIME-complete problem of determining whether a set of regular tree automata have languages with a nonempty intersection [24]. Let A1, …, Ak be a set of regular tree automata over some alphabet Σ. For each automaton Ai, construct the formula φi as described in Lemma 3, and let φ = φ1 ∧ ⋯ ∧ φk. Let f be a nullary function symbol to be synthesized, and let G be a grammar such that L(G) = T(Σ). The solutions to the regular SyGuS-EUF problem are the members of the set L(A1) ∩ ⋯ ∩ L(Ak). Therefore, the problem has a solution if and only if this intersection is nonempty. ∎
Using the above lemma and Lemma 2, we can conclude the following theorem.
Theorem 4.2
The regular SyGuS-EUF problem is EXPTIME-complete.
In concluding this section, we remark that the case-1 and case-2 restrictions on regular clauses are necessary. For lack of space, we exclude the details; the appendix contains an example elaborating on this point.
5 Finite-Domain Theories
In addition to the “standard” theories, we also consider a family of theories that we term finite-domain (FD) theories. Formally, an FD theory is a complete theory that admits one domain (up to isomorphism), and whose only domain is finite. For example, consider the group axioms together with a constant g and statements asserting that g has order three and that every element equals 0, g, or g + g. This is an FD theory, since, up to isomorphism, the only model of this theory is the integers with addition modulo 3. Boolean logic and the theory of fixed-length bit-vectors without concatenation are also FD theories. Bit-vector theories with (unrestricted) concatenation allow us to construct arbitrarily many distinct constants and are thus not FD theories.
In this section we give a generic algorithm for any complete finite-domain theory for which validity is decidable. Let T be such a theory and let M be a model of T with a finite domain D. Assume without loss of generality that for every element d ∈ D there is a constant cd in the signature such that σ(cd) = d.

We consider a SyGuS problem with a correctness specification φ in theory T, a function symbol f to synthesize, and a tree grammar G generating the set of candidate expressions. An expression e generated by G to replace f can be seen as a function mapping tuples over D to an element of D. If the domain D is finite there are only finitely many candidate functions, but it can be nontrivial to determine which functions can be generated by G. In the following, we describe an algorithm that iteratively determines the set of functions that can be generated by each nonterminal in the grammar G.

For each nonterminal A, we maintain a set of expressions EA. In each iteration and for each production rule A → g(B1, …, Bn) in G, we consider the expressions g(e1, …, en), where ei is Bi itself if Bi is an expression (i.e. a terminal) and ei ∈ EBi if Bi is a nonterminal. Given such an expression e, we compute its function table, that is, the result of e for each assignment of domain elements to its variables, and compare it to the function tables of the expressions currently in EA. Our assumption that validity is decidable for T guarantees that this operation is decidable. If e represents a new function, we add it to the set EA.

The algorithm terminates after an iteration in which no set EA changed. As there are only finitely many functions over D and the sets EA grow monotonically, the algorithm eventually terminates. To determine the answer to the SyGuS problem, we then check whether there is an expression e generated from the start symbol for which φ[f ↦ e] is valid.
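The fixed-point computation over function tables can be sketched concretely. The following is our own minimal instantiation for the boolean domain with a hypothetical grammar S → v1 | v2 | S ∧ S | ¬S (not the paper's example); tables are tuples of outputs over all argument assignments, so two expressions are identified exactly when they denote the same function.

```python
from itertools import product

DOMAIN = (False, True)
ARGS = list(product(DOMAIN, repeat=2))         # assignments to (v1, v2)

def table(fn):
    """Function table of a 2-argument expression over the domain."""
    return tuple(fn(x, y) for x, y in ARGS)

# Hypothetical compatible grammar:  S -> v1 | v2 | S and S | not S
RULES = [
    ("S", lambda a, b: a and b, ["S", "S"]),
    ("S", lambda a: not a,      ["S"]),
]
TABLES = {"S": {table(lambda x, y: x), table(lambda x, y: y)}}

changed = True
while changed:                                  # fixed-point iteration
    changed = False
    for nt, op, kids in RULES:
        for choice in product(*(TABLES[k] for k in kids)):
            t = tuple(op(*vals) for vals in zip(*choice))
            if t not in TABLES[nt]:
                TABLES[nt].add(t)
                changed = True

# {and, not} with both variables is functionally complete, so the
# fixed point contains all 16 two-argument boolean function tables,
# including XOR -- even though no rule mentions XOR directly.
assert table(lambda x, y: x != y) in TABLES["S"]
assert len(TABLES["S"]) == 16
```

The final SyGuS step would then test, for each table in the start symbol's set, whether the corresponding function makes φ valid; since each set is bounded by |D|^(|D|^n), termination is guaranteed.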
Theorem 5.1
Let T be a complete theory for which validity is decidable and which has a finite-domain model. The SyGuS problem for T and T-compatible tree grammars is decidable.
Table 2. Updates to the expression sets of the grammar's nonterminals during each of the five iterations of the algorithm (entries marked “none” indicate iterations in which a set did not change).
Example 4
Consider a SyGuS problem over boolean expressions with a specification φ in which ⊕ denotes the XOR operation and f is the function symbol to synthesize from the following tree grammar (we use infix operators for readability):

The grammar generates boolean functions of the variables v1 and v2, and the updates to the expression sets of its nonterminals during each iteration of the proposed algorithm are given in Table 2. The next step in the algorithm is to determine whether any of the three resulting expressions makes the formula valid, which is not the case.
6 Bit-Vectors
In this section, we show that the SyGuS problem for the theory of bit-vectors is undecidable, even when we restrict the problem to tree grammars. The proof makes use of the fact that we can construct (bit)strings with the concatenation operation and can compare arbitrarily large strings with the equality operation. This enables us to encode the problem of determining whether the languages of two CFGs with no ε-productions have a nonempty intersection, which is undecidable [15].
Theorem 6.1
The SyGuS problem for the theory of bit-vectors is undecidable for both the class of context-free grammars and the class of BV-compatible tree grammars.
Proof
We start with the proof for the class of context-free grammars. Given two context-free grammars G1 and G2, we define a SyGuS problem with a single context-free grammar G that has a solution iff the intersection of L(G1) and L(G2) is not empty. The proof idea is to express the intersection of the two grammars as the equality between two expressions, each generated by one of the grammars. The new grammar thus starts with the following production rule:
We then have to translate the grammars G1 and G2 into grammars that produce expressions in the bit-vector theory instead of arbitrary strings over their alphabets. There is a string produced by both G1 and