What’s Decidable about Syntax-Guided Synthesis?

Benjamin Caulfield, Markus N. Rabe, Sanjit A. Seshia, and Stavros Tripakis
University of California, Berkeley
Aalto University

Abstract

Syntax-guided synthesis (SyGuS) is a recently proposed framework for program synthesis problems. The SyGuS problem is to find an expression or program generated by a given grammar that meets a correctness specification. Correctness specifications are given as formulas in suitable logical theories, typically amongst those studied in satisfiability modulo theories (SMT).

In this work, we analyze the decidability of the SyGuS problem for different classes of grammars and correctness specifications. We prove that the SyGuS problem is undecidable for the theory of equality with uninterpreted functions (EUF). We identify a fragment of EUF, which we call regular-EUF, for which the SyGuS problem is decidable. We prove that this restricted problem is EXPTIME-complete and that the sets of solution expressions are precisely the regular tree languages. For theories that admit a unique, finite domain, we give a general algorithm to solve the SyGuS problem on tree grammars. Finite-domain theories include the bit-vector theory without concatenation. We prove SyGuS undecidable for a very simple bit-vector theory with concatenation, both for context-free grammars and for tree grammars. Finally, we give some additional results for linear arithmetic and bit-vector arithmetic, along with a discussion of the implications of these results.

1 Introduction

Program synthesis is an area concerned with the automatic generation of a program from a high-level specification of correctness. The specification may either be total, e.g., in the form of a simple but unoptimized program, or partial, e.g., in the form of a logical formula or even a collection of test cases. Regardless, one can typically come up with a suitable logic in which to formally capture the class of specifications. Traditionally, program synthesis has been viewed as a deductive process, wherein a program is derived from the constructive proof of the theorem that states that for all inputs, there exists an output, such that the desired correctness specification holds [20], with no assumptions made about the syntactic form of the program. However, over the past decade, there has been a successful trend in synthesis in which, in addition to the correctness specification, one also supplies a hypothesis about the syntactic form of the desired program. Such a hypothesis can take many forms: partial programs with “holes” [21, 22], component libraries [16, 14], protocol scenarios [23, 1], etc. Moreover, the synthesis of verification artifacts, such as invariants [7], also makes use of “templates” constraining their syntactic structure. The intuition is that such syntactic restrictions on the form of the program reduce the search space for the synthesis algorithms, and thus speed up the overall synthesis or verification process.

Syntax-guided synthesis (SyGuS) [2] is a recently proposed formalism that captures this trend as a new class of problems. More precisely, a SyGuS problem comprises a logical specification $\varphi$ in a suitable logical theory $T$ that references one or more typed function symbols $f$ that must be synthesized, along with one or more formal languages $L$ of expressions of the same type as $f$, with the goal of finding expressions $e \in L$ such that when $f$ is replaced by $e$ in $\varphi$, the resulting formula is valid in $T$. The formal language $L$ is typically given in the form of a grammar $G$. Since the SyGuS definition was proposed about three years ago, it has been adopted by several groups as a unifying formalism for a class of synthesis efforts, with a standardized language (Synth-LIB) and an associated annual competition. However, the theoretical study of SyGuS is still in its infancy. Specifically, to our knowledge, there are no published results about the decidability or complexity of syntax-guided synthesis for specific logics and grammars.

Theory \ Grammar class    Regular tree    Context-free
Finite-Domain             D               U
Bit-Vectors               U               U
Arrays                    U               U
EUF                       U               U
Regular-EUF               D               ?

Table 1: Summary of main results, organized by background theories and classes of grammars. “U” denotes an undecidable SyGuS class, “D” denotes a decidable class, and “?” indicates that the decidability is currently unknown.

In this paper, we present a theoretical analysis of the syntax-guided synthesis problem. We analyze the decidability of the SyGuS problem for different classes of grammars and logics. For grammars, we consider arbitrary context-free grammars, tree grammars, and grammars specific to linear real arithmetic and linear integer arithmetic. For logics, we consider the major theories studied in satisfiability modulo theories (SMT) [5], including equality and uninterpreted functions (EUF), finite-precision bit-vectors (BV), and arrays – extensional or otherwise (AR), as well as theories with finite domains (FD). Our major results are as follows:

  • For EUF, we show that the SyGuS problem is undecidable over tree grammars. This result extends straightforwardly to the theory of arrays. (See Section 3.)

  • We present a fragment of EUF, called regular-EUF, for which the SyGuS problem is EXPTIME-complete given regular tree grammars. We prove that the sets of solutions to regular-EUF problems are in one-to-one correspondence with the regular tree languages. (See Section 4.)

  • For arbitrary theories with finite domains (FD) defined in Section 5, we show that the SyGuS problem is decidable for tree grammars, but undecidable for arbitrary context-free grammars.

  • For BV, we show (perhaps surprisingly) that the SyGuS problem is undecidable for the classes of context-free grammars and tree grammars. (See Section 6.)

See Table 1 for a summary of our main results. In addition, we consider certain restricted grammars specific to the theory of linear arithmetic over the reals and integers (LRA and LIA), as well as bit-vectors (BV), where the grammars generate arbitrary but well-formed expressions in those theories; we discuss the decidability of these problems in Section 7. The paper concludes in Section 8 with a discussion of the results, their implications, and directions for future work.

2 Preliminaries

We review some key definitions and results used in the rest of the paper.

Terms and Substitutions

We follow the book by Baader and Nipkow [3]. A signature (or ranked alphabet) $\Sigma$ consists of a set of function symbols, each with an associated arity, a non-negative number indicating the number of arguments. For example, $\Sigma = \{f/2, a/0, b/0\}$ consists of a binary function symbol $f$ and constants $a$ and $b$. For any arity $n \geq 0$, we let $\Sigma^{(n)}$ denote the set of function symbols with arity $n$ (the $n$-ary symbols). We will refer to the $0$-ary function symbols as constants.

For any signature and set of variables such that , we define the set of -terms over inductively as the smallest set satisfying:

  • For all , all , and all , we have .

We define the set of ground terms of to be the set (or short ). We define the subterms of a term recursively as , which we lift to sets of terms, . We say that a set of terms is subterm-closed if .
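
As an illustration of these definitions, the following minimal Python sketch represents terms as nested tuples and computes subterms. The tuple encoding and the function names (`subterms`, `subterms_of_set`, `is_subterm_closed`) are assumptions made for this example; they are not notation from the paper.

```python
# A term is a nested tuple (symbol, arg_1, ..., arg_n).
# A constant or variable is a 1-tuple such as ("a",).

def subterms(t):
    """Return the set of all subterms of term t, including t itself."""
    result = {t}
    for arg in t[1:]:
        result |= subterms(arg)
    return result

def subterms_of_set(terms):
    """Lift subterms to a set of terms."""
    out = set()
    for t in terms:
        out |= subterms(t)
    return out

def is_subterm_closed(terms):
    """A set is subterm-closed if it already contains all of its subterms."""
    return subterms_of_set(terms) == set(terms)

a, b = ("a",), ("b",)
fab = ("f", a, b)                      # the term f(a, b)
print(is_subterm_closed({fab}))        # False: a and b are missing
print(is_subterm_closed({fab, a, b}))  # True
```

Using hashable tuples lets sets of terms be compared directly, which keeps the subterm-closure check to a single equality test.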

For a set of variables (or constants) and terms , the term is formed by replacing each instance of each in with . We call a substitution. Substitutions extend in the natural way to formulae, by applying the substitution to each term in the formula.

We extend substitution to function symbols with arity , where it is also called second-order substitution. For a function symbol of arity , a signature , and a fresh set of variables , a substitution to in is a term . Given a term , the term is formed by replacing each occurrence of any term in with (sometimes written ). We say that are the bound variables of . Intuitively, second-order substitution replaces not only by , but also replaces the arguments of each function application by the bound variables.
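
The difference between first-order and second-order substitution can be made concrete with a small Python sketch. It assumes terms are nested tuples of the form `(symbol, arg_1, ..., arg_n)`; the names `substitute_vars`, `substitute_fun`, and the bound-variable name `"x1"` are hypothetical choices for this example.

```python
def substitute_vars(t, env):
    """First-order substitution: replace each variable (a 1-tuple whose
    name is a key of env) by the term env[name]."""
    if len(t) == 1 and t[0] in env:
        return env[t[0]]
    return (t[0],) + tuple(substitute_vars(arg, env) for arg in t[1:])

def substitute_fun(t, f, params, body):
    """Second-order substitution: replace every application f(s_1, ..., s_n)
    in t by body, with the bound variable params[i] replaced by s_i."""
    # Rewrite the arguments first, so nested applications of f are handled.
    args = tuple(substitute_fun(arg, f, params, body) for arg in t[1:])
    if t[0] == f and len(args) == len(params):
        return substitute_vars(body, dict(zip(params, args)))
    return (t[0],) + args

a, b = ("a",), ("b",)
term = ("g", ("f", a), ("f", ("f", b)))   # g(f(a), f(f(b)))
body = ("h", ("x1",), ("x1",))            # f is replaced by h(x1, x1)
print(substitute_fun(term, "f", ["x1"], body))
```

Here the inner application `f(b)` becomes `h(b, b)` before the outer application is rewritten, so the result is `g(h(a, a), h(h(b, b), h(b, b)))`.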

A context is a term in with a single occurrence of . For , we write for .

Logical Theories

A first-order model in , also called -model, is a pair consisting of a set called its domain and a mapping . The mapping assigns to each function symbol with arity a total function , and to each relation of arity a set .

A formula is a boolean combination of relations over terms. The mapping induced by a model defines a natural mapping of formulas to truth values, written (we also say satisfies ). For some set of first-order formulas, we say if for each . A theory is a set of formulas. We say is a model of if , and use to denote the set of models of . A first-order formula is valid in if for all , . A theory is complete if for all formulas either or is valid.

Given a set of ground equations and terms , we say that if there exists an in and a context such that and . For example, if , then . Let be the symmetric and transitive closure of . We will sometimes write instead of . We will use to represent the set . Birkhoff’s Theorem states that for any ranked alphabet , set and , if and only if for every model in such that it holds  [3].
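
Whether a ground equation follows from a finite set of ground equations can be checked mechanically. The sketch below uses naive congruence closure over the subterms involved, a standard technique that is swapped in here for illustration and is not a procedure given in the paper. Terms are assumed to be nested tuples `(symbol, arg_1, ..., arg_n)`, and the names `congruence_closure` and `entails` are hypothetical.

```python
def congruence_closure(equations, extra_terms=()):
    """Return find(t), mapping each subterm of the equations and of
    extra_terms to a representative of its equivalence class."""
    def subterms(t):
        yield t
        for arg in t[1:]:
            yield from subterms(arg)

    universe = set()
    for s, t in equations:
        universe |= set(subterms(s)) | set(subterms(t))
    for t in extra_terms:
        universe |= set(subterms(t))

    parent = {t: t for t in universe}

    def find(t):
        while parent[t] != t:
            t = parent[t]
        return t

    for s, t in equations:                 # merge the two sides of each equation
        parent[find(s)] = find(t)

    changed = True
    while changed:                         # propagate congruence to a fixpoint
        changed = False
        terms = list(universe)
        for i, s in enumerate(terms):
            for t in terms[i + 1:]:
                same_shape = s[0] == t[0] and len(s) == len(t)
                if (same_shape and find(s) != find(t)
                        and all(find(p) == find(q) for p, q in zip(s[1:], t[1:]))):
                    parent[find(s)] = find(t)
                    changed = True
    return find

def entails(equations, s, t):
    """True iff the ground equation s = t follows from the equations."""
    find = congruence_closure(equations, extra_terms=[s, t])
    return find(s) == find(t)

a = ("a",)
f = lambda t: ("f", t)
E = [(f(f(f(a))), a), (f(f(f(f(f(a))))), a)]   # f^3(a) = a and f^5(a) = a
print(entails(E, f(a), a))                      # True
```

The nested loop is quadratic per pass and is chosen for readability only; efficient implementations index terms by their argument classes.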

In this work, we consider the common quantifier-free background theories of SMT solving: propositional logic (SAT), bit-vectors (BV), difference logic (DL), linear real arithmetic (LRA), linear integer (Presburger) arithmetic (LIA), the theory of arrays (AR), and the theory of uninterpreted functions with equality (EUF). For detailed definitions of these theories, see [5, 4].

For the theory of EUF it is common to introduce the If-Then-Else operator (ITE) as syntactic sugar [6, 5, 4]. We follow this tradition and allow EUF formulas to contain terms of the form $ITE(\varphi, t_1, t_2)$, where $\varphi$ is a formula, and $t_1$ and $t_2$ are terms. To desugar EUF formulas we introduce an additional constant $c_{ITE}$ and add two constraints $\varphi \rightarrow c_{ITE} = t_1$ and $\neg\varphi \rightarrow c_{ITE} = t_2$ for each ITE term $ITE(\varphi, t_1, t_2)$. As we will see in Section 3, the presence of syntactic sugar such as the ITE operator in the grammar of SyGuS problems may have a surprising effect on the decidability of the SyGuS problem.

Grammars and Automata

A context-free grammar (CFG) is a tuple consisting of a finite set of nonterminal symbols with a distinguished start symbol , a finite set of terminal symbols, and a finite set of production rules, which are tuples of the form . Production rules indicate the allowed replacements of non-terminals by sequences over nonterminals and terminals. The language, , generated by a context-free grammar is the set of all sequences that contain only terminal symbols that can be derived from the start symbol using the production rules.

Tree grammars are a more restrictive class of grammars. They are defined relative to a ranked alphabet . A regular tree grammar consists of a set of non-terminals, a start symbol , a ranked alphabet , and a set of production rules. Production rules are of the form , where , is in and has arity , and each is in . For a given tree-grammar we write for the set of trees produced by . The regular tree languages are the languages produced by some regular tree grammar. Any regular tree grammar can be converted to a CFG by simply treating the right-hand side of any production as a string, rather than a tree. Thus, the undecidability results for SyGuS given regular tree grammars extend to undecidability results for SyGuS given CFGs.
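
To make the notion of a regular tree grammar concrete, here is a minimal Python sketch that enumerates the trees of bounded height produced by a small grammar and prints each tree as a string, mirroring the conversion to a CFG. The grammar `S -> f(S, S) | a | b`, the dictionary encoding, and the restriction that all children of a production are nonterminals are simplifying assumptions of this example.

```python
from itertools import product

# Hypothetical grammar S -> f(S, S) | a | b, encoded as
# nonterminal -> list of (symbol, [child nonterminals]).
GRAMMAR = {"S": [("f", ["S", "S"]), ("a", []), ("b", [])]}

def trees(grammar, nonterminal, depth):
    """All trees derivable from nonterminal with height at most depth."""
    if depth == 0:
        return set()
    out = set()
    for symbol, children in grammar[nonterminal]:
        child_sets = [trees(grammar, c, depth - 1) for c in children]
        # product() over zero child sets yields one empty tuple (a leaf).
        for combo in product(*child_sets):
            out.add((symbol,) + combo)
    return out

def to_string(t):
    """Read a tree as a string, as when treating a right-hand side as a word."""
    if len(t) == 1:
        return t[0]
    return t[0] + "(" + ",".join(to_string(arg) for arg in t[1:]) + ")"

for t in sorted(trees(GRAMMAR, "S", 2), key=to_string):
    print(to_string(t))
```

With height at most 2 this grammar yields six trees: the two constants and the four applications of `f` to them.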

Let be a signature of a background theory . We define a tree grammar to be -compatible (or -compatible) if and the arities for all symbols in match those in .

A (deterministic) bottom-up (or rational) tree automaton is a tuple . Here, is a set of states, , and is a ranked alphabet. The function maps a symbol and states to a new state , for all . If no such exists, is undefined. We can inductively extend to terms, where for all and all , we set . The language accepted by is the set . There exist transformations between regular tree grammars and rational tree automata [8], and we will sometimes also define SyGuS problems in terms of rational tree automata rather than regular tree grammars.
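
The inductive extension of the transition function to terms amounts to a bottom-up evaluation. The following Python sketch runs a deterministic bottom-up tree automaton on a term; the dictionary encoding of the transition function and the example automaton, which tracks the parity of the number of occurrences of the constant `a`, are assumptions made for illustration.

```python
def run(delta, term):
    """Evaluate the children first, then look up the transition for
    (symbol, child states). Returns None if the transition is undefined."""
    child_states = []
    for arg in term[1:]:
        q = run(delta, arg)
        if q is None:
            return None
        child_states.append(q)
    return delta.get((term[0], tuple(child_states)))

def accepts(delta, final_states, term):
    """A term is accepted if its run ends in an accepting state."""
    return run(delta, term) in final_states

# Hypothetical automaton over f/2, a/0, b/0.
STATES = ("even", "odd")
DELTA = {("a", ()): "odd", ("b", ()): "even"}
for p in STATES:
    for q in STATES:
        DELTA[("f", (p, q))] = "even" if p == q else "odd"

a, b = ("a",), ("b",)
print(accepts(DELTA, {"even"}, ("f", a, a)))   # True: two occurrences of a
print(accepts(DELTA, {"even"}, ("f", a, b)))   # False: one occurrence of a
```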

Syntax-Guided Synthesis

We follow the definition of SyGuS given by Alur et al. [2], but we focus on the case of finding a replacement for a single designated function symbol $f$ with a candidate expression (the program), which is generated by a given grammar $G$. Let $T$ be a background theory over signature $\Sigma$, and let $\mathcal{G}$ be a class of grammars. Given a function symbol $f$ with arity $n$, a formula $\varphi$ over the signature $\Sigma \cup \{f\}$, and a grammar $G \in \mathcal{G}$ of terms in $T_\Sigma(\{x_1, \dots, x_n\})$, the SyGuS problem is to find a term $e \in L(G)$ such that the formula $\varphi\{f \mapsto e\}$ is valid, or to determine the absence of such a term. We represent the SyGuS problem as the tuple $(\varphi, T, G, f)$.

The variables $x_1, \dots, x_n$ that may occur in the generated term $e$ stand for the arguments of $f$. For each function application of $f$, the higher-order substitution $\varphi\{f \mapsto e\}$ then replaces $x_1, \dots, x_n$ by the arguments of the function application.

Note that the original definition of SyGuS allows for universally quantified variables, while our definition above admits no variables. This is equivalent as universally quantified variables can be replaced with fresh constants without affecting validity.

Example 1

Consider the following example SyGuS problem in linear integer arithmetic. Let the type of the function to synthesize be and let the specification be given by the logical formula

We can restrict the set of expressions to be expressions generated by the grammar below:

Term
Cond

It is easy to see that a function computing the maximum over and , such as , is a solution to the SyGuS problem. There are, however, other solutions, such as . The function computing the sum of and would satisfy the specification, but cannot be constructed in the grammar.
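
A brute-force search shows the shape of such a problem. The sketch below is illustrative only: the specification `f(x, y) = f(y, x) and f(x, y) >= x`, the grammar with terms `x`, `y`, `0`, `1`, `ite(Cond, Term, Term)` and conditions `Term <= Term`, and all function names are assumptions of this example. Validity is approximated by testing on a small range of integers, which is not a proof of validity; a real SyGuS solver would use an SMT solver for that step.

```python
from itertools import product

def terms(depth):
    """Enumerate Term expressions of the assumed grammar up to a nesting depth."""
    base = [("x",), ("y",), ("0",), ("1",)]
    if depth == 0:
        return base
    smaller = terms(depth - 1)
    conds = [("<=", s, t) for s, t in product(smaller, repeat=2)]
    return base + [("ite", c, s, t) for c in conds
                   for s, t in product(smaller, repeat=2)]

def evaluate(e, x, y):
    """Evaluate an expression for concrete integer arguments x and y."""
    op = e[0]
    if op == "x":
        return x
    if op == "y":
        return y
    if op in ("0", "1"):
        return int(op)
    if op == "<=":
        return evaluate(e[1], x, y) <= evaluate(e[2], x, y)
    if op == "ite":
        branch = e[2] if evaluate(e[1], x, y) else e[3]
        return evaluate(branch, x, y)
    raise ValueError(op)

def passes_spec(e, bound=3):
    """Bounded test of the assumed specification (not a validity proof)."""
    points = range(-bound, bound + 1)
    return all(evaluate(e, x, y) == evaluate(e, y, x) and evaluate(e, x, y) >= x
               for x in points for y in points)

candidates = [e for e in terms(1) if passes_spec(e)]
max_xy = ("ite", ("<=", ("x",), ("y",)), ("y",), ("x",))
print(max_xy in candidates)   # True: the maximum passes the bounded test
```

A sum such as `x + y` never appears among the candidates simply because the assumed grammar has no production for addition, which is the point the example makes about syntactic restrictions.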

3 SyGuS-EUF is Undecidable

We use SyGuS-EUF to denote the class of SyGuS problems where is a grammar generating expressions that are syntactically well-formed expressions in EUF for . In this section, we prove that SyGuS-EUF is undecidable. The proof of undecidability is a reduction from the simultaneous rigid E-unification problem (SREU) [11]. We say that a set of equations between terms in together with an equation between terms in forms a rigid expression, denoted . A solution to is a substitution , such that and are ground for each and . Given a set of rigid equations, the SREU problem is to find a substitution that is a solution to each rigid equation in , and is known to be undecidable [11].

Reducing SREU to SyGuS-EUF. We start the reduction by constructing a boolean expression for a given set of rigid equations over alphabet and variables . Let each be , where , and are equations between terms in . We associate with each rigid expression a boolean expression , where is the substitution . The symbol is a unary function symbol to be synthesized and are fresh constants ( for all ). We set .

Next we give the grammar , which generates the terms that may replace in . We define to have the starting nonterminal and the following rules:

where is a fresh constant ( and for all ). Additionally, for each we add a rule , where the number of argument terms of matches its arity.

Lemma 1

The SREU problem has a solution if and only if the SyGuS-EUF problem has a solution over the ranked alphabet .

Proof

The main idea behind this proof is that each in represents the variable in . Any replacement to found in corresponds to a substitution on all variables in that grounds the equations in the SREU problem.

Let be a solution to , where each is a ground term in . We consider the term , which is in the language of the grammar . To show that is valid, it suffices to show that for each model of and for each we have . If , then holds trivially. We handle the remaining case below, giving justifications to the right of each new equation.

  1. Assume

  2. (1)

  3. For each : (2)

  4. For each : (3)

  5. (1)

  6. (4, 5)

  7. (def. SREU)

  8. (6,7, Birkhoff’s Thm.)

  9. (3,8)

Therefore, and we get that is a solution to the SyGuS problem .

Let and be defined as before and assume that is a solution to the SyGuS problem . Each in is ground, since the nonterminal in can only produce ground terms. Choose any . We will show for every model on , that if then . By Birkhoff’s theorem, this implies .

  1. Assume

  2. Let be a model over such that and assigns each to a distinct new element not in .

  3. (2)

  4. For each : (3)

  5. (1,2)

  6. (4,5)

  7. ( is a SyGuS solution)

  8. (6, 7)

  9. (3,8)

  10. (2,9)

Thus and is a solution to . ∎

Theorem 3.1

The SyGuS-EUF problem is undecidable.

Remark on EUF without ITE. A key step in the proof of Lemma 1 is the use of ITE statements to allow a single expression to encode instantiations of multiple different variables. As discussed in Section 2, ITE statements are commonly part of EUF, but some definitions of EUF do not allow for ITE statements [19]. While this syntactic sugar has no effect on the complexity of the validity of EUF formulas, the undecidability of SyGuS-EUF may depend on the availability of ITE operators. It remains open whether there exist alternative proofs of undecidability that do not rely on ITE statements.

We use SyGuS-Arrays to denote the class of SyGuS problems , where Arrays is the theory of arrays [5], and is a grammar such that are syntactically well-formed expressions in Arrays for . There is a standard construction for representing uninterpreted functions as read-only arrays [5]. Therefore, the undecidability of SyGuS-Arrays follows from the undecidability of SyGuS-EUF, as we state below.

Corollary 1

The SyGuS-Arrays problem is undecidable.

4 Regular SyGuS-EUF

This section describes a fragment of EUF, which we call regular-EUF, for which the SyGuS problem is decidable.

Definition 1

We call a regular SyGuS-EUF problem if contains no ITE expressions and is a regular-EUF formula as defined below.

A regular-EUF formula is a formula over some ranked alphabet , where each satisfies the following conditions:

  1. It is a disjunction of equations or the negation of equations.

  2. It does not contain any ITE expressions.

  3. It contains at most one occurrence of per equation.

  4. It satisfies one of the following cases:

    • Case 1: The symbol only occurs in positive equations.

    • Case 2: The symbol occurs in exactly one negative equation, and nowhere else.

We define any disjunction that satisfies the above conditions as regular. We will refer to a regular as case-1 or case-2, depending on which of the above cases is satisfied. Note that every regular-EUF formula is in conjunctive normal form.

We will show that for every regular , we can construct a regular tree automaton accepting precisely the solutions to the SyGuS-EUF problem on . The set of solutions to then becomes , where is the grammar of possible replacements. The grammar can be represented as a deterministic bottom-up tree automaton whose size is exponential in [8]. The product-automaton construction can be used to determine if is non-empty, which would imply that a solution exists to the corresponding SyGuS problem. This construction takes time and space. Note that this is at most exponential even when some of the automata have size exponential in or .

The connection between sets of ground equations and regular tree languages was first observed by Kozen [17], who showed that a language is regular if and only if there exist a set of ground equations and collection of ground terms such that . The following, very similar theorem shows that a certain set of equivalence classes of a ground equational theory can be represented by a regular tree automaton.

Theorem 4.1

Let be a set of ground equations over the alphabet , and let be a subterm-closed set of terms such that every term in is in . There exists a regular tree automaton without accepting states such that a state in represents an equivalence class of a term in . More formally, this means that for all terms such that there exist terms so that and , it holds that if and only if .

Proof

Let . For each term , for , let .

We define the function to operate on as follows: First, remove from . For all and such that , add to . If there already exists some such that , then .

Now for each in , call . A simple inductive argument will show that the resulting automaton is . ∎

Let be a regular formula. Let and . We can rewrite to the normal form . Solving the SyGuS problem for then becomes a problem of finding a such that for some . The technique to form the automaton that represents the solutions to depends on whether is case-1 or case-2.

Assume that is case-1 and choose some . Assume is not in . If , then is trivially solvable. If , then can be removed from to yield an equally solvable formula. Now assume is in . Without loss of generality, there is a context and a set of terms such that . Let and let be the automaton defined in the proof of Theorem 4.1. For each , there is a ground term such that . Let be the set of states such that . By Theorem 4.1, if and only if . Therefore, for any replacement, , of , if and only if .

Let be a tree automaton with accepting states . For each , let . For all , let . A simple inductive argument will show that for any replacement of , . Thus, defines the precise set of terms such that .

The set of solutions to can be given by the automaton whose language is . This can be found in time and space exponential in using the product construction for tree automata [8].
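
The product construction and the emptiness check can be sketched as a fixpoint over reachable tuples of states. This is a minimal Python illustration, assuming deterministic bottom-up automata encoded as dictionaries from `(symbol, child_states)` to a state and sharing one ranked alphabet; the function name and the two example automata are hypothetical.

```python
from itertools import product

def nonempty_intersection(automata):
    """automata: list of (delta, finals). Returns True iff some term is
    accepted by every automaton. The number of product states can grow
    exponentially with the number of automata."""
    symbols = {(sym, len(kids)) for (sym, kids) in automata[0][0]}
    reachable = set()                      # tuples with one state per automaton
    changed = True
    while changed:
        changed = False
        snapshot = tuple(reachable)
        for sym, arity in symbols:
            for kids in product(snapshot, repeat=arity):
                target = []
                for i, (delta, _) in enumerate(automata):
                    q = delta.get((sym, tuple(k[i] for k in kids)))
                    if q is None:
                        break
                    target.append(q)
                else:
                    if tuple(target) not in reachable:
                        reachable.add(tuple(target))
                        changed = True
    return any(all(t[i] in finals for i, (_, finals) in enumerate(automata))
               for t in reachable)

# Two hypothetical automata over f/2, a/0, b/0.
PARITY = {("a", ()): "odd", ("b", ()): "even"}     # parity of occurrences of a
HAS_A = {("a", ()): "yes", ("b", ()): "no"}        # does a occur at all?
for p, q in product(("even", "odd"), repeat=2):
    PARITY[("f", (p, q))] = "even" if p == q else "odd"
for p, q in product(("yes", "no"), repeat=2):
    HAS_A[("f", (p, q))] = "yes" if "yes" in (p, q) else "no"

print(nonempty_intersection([(PARITY, {"even"}), (HAS_A, {"yes"})]))  # True
print(nonempty_intersection([(PARITY, {"odd"}), (HAS_A, {"no"})]))    # False
```

Only reachable product states are built, so the check stops as soon as no new tuple of states can be derived.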

Figure 1: The automaton accepting the solutions to the clause in Example 2.

Example 2

Let . Note that this is a case-1 regular EUF clause. If we set and , then is the automaton from figure 1 (excluding the accepting state and transition). Since the argument of in is and parses to state-1, a transition from to state-1 is added to . Since parses to state-2 in , state-2 is set as an accepting state in . So accepts the replacements to such that is valid.

Assume is case-2 and let be the equation in that contains . Without loss of generality, there is a context and a set of terms such that . Let , and let . Choose some . If , then every replacement to is a solution. So assume . Let be a replacement to such that and . Let . Assume is not -equivalent to any term in , let and let . We know has no outgoing edges: if it did, would be -equivalent to some term in . By construction, is equivalent to calling on . Since has no outgoing edges, calling on cannot induce any more merges. Therefore, since is not equal to , they are not equal after the merge. So, and thus are -equivalent to some terms in .

Let . For each , there is a ground term such that . Let be the set of states such that for each . Then for each replacement , if and only if .

Let . Let be a tree automaton with accepting states . For each , let . For all , let . A simple inductive argument will show that are precisely the solutions to .

Figure 2: Left: The set of solutions to the clause in Example 3. Right: The resulting automaton (without the x transition and accepting state) after merging states 1 and 3.

Example 3

Let . Note that this is a case-2 regular-EUF clause. If we set and , then is the automaton from the left side of figure-2 (excluding the accepting state and transition). Since the argument of in is and parses to state-2, a transition from to state-2 is added to . If we choose a replacement such that parses to state-3 in , then applying the equation merges state-3 with state-1. This, in turn, forces a merge between the new state and state-2, yielding the automaton on the right side of figure-2. This automaton parses and to the same state, so state-3 is an accepting state. This does not occur if parses to state-1 or state-2 in , so they are not accepting states. So accepts the replacements to such that is valid.

We can summarize the above construction in the following lemma.

Lemma 2

The regular SyGuS-EUF problem is in EXPTIME.

The relationship between regular tree languages and the regular SyGuS-EUF problem is quite deep. Using the following lemma and the above constructions, we can see that a tree language is regular if and only if it is the set of solutions to a regular SyGuS-EUF problem.

Lemma 3

Let be a tree automaton. There exists a regular disjunctive formula such that is the set of solutions to .

Proof

Let be a subterm-closed set of terms such that for each state , there is a term such that . Without loss of generality, assume that each is a subterm of some term in . Let for some new constants . Let and . Finally set . Using the construction from theorem 4.1, it is easy to check that the set of solutions to are precisely . ∎

We can also use the above lemma to show that regular SyGuS-EUF is EXPTIME-complete, as we will see below.

Lemma 4

The regular SyGuS-EUF problem is EXPTIME-hard.

Proof

We reduce from the EXPTIME-complete problem of determining whether a set of regular tree automata have languages with a non-empty intersection [24]. Let , …, be a set of regular tree automata over some alphabet . For each automaton , construct the formula as described in lemma 3. Let . Let be a nullary function symbol to be synthesized, and let be a grammar such that . The solutions to the regular SyGuS-EUF problem are the members of the set . Therefore, has a solution if and only if is non-empty. ∎

Using the above lemma and lemma 2, we can conclude the following theorem.

Theorem 4.2

The regular SyGuS-EUF problem is EXPTIME-complete.

In concluding this section, we remark that the case-1 and case-2 restrictions on regular clauses are necessary. For lack of space, we exclude the details; the appendix contains an example elaborating on this point.

5 Finite-Domain Theories

In addition to the “standard” theories, we also consider a family of theories that we term finite-domain (FD) theories. Formally, an FD theory is a complete theory that admits one domain (up to isomorphism), and whose only domain is finite. For example, consider group axioms with a constant and the statements and . This is an FD theory, since, up to isomorphism, the only model of this theory is the integers with addition modulo 3. Also Boolean logic and the theory of fixed-length bit-vectors without concatenation are FD theories. Bit-vector theories with (unrestricted) concatenation allow us to construct arbitrarily many distinct constants and are thus not FD theories.

In this section we give a generic algorithm for any complete finite-domain theory for which validity is decidable. Let be such a theory and let be a model of with a finite domain . Assume without loss of generality that for every element there is a constant in such that .

We consider a SyGuS problem with a correctness specification in theory , a function symbol to synthesize, and a tree grammar generating the set of candidate expressions. Let be the constants occurring in . The expression generated by to replace can be seen as a function mapping to an element in . If the domain of is finite there are only finitely many candidate functions, but it can be non-trivial to determine which functions can be generated by . In the following, we describe an algorithm that iteratively determines the set of functions that can be generated by each non-terminal in the grammar .

For each , we maintain a set of expressions . In each iteration and for each production rule for in , we consider the expressions where if is an expression (i.e. ) and if is a non-terminal . Given such an expression , we compute the function table, that is, the result of for each , and compare it to the function tables of the expressions currently in . Our assumption of decidability of the validity problem for guarantees that this operation is decidable. If represents a new function, we add it to the set .

The algorithm terminates after the first iteration in which no set changes. As there are only finitely many functions from to and the sets grow monotonically, the algorithm eventually terminates. To determine the answer to the SyGuS problem, we then check whether there is an expression in for which is valid.
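
The iteration described above can be sketched in Python for the Boolean domain. Everything specific here is an assumption for illustration: the two-nonterminal grammar, the operator table `OPS`, the restriction to a binary function over arguments `x` and `y`, and the simplification that the specification is given as a predicate on the candidate function rather than as a formula.

```python
from itertools import product

DOMAIN = (False, True)
INPUTS = list(product(DOMAIN, repeat=2))          # all values of (x, y)

# Interpretation of the function symbols in the finite model.
OPS = {"and": lambda p, q: p and q, "or": lambda p, q: p or q,
       "not": lambda p: not p}

# Hypothetical tree grammar: nonterminal -> list of (symbol, [child nonterminals]).
GRAMMAR = {
    "S": [("and", ["A", "A"]), ("not", ["A"])],
    "A": [("x", []), ("y", []), ("or", ["A", "A"])],
}

def generated_tables(grammar):
    """For each nonterminal, map every function table it can generate to one
    witness expression, iterating until no set changes."""
    found = {nt: {} for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, rules in grammar.items():
            for sym, kids in rules:
                choices = [list(found[k].items()) for k in kids]
                for combo in product(*choices):
                    if sym in OPS:
                        table = tuple(OPS[sym](*(t[i] for t, _ in combo))
                                      for i in range(len(INPUTS)))
                    else:                          # leaf x or y
                        idx = {"x": 0, "y": 1}[sym]
                        table = tuple(inp[idx] for inp in INPUTS)
                    if table not in found[nt]:     # a new function
                        found[nt][table] = (sym,) + tuple(e for _, e in combo)
                        changed = True
    return found

def solve(grammar, start, spec):
    """Return a witness expression whose function satisfies spec, or None."""
    for table, expr in generated_tables(grammar)[start].items():
        f = lambda x, y, table=table: table[INPUTS.index((x, y))]
        if spec(f):
            return expr
    return None

xor_spec = lambda f: all(f(x, y) == (x != y) for x, y in INPUTS)
print(solve(GRAMMAR, "S", xor_spec))   # None: XOR is not generated by S
```

Keeping one witness per function table bounds each set by the number of functions over the finite domain, which is what guarantees termination.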

Theorem 5.1

Let be a complete theory for which validity is decidable and which has a finite-domain model . The SyGuS problem for and -compatible tree grammars is decidable.

Table 2: This table shows the expressions added to the sets , , and when we apply the algorithm to the SyGuS problem in Example 4. For readability, we simplify the expressions, indicated by the symbol ‘’. Expressions that are syntactically new, but do not represent a new function are struck out. When no new function is added, “none” is written in the cell.

Example 4

Consider the SyGuS problem over boolean expressions with the specification , where denotes the XOR operation and is the function symbol to synthesize from the following tree grammar (we use infix operators for readability):

The grammar generates boolean functions of variables and and the updates to , , and during each iteration of the proposed algorithm are given in Table 2. The next step in the algorithm is to determine if any of the three expressions make the formula valid, which is not the case.

6 Bit-Vectors

In this section, we show that the SyGuS problem for the theory of bit-vectors is undecidable, even when we restrict the problem to tree grammars. The proof makes use of the fact that we can construct (bit-)strings with the concatenation operation and can compare arbitrarily large strings with the equality operation. This enables us to encode the problem of determining whether the languages of two CFGs with no $\varepsilon$-transitions have a non-empty intersection, which is undecidable [15].

Theorem 6.1

The SyGuS problem for the theory of bit-vectors is undecidable for both the class of context-free grammars and the class of BV-compatible tree grammars.

Proof

We start with the proof for the class of context-free grammars. Given two context-free grammars and , we define a SyGuS problem with a single context-free grammar that has a solution iff the intersection of and is not empty. The proof idea is to express the intersection of the two grammars as the equality between two expressions, each generated by one of the grammars. The new grammar thus starts with the following production rule:

We then have to translate the grammars and into grammars and that produce expressions in the bit vector theory instead of arbitrary strings over their alphabets. There is a string produced by both and