Higher-Order Equational Pattern Anti-Unification [Preprint]
(This research is supported by the FWF project P28789-N32.)
Abstract
We consider anti-unification for simply typed lambda terms in associative, commutative, and associative-commutative theories and develop a sound and complete algorithm which takes two lambda terms and computes their generalizations in the form of higher-order patterns. The problem is finitary: the minimal complete set of generalizations contains finitely many elements. We define the notion of optimal solution and investigate special fragments of the problem for which the optimal solution can be computed in linear or polynomial time.
David M. Cerna and Temur Kutsia

Subject classification: F.4.2 [Theory of Computation]: Mathematical Logic and Formal Languages—Grammars and Other Rewriting Systems; F.2.2 [Theory of Computation]: Analysis of Algorithms and Problem Complexity—Nonnumerical Algorithms and Problems.
1 Introduction
Anti-unification algorithms aim at computing generalizations for given terms. A generalization of two terms t and s is a term r such that t and s are substitution instances of r. Interesting generalizations are those that are least general (lggs). However, it is not always possible to have a unique least general generalization. In such cases the task is either to compute a minimal complete set of generalizations, or to impose restrictions so that uniqueness is guaranteed.
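To make the notion concrete, here is a minimal first-order sketch (an illustration, not the paper's higher-order algorithm; terms are modeled as nested `(head, args)` tuples with plain strings for constants and variables) of how a least general generalization can be computed by mapping each disagreement pair to a shared fresh variable:

```python
# Toy first-order anti-unification sketch (assumed representation:
# a term is a (head, args) tuple; bare strings are constants).
def lgg(t, s, store=None):
    """Return a least general generalization of t and s.

    Disagreement pairs are mapped to fresh variables; the same pair
    always gets the same variable, which makes the result *least* general.
    """
    if store is None:
        store = {}
    if t == s:
        return t
    th, ta = t if isinstance(t, tuple) else (t, ())
    sh, sa = s if isinstance(s, tuple) else (s, ())
    if th == sh and len(ta) == len(sa):          # equal heads: decompose
        return (th, tuple(lgg(a, b, store) for a, b in zip(ta, sa)))
    key = (t, s)                                  # clash: generalize
    if key not in store:
        store[key] = f"X{len(store)}"
    return store[key]
```

For instance, generalizing f(a, b) and f(a, c) keeps the agreeing part and replaces the disagreement by a variable, while f(a, a) versus f(b, b) reuses one variable twice.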
Anti-unification, as considered in this paper, uses both of these ideas. The theory is simply typed lambda calculus, where some function symbols may be associative, commutative, or associative-commutative. A-, C-, and AC-anti-unification is finitary even for first-order terms, and a modular algorithm has been proposed in [1] to compute the corresponding minimal complete set of generalizations. Anti-unification for simply typed lambda terms can be restricted to compute generalizations in the form of Miller's patterns [13], which makes it unitary, and the single least general generalization can be computed in linear time by the algorithm proposed in [8]. These two approaches combine nicely with each other when one wants to develop a higher-order equational anti-unification algorithm, as we illustrate in this paper. Basically, it extends the syntactic generalization rules from [8] (we refer to the higher-order anti-unification algorithm from [8] as syntactic, although it works modulo conversion) by equational decomposition rules inspired by those from [1], yielding a modular algorithm in which different equational axioms for different function symbols can be combined automatically. The algorithm takes a pair of simply typed lambda terms and returns a set of their generalizations in the form of higher-order patterns. It is terminating, sound, and complete. However, the number of nondeterministic choices made when decomposing may result in a large search tree. Although each branch can be developed in linear time, there can be too many of them to search efficiently.
This is the problem that we address in the second part of the paper. The idea is to use a greedy approach: introduce an optimality criterion, use it to select an anti-unification problem among the different alternatives obtained by a decomposition rule, and try to solve only that one. In this way, we compute only one generalization. Checking the criterion and selecting the right branch should be done “reasonably fast”. To implement this idea, we introduce conditions on the form of anti-unification problems which guarantee the computation of “optimal” solutions, and study the corresponding complexities. In particular, we identify conditions under which A-, C-, and AC-generalizations can be computed in linear time. We also study how the complexity changes when these conditions are relaxed.
Higher-order anti-unification has been investigated by various authors from different application perspectives. Research has focused mainly on the investigation of special classes for which the uniqueness of the lgg is guaranteed. Some application areas include proof generalization [14], higher-order term indexing [15], cognitive modeling and analogical reasoning [9, 17], recursion scheme detection in functional programs [3], and inductive synthesis of recursive functions [16], just to name a few. Two higher-order anti-unification algorithms [6, 8] are included in an online open-source anti-unification library [4, 5]. This related work does not consider anti-unification with higher-order terms in the presence of equational axioms. However, such a combination can be useful, for instance, for developing indexing techniques for higher-order theorem provers [12] or in higher-order program manipulation tools.
The organization of the paper is as follows: In Section 2 we introduce the main notions and define the problem. In Section 3 we recall the higher-order anti-unification algorithm from [8]. In Section 4 we extend the algorithm with equational decomposition rules. Section 5 is devoted to the introduction of computationally well-behaved fragments of anti-unification problems. The subsequent sections describe the behavior of equational anti-unification algorithms on these fragments: In Section 6 we discuss associative generalization and optimality. Sections 7 and 8 are about C- and AC-generalizations.
2 Preliminaries
This work builds upon the background theory introduced below and the results of [7, 8]. Higher-order signatures are composed of types constructed from a set of base types (typically δ) using the grammar α ::= δ | α → α, where → associates to the right unless otherwise stated. Variables (typically x, y, z) as well as constants (typically a, b, c, f) are assigned types from the set of types constructed using the above grammar. λ-terms (typically t, s, u) are constructed using the grammar t ::= x | c | λx.t | t t, where x is a variable and c is a constant, and are typed using the type construction mentioned above. Terms of the form (…((h t₁) t₂) … tₙ), where h is a constant or a variable, will be written as h(t₁,…,tₙ), and terms of the form λx₁.λx₂.….λxₙ.t as λx₁,…,xₙ.t. We use x̄ₙ as a shorthand for x₁,…,xₙ. This basic language will be extended by higher-order constants satisfying equational axioms. When necessary, we write a term t together with its type α as t : α.
Every higher-order constant f will have an associated set of axioms, denoted by Ax(f). If Ax(f) is empty then f does not have any associated properties and is called free. Otherwise, Ax(f) ⊆ {A, C}, where A is associativity, i.e. f(f(x, y), z) = f(x, f(y, z)), and C is commutativity, i.e. f(x, y) = f(y, x). Note that only functions of the type δ → δ → δ are allowed to have equational properties. We assume that terms are written in flattened form, obtained by replacing all subterms of the form f(t₁,…,f(s₁,…,sₘ),…,tₙ) by f(t₁,…,s₁,…,sₘ,…,tₙ), where f is associative. Also, by convention, the term f(t) stands for t. Other standard notions of the simply typed λ-calculus, like bound and free occurrences of variables, α-conversion, β-reduction, long normal form, etc. are defined as usual (see [2, 10]). By default, terms are assumed to be written in long normal form. Therefore, all terms have the form λx₁,…,xₙ.h(t₁,…,tₘ), where n, m ≥ 0, h is either a constant or a variable, t₁,…,tₘ have this form, and the term h(t₁,…,tₘ) has a basic type.
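The flattening convention can be sketched as follows (a toy model on `(head, args)` tuples, not the paper's code; the set of associative symbols is an assumed parameter):

```python
def flatten(term, assoc=frozenset({"f"})):
    """Flatten nested applications of associative symbols:
    f(t1, f(t2, t3)) becomes f(t1, t2, t3).  Non-associative
    heads are left untouched."""
    if not isinstance(term, tuple):
        return term
    head, args = term
    flat = []
    for a in (flatten(x, assoc) for x in args):
        if head in assoc and isinstance(a, tuple) and a[0] == head:
            flat.extend(a[1])          # splice nested arguments in place
        else:
            flat.append(a)
    return (head, tuple(flat))
```

So f(a, f(b, c)) flattens to f(a, b, c) when f is associative, while a non-associative head keeps its nesting.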
The set of free variables of a term t is denoted by Vars(t). When we write an equality between two terms, we mean that they are equivalent modulo α-, β-, and η-equivalence.
The size of a term t, denoted |t|, is defined recursively as |h| = 1 for a constant or variable h, |λx.t| = |t| + 1, and |h(t₁,…,tₙ)| = 1 + Σᵢ₌₁ⁿ |tᵢ|. The depth of a term t, denoted depth(t), is defined recursively as depth(h) = 1, depth(λx.t) = depth(t) + 1, and depth(h(t₁,…,tₙ)) = 1 + max₁≤ᵢ≤ₙ depth(tᵢ). For a term t = λx₁,…,xₙ.h(t₁,…,tₘ), its head is defined as head(t) = h.
A higher-order pattern is a term where, when written in long normal form, all free variable occurrences are applied to lists of pairwise distinct (long forms of) bound variables. For instance, λx.f(X(x), x) and λx,y.Z(y, x) are patterns, while λx.f(X(X(x)), x), f(Y(c)), and λx,y.Z(x, x) are not.
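The pattern condition can be checked mechanically. The sketch below uses a hypothetical toy representation (`('lam', x, body)` for abstractions, `(head, args)` for applications, upper-case heads for free variables), not the paper's typed terms:

```python
def is_pattern(term, bound=frozenset()):
    """Check Miller's pattern condition on a toy term encoding:
    every free-variable head must be applied to pairwise distinct
    bound variables.  Upper-case heads are free variables -- an
    assumption of this sketch."""
    if isinstance(term, str):
        return True
    if term[0] == 'lam':
        _, x, body = term
        return is_pattern(body, bound | {x})
    head, args = term
    if head[0].isupper() and head not in bound:   # free variable head
        return (all(isinstance(a, str) and a in bound for a in args)
                and len(set(args)) == len(args))  # distinct bound vars
    return all(is_pattern(a, bound) for a in args)
```

Under this encoding λx.λy.F(x, y) passes, while λx.F(x, x) (repeated bound variable) and F(g(x)) (argument is not a bare bound variable) fail.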
Substitutions are finite sets of pairs {X₁ ↦ t₁, …, Xₙ ↦ tₙ}, where each Xᵢ and tᵢ have the same type and the Xᵢ's are pairwise distinct variables. They can be extended to type-preserving functions from terms to terms as usual, avoiding variable capture. The notions of substitution domain and range are also standard and are denoted, respectively, by Dom and Ran.
We use postfix notation for substitution application, writing tσ instead of σ(t). As usual, the application affects only the free occurrences of variables from Dom(σ) in t. We write x̄σ for (x₁σ,…,xₙσ), if x̄ = (x₁,…,xₙ). Similarly, for a set of terms S, we define Sσ = {tσ | t ∈ S}. The composition of σ and ϑ is written as juxtaposition σϑ and is defined as x(σϑ) = (xσ)ϑ for all x. Another standard operation, restriction of a substitution σ to a set of variables S, is denoted by σ|_S.
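For illustration, substitution application and composition on first-order terms might look as follows (a sketch under the same toy `(head, args)` representation; not the paper's definitions, and without the capture-avoidance needed for λ-terms):

```python
def apply_subst(term, sigma):
    """Apply substitution sigma (dict var -> term) to a first-order term."""
    if isinstance(term, str):
        return sigma.get(term, term)
    head, args = term
    return (head, tuple(apply_subst(a, sigma) for a in args))

def compose(sigma, theta):
    """Composition 'sigma then theta':  x(sigma theta) = (x sigma) theta."""
    out = {x: apply_subst(t, theta) for x, t in sigma.items()}
    for y, t in theta.items():
        out.setdefault(y, t)   # theta also acts on variables sigma misses
    return out
```

Composing {X ↦ f(Y)} with {Y ↦ a} yields {X ↦ f(a), Y ↦ a}, matching the defining equation x(σϑ) = (xσ)ϑ.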
A substitution σ is more general than ϑ, written σ ≼ ϑ, if there exists a substitution μ such that Xσμ = Xϑ for all X. The strict part of this relation is denoted by ≺. The relation ≼ is a partial order and generates an equivalence relation which we denote by ≃. We overload ≼ by defining t ≼ s if there exists a substitution σ such that tσ = s. The focus of this work is generalization in the presence of equational axioms; thus we need a more general concept of ordering substitutions/terms by their generality. We say that two terms are E-equal if they are equivalent modulo the equational theory E. For example, f(f(a, b), c) =_A f(a, f(b, c)), but f(a, b) ≠_A f(b, a). Under this notion of equality we can say that a substitution σ is more general modulo an equational theory E than ϑ, written σ ≼_E ϑ, if there exists μ such that Xσμ =_E Xϑ for all X. The relations ≺_E and ≃_E and the term extension are generalized accordingly. From this point on we will use the ordering relation modulo an equational theory when discussing generalization.
A term r is called a generalization or an anti-instance modulo an equational theory E of two terms t and s if r ≼_E t and r ≼_E s. It is a higher-order pattern generalization if additionally r is a higher-order pattern. It is a least general generalization (lgg in short), aka a most specific anti-instance, of t and s, if there is no generalization r′ of t and s which satisfies r ≺_E r′.
An anti-unification problem (shortly AUP) is a triple X(x̄) : t ≜ s, where

- λx̄.X(x̄), λx̄.t, and λx̄.s are terms of the same type,

- t and s are in long normal form, and

- X does not occur in t and s.
The variable X is called a generalization variable. The term X(x̄) is called the generalization term. The variables that belong to x̄, as well as bound variables, are written with lower case letters x, y, z. Originally free variables, including the generalization variables, are written with the capital letters X, Y, Z. This notation intuitively corresponds to the usual convention about syntactically distinguishing bound and free variables. The size of a set of AUPs {X₁(x̄₁) : t₁ ≜ s₁, …, Xₙ(x̄ₙ) : tₙ ≜ sₙ} is defined as Σᵢ₌₁ⁿ (|tᵢ| + |sᵢ|). Notice that the size of the generalization terms is not considered. An anti-unifier of an AUP X(x̄) : t ≜ s is a substitution σ such that X(x̄)σ is a term which generalizes both t and s.
An anti-unifier σ of X(x̄) : t ≜ s is least general (or most specific) modulo an equational theory E if there is no anti-unifier ϑ of the same problem that satisfies σ ≺_E ϑ. Obviously, if σ is a least general anti-unifier of an AUP X(x̄) : t ≜ s, then X(x̄)σ is an E-lgg of t and s.
Here we consider the following variant of the higher-order equational anti-unification problem:

Given: Higher-order terms t and s of the same type in long normal form, and an equational theory E.

Find: A higher-order pattern generalization r of t and s modulo E.
Essentially, we are looking for an r which is least general among all higher-order patterns that generalize t and s (modulo E). There can still exist a term which is less general than r and generalizes both t and s, but is not a higher-order pattern. In [8] such an instance is given for syntactic anti-unification: there exist terms t and s whose least general higher-order pattern generalization is strictly more general than some generalization of t and s which is not a higher-order pattern.
Another important distinguishing feature of higher-order pattern generalization modulo E is that there may be more than one least general pattern generalization (lgpg) for a given pair of terms. In the syntactic case there is a unique lgpg. The main contribution of this paper is to find conditions on the AUPs under which there is a unique lgpg in the equational cases, and to introduce weaker optimality conditions which allow one to greedily search the space for a generalization less general than the syntactic one. We formalize these concepts in the following sections.
3 Higher-Order Pattern Generalization in the Empty Theory
Below we assume that in AUPs of the form X(x̄) : t ≜ s the generalization term X(x̄) is a higher-order pattern. We now introduce the rules of the higher-order pattern generalization algorithm from [8], which works for the empty (syntactic) theory. It produces syntactic higher-order pattern generalizations in linear time and will play a key role in the optimality conditions introduced in later sections.
These rules work on triples A; S; σ, which are called states. Here A is a set of AUPs of the form X(x̄) : t ≜ s that are pending to anti-unify, S is a set of already solved AUPs (the store), and σ is a substitution (computed so far) mapping variables to patterns. The symbol ⊎ denotes disjoint union.
Dec (Decomposition): where the common head h is a free constant or h ∈ x̄, and the new generalization variables are fresh variables of the appropriate types.
Abs (Abstraction): where the new generalization variable is a fresh variable of the appropriate type.
Sol (Solve): where t and s are of a basic type and neither Dec nor Abs applies. The sequence ȳ is a subsequence of x̄ consisting of the variables that appear freely in t or in s, and the new generalization variable is a fresh variable of the appropriate type.
Mer (Merge): where π is a bijection between the bound variables of two solved AUPs in the store, extended as a substitution. Note that for the equational theories we consider later we would use equality modulo the theory instead of syntactic equality.
We will refer to these generalization rules as the syntactic rule set. To compute generalizations for two simply typed lambda terms t and s in long normal form, the algorithm from [8] starts with the initial state {X : t ≜ s}; ∅; ∅, where X is a fresh variable, and applies these rules as long as possible. The computed result is the instance of X under the final substitution. It is the syntactic least general higher-order pattern generalization of t and s, and is computed in linear time in the size of the input.
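The state-based control flow can be illustrated on first-order terms. The sketch below mimics only the Dec, Sol, and Mer steps (no abstractions, so it is far weaker than the pattern algorithm of [8]); the representation and variable naming are assumptions of this illustration:

```python
from itertools import count

def generalize(t, s):
    """Worklist sketch of the rule-based scheme (first-order only):
    a state is (pending AUPs, store, substitution)."""
    fresh = (f"X{i}" for i in count())
    root = next(fresh)
    pending, store, sigma = [(root, t, s)], [], {}
    while pending:
        x, t1, s1 = pending.pop()
        h1, a1 = t1 if isinstance(t1, tuple) else (t1, ())
        h2, a2 = s1 if isinstance(s1, tuple) else (s1, ())
        if h1 == h2 and len(a1) == len(a2):       # Dec: equal heads
            ys = [next(fresh) for _ in a1]
            sigma[x] = (h1, tuple(ys)) if ys else h1
            pending.extend(zip(ys, a1, a2))
        else:
            for y, t2, s2 in store:               # Mer: AUP solved before
                if (t2, s2) == (t1, s1):
                    sigma[x] = y
                    break
            else:                                 # Sol: move to the store
                store.append((x, t1, s1))
    def resolve(v):                               # read result off sigma
        r = sigma.get(v, v)
        if isinstance(r, tuple):
            return (r[0], tuple(resolve(a) for a in r[1]))
        return r if r == v else resolve(r)
    return resolve(root)
```

The Mer branch is what makes repeated disagreements share one variable, e.g. generalizing f(a, b, a) and f(c, b, c) yields f(Z, b, Z) for a single variable Z.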
We will use this linear-time procedure in the following sections to obtain “optimal” least general higher-order pattern generalizations of terms modulo an equational theory. These optimal generalizations depend on the generalizations the syntactic algorithm produces. When we need to check more than one decomposition of a given AUP in order to compute the optimal generalizations modulo an equational theory, we compute the optimal generalization for each decomposition path and then compare the results. The details are explained below.
4 Equational Decomposition Rules
In this section we discuss an extension of the basic higher-order pattern generalization rules by decomposition rules for A, C, and AC function symbols. Here we consider the general, unrestricted case. Efficient special fragments are discussed in the subsequent sections.
We start from decomposition rules for associative generalization:
DecAL: where the head symbol is associative and the new generalization variables are fresh variables of appropriate types.
DecAR: where the head symbol is associative and the new generalization variables are fresh variables of appropriate types.
We refer to the extension of the syntactic rule set by the above associativity rules as its associative extension, and extend the termination, soundness, and completeness results to it.

Theorem (Termination). The extended set of transformations is terminating.
Proof.
Termination follows from the fact that the syntactic rule set terminates [8] and that the rules DecAL and DecAR can be applied only finitely many times. ∎
Theorem (Soundness). If {X : t ≜ s}; ∅; ∅ ⟹* ∅; S; σ is a transformation sequence of the extended rule set, then Xσ is a higher-order pattern in long normal form which generalizes both t and s modulo associativity.
Proof.
It was shown in [8] that the syntactic rule set is sound. Let us assume as a base case that all occurrences of associative function symbols in the AUP have two arguments. Then the rules DecAL and DecAR are equivalent to the Dec rule. As an induction hypothesis (IH), assume soundness holds when all occurrences of associative function symbols in the AUP have at most n arguments. We show that it holds for n + 1. Let the AUP contain an occurrence of an associative symbol with n + 1 arguments, and let all other occurrences of associative function symbols have at most n arguments. Any application of DecAL or DecAR will produce two AUPs for which the IH holds, and thus the theorem holds. We can extend this argument to an arbitrary number of associative function symbols with n + 1 arguments by another induction. ∎
Theorem (Completeness). Let t and s be higher-order terms and r be a higher-order pattern such that r is a generalization of both t and s modulo associativity. Then there exists a transformation sequence in the extended rule set computing a substitution σ such that r ≼ Xσ.
Proof.
We can reason similarly to the previous proof. It was shown in [8] that the syntactic rule set is complete. Let us assume as a base case that all occurrences of associative function symbols in the AUP have two arguments. Then the rules DecAL and DecAR are equivalent to the Dec rule and completeness holds. When an associative symbol has more arguments, there are several ways to group the arguments associatively, and the decomposition rules DecAL and DecAR allow one to consider all groupings. ∎
The addition of associative function symbols allows more than one decomposition and thus more than one lgg, in contrast to higher-order pattern generalization, which results in a unique lgg. If we wish to compute the complete set of lggs, we can simply exhaust all possible applications of the above rules. However, for most applications an “optimal” generalization is sufficient. We postpone this discussion till the next section.
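To see where the branching comes from, one can enumerate the candidate binary splits that DecAL/DecAR-style decomposition gives rise to for flattened associative terms. This is an illustrative sketch on `(head, args)` tuples, not the rule system itself:

```python
def wrap(head, args):
    """Re-wrap an argument slice; f(t) stands for t by convention."""
    return args[0] if len(args) == 1 else (head, tuple(args))

def assoc_splits(head, targs, sargs):
    """Candidate binary splits of head(t1..tn) vs head(s1..sm):
    cut each argument list after some position; every split yields
    a pair of sub-AUPs (left pair, right pair)."""
    for i in range(1, len(targs)):
        for j in range(1, len(sargs)):
            yield ((wrap(head, targs[:i]), wrap(head, sargs[:j])),
                   (wrap(head, targs[i:]), wrap(head, sargs[j:])))
```

For argument lists of lengths n and m this produces (n - 1)(m - 1) alternatives, which is exactly the branching the greedy approach of the later sections tries to tame.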
The decomposition rule for commutative symbols is also quite intuitive:
DecC: where the head symbol is commutative and the new generalization variables are fresh variables of appropriate types.
We refer to the extension of the syntactic rule set by the commutativity rule as its commutative extension. We can easily extend the termination, soundness, and completeness results to it. Notice that for commutative generalization, too, the lgg is not necessarily unique.
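For a binary commutative symbol, the two possible argument pairings can be enumerated directly (again a toy sketch on `(head, args)` tuples):

```python
def comm_decompositions(t, s):
    """For f(t1, t2) vs f(s1, s2) with f commutative there are
    exactly two ways to pair up the arguments into sub-AUPs."""
    (_, (t1, t2)), (_, (s1, s2)) = t, s
    return [((t1, s1), (t2, s2)),    # keep the argument order
            ((t1, s2), (t2, s1))]    # swap one side
```

Both alternatives must be explored (or one chosen greedily), since either may lead to the less general result.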
Unlike commutativity, which considers a fixed number of terms, and associativity, which enforces an ordering on the terms, AC function symbols allow an arbitrary number of arguments with no fixed ordering. The corresponding decomposition rules take this into account:
where the head symbol is associative-commutative and the new generalization variables are fresh variables of appropriate types.
where the head symbol is associative-commutative and the new generalization variables are fresh variables of appropriate types.
We refer to the extension of the syntactic rule set by the AC decomposition rules as its AC extension. Again, termination, soundness, and completeness are easily extended to this case.
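The AC case branches over subset choices on both argument lists, which is where the combinatorial blow-up comes from. A sketch of the candidate splits (illustrative only, on flattened argument tuples):

```python
from itertools import combinations

def ac_splits(head, targs, sargs):
    """Candidate AC splits: pick a non-empty proper subset of the
    arguments on each side for the first sub-AUP; the remaining
    arguments form the second.  The branching is exponential in the
    number of arguments, motivating the restricted fragments studied
    later."""
    def parts(args):
        idx = range(len(args))
        for r in range(1, len(args)):
            for chosen in combinations(idx, r):
                rest = [i for i in idx if i not in chosen]
                yield ([args[i] for i in chosen], [args[i] for i in rest])
    for tl, tr in parts(targs):
        for sl, sr in parts(sargs):
            yield ((tl, sl), (tr, sr))
```

Even for two arguments on each side there are already (2² − 2)² = 4 alternatives.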
5 Towards Special Fragments
This section is devoted to computing special kinds of “optimal” generalizations, which can be done more efficiently than in the general, unrestricted case considered in the previous section.
The idea is the following: The equational decomposition rules introduce branching in the search space. Each branch can be developed in linear time, but there can be too many of them. However, if the branching factor is bounded, we could choose one of the alternative states (produced by decomposition) based on some “optimality” criterion, and develop only that branch. Such a greedy approach will give one “optimal” generalization.
In order to have a “reasonable” complexity, we should be able to choose such an optimal state from “reasonably” many alternatives in “reasonable” time. For this, our idea is to treat all the alternative states obtained by an equational decomposition step as syntactic anti-unification problems, compute lggs for each of them (which can be done in linear time), choose the best one among those lggs (e.g., one less general than the others, or, if there are several such results, use some heuristics), and restart the equational anti-unification algorithm from the state which led to the computation of that best syntactic lgg. When the branching factor is constant, this leads to a quadratic algorithm; when it is linearly bounded, we get a cubic algorithm. These are the cases we consider below. We also need to decompose in a more clever way than in the rules above, where the decomposition was based on an arbitrary choice of a subterm.
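The greedy step described above can be sketched as follows. The score used here (size of the shared syntactic skeleton, counting each disagreement as one node) is a cheap proxy for “less general”, an assumption of this sketch rather than the paper's exact criterion:

```python
def size(term):
    return 1 if isinstance(term, str) else 1 + sum(size(a) for a in term[1])

def syntactic_overlap(t, s):
    """Size of the syntactic common skeleton of t and s (disagreement
    pairs count as one node) -- a rough stand-in for lgg size."""
    if t == s:
        return size(t)
    if (isinstance(t, tuple) and isinstance(s, tuple)
            and t[0] == s[0] and len(t[1]) == len(s[1])):
        return 1 + sum(syntactic_overlap(a, b) for a, b in zip(t[1], s[1]))
    return 1   # would become a generalization variable

def pick_branch(branches):
    """Each branch is a list of AUPs (t, s); keep the branch whose
    syntactic lggs retain the most structure, i.e. are presumably
    least general."""
    return max(branches,
               key=lambda aups: sum(syntactic_overlap(t, s) for t, s in aups))
```

Only the selected branch is then developed further by the equational algorithm, which is what keeps the overall procedure polynomial.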
Hence, we need to identify fragments of equational anti-unification problems for which the decomposition branching factor is constant or linearly bounded. We start by introducing the following concepts.
Definition (refined generalization). Given two terms t and s and their generalizations r₁ and r₂, we say that r₁ is at least as good as r₂ with respect to ≼_E if either r₂ ≼_E r₁ or they are not comparable with respect to ≼_E.
An E-generalization r of t and s is called their E-refined generalization iff r is at least as good (with respect to ≼_E) as a syntactic lgg of t and s.
Note that every syntactic generalization is also an E-generalization. A direct consequence of this definition is that every element of the minimal complete set of E-generalizations (where E is A, C, or AC) of two terms t and s is an E-refined generalization of t and s. However, there might exist E-refined generalizations which do not belong to the minimal complete set of E-generalizations.
Looking back at the informal description of the construction above, we can say that at each branching point we aim at choosing the alternative that would lead to “the best” refined generalization.
The concept of refined generalization allows us to compute better generalizations than the base procedure would, without concerning ourselves with certain difficult-to-handle decompositions. We will outline what we mean by “difficult” in later sections. Some of these difficult decompositions can be handled by finding alignments between two sequences of terms.
Definition (Alignment, Rigidity Function). Let u₁…uₙ and v₁…vₘ be strings of symbols. Then the sequence a₁[i₁, j₁]…aₖ[iₖ, jₖ], for k ≥ 0, where the a's are not variables, is an alignment if

- the i's and j's are integers such that 0 < i₁ < ⋯ < iₖ ≤ n and 0 < j₁ < ⋯ < jₖ ≤ m, and

- u_{iₗ} = aₗ = v_{jₗ}, for all 1 ≤ ℓ ≤ k. An alignment of the form a[i, j] will be referred to as a singleton alignment.
The set of all alignments of u and v will be denoted by A(u, v). A (singleton) rigidity function is a function that returns, for every pair of strings of symbols u and v, a set of (singleton) alignments of u and v.
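A natural example of a rigidity function returns longest common subsequences. A standard dynamic-programming sketch computing one LCS alignment, as `(symbol, i, j)` triples with 1-based positions, is:

```python
def lcs_alignment(u, v):
    """One longest-common-subsequence alignment of two symbol strings,
    as a list of (symbol, i, j) triples (1-based positions) -- a simple
    instance of a rigidity function in the sense used here."""
    n, m = len(u), len(v)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):                       # classic LCS length table
        for j in range(m):
            L[i + 1][j + 1] = (L[i][j] + 1 if u[i] == v[j]
                               else max(L[i][j + 1], L[i + 1][j]))
    out, i, j = [], n, m                     # backtrack one alignment
    while i and j:
        if u[i - 1] == v[j - 1] and L[i][j] == L[i - 1][j - 1] + 1:
            out.append((u[i - 1], i, j)); i -= 1; j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]
```

For instance, aligning the head strings "fgfh" and "gfh" yields the alignment g[2,1] f[3,2] h[4,3].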
Definition (Pair of argument head sequences and multisets). Let t = f(t₁,…,tₙ) and s = f(s₁,…,sₘ). Then the pair of argument head sequences and the pair of argument head multisets of t and s are, respectively, the pair of sequences (head(t₁)⋯head(tₙ), head(s₁)⋯head(sₘ)) and the pair of multisets ({head(t₁),…,head(tₙ)}, {head(s₁),…,head(sₘ)}).
These notions extend to AUPs: the pair of argument head sequences (resp. multisets) of an AUP X(x̄) : t ≜ s is the pair of argument head sequences (resp. multisets) of the terms t and s.
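Computing the pair of argument head sequences and multisets is straightforward; a sketch on the toy `(head, args)` representation:

```python
from collections import Counter

def head(term):
    """Head of a term: the applied symbol, or the term itself if atomic."""
    return term[0] if isinstance(term, tuple) else term

def arg_head_data(t, s):
    """Pair of argument head sequences and pair of argument head
    multisets for t = f(t1..tn) and s = f(s1..sm)."""
    ths = [head(a) for a in t[1]]
    shs = [head(a) for a in s[1]]
    return (ths, shs), (Counter(ths), Counter(shs))
```

The sequences matter for associativity (order is significant), while the multisets are the right abstraction for commutative and AC symbols.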
There is a subset of AUPs, referred to as determined AUPs, which contain associative function symbols and for which interesting refined generalizations are computable in linear time. The more general k-determined AUPs allow a bounded number of possible choices, that is, at most k choices, whenever associative decomposition may be applied. Even for k-determined AUPs, computing the set of lggs is of exponential complexity. Therefore, we introduce a notion of optimal generalization with respect to a so-called rigidity function [11] and a choice function picking one of the available decompositions. Under such optimality conditions, we are able to compute a refined generalization in quadratic time for k-determined AUPs and in cubic time for arbitrary AUPs with associative function symbols.
The equational decomposition rules above are too nondeterministic, and the computed set of generalizations has to be minimized to obtain minimal complete sets of generalizations. However, even if we performed more guided decompositions, obtaining, e.g., terms with the same head in new AUPs (as in [11]), there would still be alternatives. For instance, consider an AUP whose terms are headed by an associative symbol and in which exactly two pairs of arguments have equivalent head symbols, while for every other pair of compared arguments the head symbols are not equivalent. Under these assumptions there is not enough information to decide which decomposition is less general. Furthermore, this can be generalized from two possible decompositions to k possibilities.
Under certain conditions we can force a term pair to have a single decomposition path; we will refer to this as the determined condition, which is equivalent to the existence of a unique longest common subsequence of head symbols. We formally define determined AUPs using the following sequence of definitions:
Definition (determinate set). Given the pair of sequences of symbols u = u₁…uₙ and v = v₁…vₘ, and a positive integer k, the (strict) k-determinate set of u and v is defined as follows:

- If one of the sequences satisfies the base condition and the other does not (or vice versa), then the determinate set is defined directly.

- Otherwise, let N be a number such that the corresponding multiset condition holds, and consider the set of pairs obtained from the aligned positions. If this set has at most k elements, then it is the k-determinate set.

- Otherwise, the k-determinate set is undefined.
Note that the strict k-determinate set is defined analogously, using the strict variant of the construction. We will refer to the pairs (a[i, j], D), where a[i, j] is a singleton alignment and D a determinate set, as blocks.
We will use the strict variant when considering commutativity in Section 7.

Example. We illustrate the previous definition on several pairs of symbol sequences and their (strict) determinate sets.
Even though such cases are related, the formalism does not handle them as similar. This merely makes the formalism a little more restricted. Notice that having a unique longest common subsequence of two symbol sequences is not equivalent to being k-determined: there are pairs of sequences with a unique longest common subsequence, represented by an alignment, which are nonetheless not k-determined.
Definition (determined term pairs). A pair of terms is k-determined iff either the base condition holds, or

- the condition on the argument head sequences holds, or

- the condition on the argument head multisets holds.

Furthermore, we say that the pair is total k-determined if it is k-determined and, for each pair of aligned argument positions, the pair consisting of the term at that position of the first term and the term at that position of the second term is itself total k-determined.
Proposition 1. The complexity of checking whether the terms of an AUP are determined, and whether they are total determined, is polynomial in n, where n is the maximum of the lengths of the two terms.
Checking k-determinedness of an AUP is a harder problem complexity-wise. For example, given two argument sequences, there may be quadratically many ways to align the terms, all of which have to be checked. Moreover, if we want to check total k-determinedness, we have to repeat a quadratic check for each pair of aligned terms, resulting in an even more expensive procedure.
6 Associative Generalization: Special Fragments and Optimality
6.1 Associativity and Determined AUPs
We provide a linear-time algorithm for higher-order refined pattern generalization of AUPs which are determined. Essentially, at every step there is a single decomposition choice which can be made.

Theorem. A higher-order refined pattern generalization for a total determined AUP can be computed in linear time.
Proof.
If the AUP does not contain an associative function symbol, then its refined generalization, which is also an lgg, can be computed in linear time [8]. If it does contain an associative function symbol, we have two alternatives: either every occurrence of the associative function symbol has exactly two arguments (remember that our terms are in flattened form), or not. In the former case, the associative decomposition rules do not differ from the syntactic decomposition rule Dec, and we can only apply the latter. This means that we can still use the linear algorithm from [8]. The rest of the proof concerns the case when there are occurrences of associative function symbols with more than two arguments. The proof goes by induction on the maximal number of such arguments.
We assume as the induction hypothesis that if every occurrence of the associative function symbol in the AUP has at most n arguments, then the AUP is solvable in linear time, and show that the same holds for n + 1. Let us assume that the AUP under consideration contains an occurrence of an associative symbol with n + 1 arguments; assume without loss of generality that it occurs on the left side. Also, assume that no other occurrence of this symbol in the given AUP has more than n arguments. We make this assumption in order to reduce the complexity of associative decomposition in the AUP and thus apply the induction hypothesis. If the heads of the leading arguments coincide, then their lgg should not be a variable. Therefore, we can apply DecAL, which results in two AUPs: one whose further decomposition will make sure that the leading arguments are not generalized by a generalization variable, and one for the remaining arguments. Notice that both of the resulting AUPs, by our assumptions, only contain the associative symbol with not more than n arguments. Thus, by the induction hypothesis, the theorem holds in this case.
For the next step, we assume that t and s are the terms of the AUP and that there exists a k satisfying the analogous condition. Therefore, we can perform DecAL on the first argument exactly k times, which gives the corresponding new AUPs. All the resulting AUPs, by our assumptions, only contain the associative symbol with not more than n arguments; thus, by the induction hypothesis, the theorem holds in this case.
For the next step, we assume that t and s are the terms of the AUP and that several argument positions admit associative decomposition. This is similar to the previous case except that there is more than one possible way to apply associative decomposition; the number of possible ways grows roughly combinatorially. However, since none of the head symbols of the obtained term pairs are equivalent, nor can their head symbols be equivalent to the associative symbol, we know that none of the resulting AUPs will require further decomposition. Thus, we need to apply associative decomposition only once; this can be performed by some heuristic. The result will be a set of AUPs to which our assumptions and the induction hypothesis apply, and thus the theorem holds in this case.
For the final step we just need to apply a simple induction argument on the number of occurrences in a term of the associative symbol with more than two arguments. The above argument provides the step case and the base case, since we prove the theorem for one occurrence and can use the proof for more occurrences. Thus, the theorem holds. ∎
In the next section we consider AUPs which are k-determined for k greater than one. This will require introducing a new concept of optimality based on a choice function greedily applied during decomposition.
6.2 Choice Functions and Optimality
In this section we develop procedures and optimality conditions for total k-determined AUPs, that is, AUPs where there are at most k ways to apply equational decomposition.
If we were to compute the set of refined generalizations for a total k-determined AUP by testing every decomposition, even for small k the size of the search space is too large to deal with efficiently. However, we can find an optimal refined generalization (precisely defined below) in quadratic time, with respect to a singleton rigidity function, a choice function, and a set of state transformation rules. Essentially, optimality means that the choice function chooses the “right” computation path based on the singleton rigidity function. The effect is that we reduce the problem for total k-determined AUPs to the case of total determined AUPs, with the additional complexity of computing the choice function at each step. We will provide a choice function with linear time complexity based on the syntactic procedure.
We will denote the set of all AUPs by 𝒜. We will need the following concept for the subsequent definitions.
Definition (decomposition). Let X(x̄) : t ≜ s be an AUP and a[i, j] an alignment of its argument head sequences. An a[i, j]-decomposition of the AUP is the set of AUPs obtained by splitting around the aligned position, where the new generalization variables are fresh variables of appropriate type applied to the bound variables from x̄ which appear in the corresponding subterms.

Definition (feasible). Let A; S; σ be a state such that X(x̄) : t ≜ s ∈ A, let a[i, j] be an alignment of the argument head sequences of t and s, and