Reducing Transducer Equivalence to Register Automata Problems solved by “Hilbert Method”

Adrien Boiret Radosław Piórkowski Janusz Schmude
Abstract

In the past decades, classical results from algebra, including Hilbert’s Basis Theorem, had various applications in formal languages, including a proof of the Ehrenfeucht Conjecture, decidability of HDT0L sequence equivalence, and decidability of the equivalence problem for functional tree-to-string transducers.

In this paper, we study the scope of the algebraic methods mentioned above, particularly as applied to the functionality problem for register automata and the equivalence problem for functional register automata. We provide two results, one positive and one negative. The positive result is that functionality and equivalence are decidable for MSO transformations on unordered forests. The negative result comes from an attempt to extend this method to decide functionality and equivalence of macro tree transducers. We reduce macro tree transducer equivalence to an equivalence problem for a class of register automata naturally relevant to our method. We then prove this latter problem to be undecidable.

Affiliation: University of Warsaw

1 Introduction

The study of finite-state machines, such as transducers [15, 8, 14] or register automata [2, 3], and of logic specifications, such as MSO-definable transformations [9], provides a theoretical ground to study document and data processing.

In this paper, we will consider the equivalence problem of functional transducers. We focus on register automata, i.e. transducers that store values in a finite number of registers that can be updated or combined after reading an input symbol. Streaming String Transducers (SST) [2] and Streaming Tree Transducers (STT) [3] are classes of register automata (see for example [4]) for which equivalence is decidable under the copyless restriction, i.e. when no register update uses the same register twice. This restriction makes SST equivalent to MSO-definable string transformations. Macro tree transducers (MTT) [11], an expressive class of tree transducers for which the decidability of equivalence remains a challenging open problem, can be seen as register automata whose registers store tree contexts. Although equivalence is not known to be decidable for the whole class, there exists a fragment with decidable equivalence, namely MTT of linear size increase, which is equivalent to MSO-definable tree transformations and can be characterized by a restriction on MTT quite close to copyless [10].

Some equivalence decidability results have been proven on register automata without copyless restrictions [16, 5], by reducing to algebraic problems such as ideal inclusion and by applying Hilbert’s Basis Theorem and other classical results of algebraic geometry. In this paper we will refer to this as the “Hilbert Method”. This method was used to prove diverse results, dating back to at least the proof of the Ehrenfeucht Conjecture [1], and the sequence problem for HDT0L [13, 12]. It has recently found new applications in formal languages; for example, equivalence was proven decidable for general tree-to-string transducers by seeing them as copyful register automata on words [16].

In this paper, we use an abstraction of these previous applications of the “Hilbert Method” as presented in [6]. We apply these preexisting results to the study of unordered forest transductions – and notably MSO functions. Note that equivalence of MSO-definable transductions on unordered forests is not a straightforward corollary of the ordered case, as the loss of order makes equivalence more difficult to identify. We also try to apply those methods to obtain decidability of MTT equivalence. For unordered forests, we obtain a positive result, showing that register automata on forest contexts with at most one hole have decidable functionality and equivalence. For MTT, we prove an undecidability result on register automata using polynomials and composition, which means that the natural extension of this approach does not yield a definitive answer for the decidability of MTT equivalence.

Layout

Section 2 presents the notions of algebra and register automata, and the notions necessary to use an abstraction of the “Hilbert Method” as presented in [6]. Section 3 is dedicated to the proof of the positive result that we can apply the “Hilbert Method” to contexts of unordered forests with at most one hole (i.e. the algebra of unordered forests with limited substitution). This provides a class of register automata encompassing MSO functions on unordered forests for which functionality and equivalence are decidable. Finally, Section 4 describes how applying a method similar to that of Section 3 to MTT equivalence leads to studying register automata on the algebra of polynomials with the substitution operation, a class whose functionality and equivalence problems we prove to be undecidable.

2 Preliminaries

Algebras. An algebra $\mathcal{A}$ is a (potentially infinite) set of elements $A$, together with a finite number of operations $f_1, \dots, f_m$. Each operation $f_i$ is a function $f_i : A^{n} \to A$ for some $n \in \mathbb{N}$.

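Polynomials. For an algebra $\mathcal{A}$ and a set $X$ of variables, we denote by $T_{\mathcal{A}}(X)$ the set of terms over $X$, built from the operations of $\mathcal{A}$ and using elements of $A$ as constants. A polynomial function of $\mathcal{A}$ is a function $A^n \to A$ induced by such a term with $n$ variables. For example, on the ring of integers, a term such as $x \times x + 2$ induces the polynomial function $x \mapsto x^2 + 2$. The definition of polynomial functions can be extended to functions $A^n \to A^k$ by taking the product of their outputs: if $p_1, \dots, p_k$ are polynomial functions from $A^n$ to $A$, then $(p_1, \dots, p_k)$ is a polynomial function from $A^n$ to $A^k$. Note that polynomial functions are closed under composition.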
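As an illustration (a sketch, not code from the paper), terms can be represented as nested tuples and evaluated to obtain the polynomial function they induce. The tuple representation, the helper names `evaluate` and `substitute`, and the choice of the integers with `+` and `*` are all hypothetical. The second helper shows why polynomial functions are closed under composition: substituting terms for variables yields another term.

```python
from typing import Dict, Union

# A term is an int constant, a variable name (str), or (op, left, right) with op in {"+", "*"}.
Term = Union[int, str, tuple]

def evaluate(term: Term, env: Dict[str, int]) -> int:
    """Evaluate a term over the integers under a variable assignment."""
    if isinstance(term, int):
        return term
    if isinstance(term, str):
        return env[term]
    op, left, right = term
    a, b = evaluate(left, env), evaluate(right, env)
    return a + b if op == "+" else a * b

def substitute(term: Term, mapping: Dict[str, Term]) -> Term:
    """Replace variables by terms; the result is again a term."""
    if isinstance(term, int):
        return term
    if isinstance(term, str):
        return mapping.get(term, term)
    op, left, right = term
    return (op, substitute(left, mapping), substitute(right, mapping))

# The term x * x + 2 induces the polynomial function x -> x^2 + 2.
t = ("+", ("*", "x", "x"), 2)
print(evaluate(t, {"x": 3}))  # 11
# Composing with x -> x + 1 gives the term (x + 1) * (x + 1) + 2.
u = substitute(t, {"x": ("+", "x", 1)})
print(evaluate(u, {"x": 3}))  # 18
```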

One can define the algebra of polynomials over $\mathcal{A}$ with variable set $X$, denoted $\mathcal{A}[X]$. Its elements are equivalence classes of terms over $X$ with the operations of $\mathcal{A}$, where two terms are called equivalent if they induce identical polynomial functions. $\mathcal{A}[X]$ can be seen as an algebra that subsumes $\mathcal{A}$, with the natural definition of operations. A classical example of this construction is the ring of polynomials $\mathbb{Q}[x]$, obtained from the ring $(\mathbb{Q}, +, \times)$.

By adding the substitution operation to $\mathcal{A}[X]$, we get a new algebra called a composition algebra of polynomials and denoted $\mathcal{A}[X]^{\circ}$. Homomorphisms of such algebras are called composition homomorphisms. For brevity we write $t[x := u]$ for the substitution of a single variable $x \in X$ by $u$ in $t$. Examples of such algebras include well-nested words with a placeholder symbol for the hole, as used in the registers of Streaming Tree Transducers [3], or tree contexts with variables in their leaves, as used in Macro Tree Transducers [11].

Simulation. Following the abstractions as they are presented in [6]¹, we define simulations between algebras in a way that is relevant to the use of the “Hilbert Method”.

¹ At the time of writing, Part 11 of [6] is the part relevant for this paper. This part, and any theorem or page number cited from [6], may change in future versions of [6].

Definition 1.

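Let $\mathcal{A}$ and $\mathcal{B}$ be algebras. We say that an injective function $\phi : A \to B^k$, for some $k \geq 1$, is a simulation of $\mathcal{A}$ in $\mathcal{B}$ if for every operation $f : A^n \to A$ of $\mathcal{A}$, there is a polynomial function $p_f : (B^k)^n \to B^k$ of $\mathcal{B}$ such that $\phi \circ f = p_f \circ \phi^n$, where $\phi^n$ is defined from $A^n$ to $(B^k)^n$ coordinate-wise. If such a simulation exists, we say that $\mathcal{A}$ is simulated by $\mathcal{B}$ ($\mathcal{A} \preceq \mathcal{B}$).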
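To make Definition 1 concrete, here is a small sketch of a simulation with $k = 2$: words over $\{a, b\}$ with concatenation are simulated in the integers with $+$ and $\times$. The digit assignment and the names `phi` and `p_concat` are hypothetical choices for this illustration, in the spirit of the word encodings mentioned later in the text, not a transcription of [6]. Letters become the nonzero base-3 digits 1 and 2, so the value together with $3^{|w|}$ determines the word.

```python
DIGIT = {"a": 1, "b": 2}  # hypothetical digit assignment, no zero digit

def phi(word: str) -> tuple:
    """Encode a word as the pair (base-3 value, 3 ** length)."""
    value = 0
    for letter in word:
        value = 3 * value + DIGIT[letter]
    return (value, 3 ** len(word))

def p_concat(left: tuple, right: tuple) -> tuple:
    """Polynomial function on pairs simulating concatenation:
    val(uv) = val(u) * 3^|v| + val(v), and 3^|uv| = 3^|u| * 3^|v|."""
    (v1, l1), (v2, l2) = left, right
    return (v1 * l2 + v2, l1 * l2)

# Homomorphism condition of Definition 1 on one instance.
print(p_concat(phi("ab"), phi("b")) == phi("abb"))  # True
```

The pair is needed because concatenation must shift the left operand by the length of the right one, which a single integer value cannot express polynomially.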

The following lemma states that simulations extend to composition algebras.

Lemma 2.

Let . If and is an infinite set, then .

Proof.

If there is a simulation from to , then can be extended into a simulation from to by setting , and requiring to be a homomorphism. It is important to check that indeed is a function (i.e. preserves equivalence of terms): if terms induce the same functions on , then and are polynomial functions that are equal on . Since is an infinite subset of , and are equal everywhere. The proof of injectivity is straightforward. Let be any two nonequivalent terms. Then there is a tuple of such that . If , we would have . This would contradict the injectivity of on . ∎

Typed Algebras. Some of the algebras we consider are multi-sorted, which is to say that their elements are divided between a finite number of types. A multi-sorted algebra is an algebra $\mathcal{A}$ such that:

  • $A$ can be partitioned into $A_1, \dots, A_m$,

  • each operation is a function $f : A_{i_1} \times \dots \times A_{i_n} \to A_j$.

To each $a \in A$ we associate a type, which is the unique $i$ such that $a \in A_i$. Note that in a multi-sorted algebra polynomial functions are typed $A_{i_1} \times \dots \times A_{i_n} \to A_j$, and simulations and substitutions must be defined type-wise.

Register automata. In this paper we will work on register automata that make a single bottom-up pass on an input ranked tree, use a finite set of states, and a finite set of registers with values in $A$ for some algebra $\mathcal{A}$. When the automaton reads an input symbol, it updates its register values as a polynomial function of $\mathcal{A}$ applied to the register values in its subtrees. This formalism is already present in the literature: streaming string transducers [2], for example, are register automata on input words with register values in the algebra of words $\Sigma^*$ on an alphabet $\Sigma$, with the concatenation operation.

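A signature $\Sigma$ is a finite set of symbols $a$, each with a corresponding finite rank $\mathrm{rk}(a) \in \mathbb{N}$. A ranked tree is a term on this signature: if $a \in \Sigma$, $\mathrm{rk}(a) = n$, and $t_1, \dots, t_n$ are trees, then $a(t_1, \dots, t_n)$ is a tree.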

Definition 3.

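Let $\mathcal{A}$ be an algebra. A bottom-up register automaton with values in $\mathcal{A}$ (or $\mathcal{A}$-RA) is a tuple $\mathcal{R} = (\Sigma, k, Q, \delta, \mathit{out})$, where:

  • $\Sigma$ is a ranked set

  • $k$ is the number of $\mathcal{A}$-registers used by $\mathcal{R}$

  • $Q$ is a finite set of states

  • $\delta$ is a finite set of transitions of the form $a(q_1, \dots, q_n) \to (q, p)$, where $a \in \Sigma$ is of rank $n$, $q_1, \dots, q_n, q \in Q$, and $p : (A^k)^n \to A^k$ is a polynomial function of $\mathcal{A}$.

  • $\mathit{out}$ is a partial output function that associates to some states $q$ a polynomial function $\mathit{out}(q) : A^k \to A$ of $\mathcal{A}$.

A configuration of $\mathcal{R}$ is a pair $(q, \bar v)$, where $q \in Q$ is a state and $\bar v \in A^k$ is a $k$-tuple of register values. We define by induction the fact that a tree $t$ can reach a configuration $(q, \bar v)$, noted $t \to_{\mathcal{R}} (q, \bar v)$: if $a \in \Sigma$ is of rank $n$, $a(q_1, \dots, q_n) \to (q, p)$ is a rule of $\delta$, and $t_i \to_{\mathcal{R}} (q_i, \bar v_i)$ for $1 \leq i \leq n$, then

$$a(t_1, \dots, t_n) \to_{\mathcal{R}} (q, p(\bar v_1, \dots, \bar v_n)).$$

$\mathcal{R}$ determines a relation $[\![\mathcal{R}]\!]$ from trees to values in $A$. It is defined using $\mathit{out}$ as a final step: if $t \to_{\mathcal{R}} (q, \bar v)$ and $\mathit{out}(q)$ is defined, then $(t, \mathit{out}(q)(\bar v)) \in [\![\mathcal{R}]\!]$.

We say that an $\mathcal{A}$-RA $\mathcal{R}$ is functional if $[\![\mathcal{R}]\!]$ is a function. We say that $\mathcal{R}$ is deterministic if for all $a \in \Sigma$ of rank $n$ and all $q_1, \dots, q_n \in Q$ there is at most one rule of the form $a(q_1, \dots, q_n) \to (q, p)$ in $\delta$. Any deterministic $\mathcal{A}$-RA is functional.

Note that on a multi-sorted algebra, we further impose that every state $q$ has a certain type $\tau(q)$, i.e. if $(q, \bar v)$ is a configuration of $\mathcal{R}$, then $\bar v$ is of type $\tau(q)$.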
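The following sketch evaluates a deterministic bottom-up register automaton in the sense of Definition 3, with register values in $\mathbb{Q}$ (via Python's `Fraction`). The dictionary encoding of rules, the labels `leaf` and `node`, and the function names are hypothetical choices for illustration. The example automaton has one state and one register holding the number of nodes, and its output function squares that register.

```python
from fractions import Fraction
from typing import Callable, Dict, Optional, Tuple

# A tree is (label, child_1, ..., child_n); a rank-0 leaf is (label,).
Regs = Tuple[Fraction, ...]
# Deterministic rules: (label, child states) -> (target state, polynomial update).
Rules = Dict[Tuple[str, Tuple[str, ...]], Tuple[str, Callable[..., Regs]]]

def run(rules: Rules, tree: tuple) -> Optional[Tuple[str, Regs]]:
    """Unique configuration reached bottom-up by `tree`, or None if no rule applies."""
    label, children = tree[0], tree[1:]
    configs = [run(rules, child) for child in children]
    if any(c is None for c in configs):
        return None
    key = (label, tuple(state for state, _ in configs))
    if key not in rules:
        return None
    target, update = rules[key]
    return target, update(*(regs for _, regs in configs))

def output(rules: Rules, out: Dict[str, Callable[[Regs], Fraction]], tree: tuple):
    """Apply the partial output function to the final configuration."""
    config = run(rules, tree)
    if config is None or config[0] not in out:
        return None
    state, regs = config
    return out[state](regs)

rules: Rules = {
    ("leaf", ()): ("q", lambda: (Fraction(1),)),
    ("node", ("q", "q")): ("q", lambda l, r: (l[0] + r[0] + 1,)),
}
out = {"q": lambda regs: regs[0] * regs[0]}
tree = ("node", ("leaf",), ("node", ("leaf",), ("leaf",)))
print(output(rules, out, tree))  # 25
```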

“Hilbert Method”. We now describe an abstraction [6] of the classical algebra methods that are used in the literature [16, 5] to decide equivalence of functional register automata over certain algebras, using what we will refer to as the “Hilbert Method”. More specifically, since it is always easy to prove the semi-decidability of non-equivalence of functional $\mathcal{A}$-RA, by guessing two runs on the same input with different outputs, this method aims to prove that functionality of $\mathcal{A}$-RA and equivalence of functional $\mathcal{A}$-RA are also semi-decidable. Running both semi-decision procedures in parallel then yields decidability.
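The easy half, semi-deciding non-equivalence, can be sketched as an enumeration of inputs. The sketch below is a simplification under stated assumptions: the two transducers are deterministic and given as plain Python functions, and a height bound is added so that the code terminates. The true procedure has no bound and never halts on equivalent transducers. All names (`trees_of_height_at_most`, `find_counterexample`, `size`, `twice_leaves_minus_one`) are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Optional

def trees_of_height_at_most(alphabet: Dict[str, int], h: int) -> List[tuple]:
    """All ranked trees over `alphabet` (symbol -> rank) of height at most h."""
    if h == 0:
        return []
    smaller = trees_of_height_at_most(alphabet, h - 1)
    result = []
    for symbol, rank in alphabet.items():
        for children in product(smaller, repeat=rank):
            result.append((symbol, *children))
    return result

def find_counterexample(f: Callable, g: Callable, alphabet: Dict[str, int],
                        max_height: int) -> Optional[tuple]:
    """Enumerate inputs by increasing height; return the first one where f and g differ."""
    for h in range(1, max_height + 1):
        for t in trees_of_height_at_most(alphabet, h):
            if f(t) != g(t):
                return t
    return None

def size(t: tuple) -> int:
    return 1 + sum(size(c) for c in t[1:])

def leaves(t: tuple) -> int:
    return 1 if len(t) == 1 else sum(leaves(c) for c in t[1:])

def twice_leaves_minus_one(t: tuple) -> int:
    return 2 * leaves(t) - 1

# Equal on full binary trees, but a unary symbol breaks the identity.
print(find_counterexample(size, twice_leaves_minus_one, {"leaf": 0, "node": 2}, 3))  # None
print(find_counterexample(size, twice_leaves_minus_one, {"leaf": 0, "node": 2, "u": 1}, 3))
```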

This method can be described as a 4-step process:

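  • Simulate $\mathcal{A}$ by $\mathbb{Q}$, hence reducing $\mathcal{A}$-RA equivalence to $\mathbb{Q}$-RA equivalence.

  • Functional $\mathbb{Q}$-RA equivalence can be reduced to $\mathbb{Q}$-RA zeroness, i.e. checking whether a $\mathbb{Q}$-RA only outputs 0.

  • $\mathbb{Q}$-RA zeroness can be reduced to an ideal inclusion problem in $\overline{\mathbb{Q}}[x_1, \dots, x_k]$, i.e. the ring of polynomials with algebraic numbers as coefficients.

  • The ideal inclusion problem in $\overline{\mathbb{Q}}[x_1, \dots, x_k]$ is decidable.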

These results exist in the literature. We will provide references as well as an intuition of the main mechanisms in these proofs. For the first point, the reduction from $\mathcal{A}$-RA equivalence to $\mathbb{Q}$-RA equivalence, an example is provided in [16]. In essence, this part of the method amounts to a simulation as described in Definition 1. If $\mathcal{A} \preceq \mathcal{B}$ with simulation $\phi$, then any $\mathcal{A}$-RA $\mathcal{R}$ can be simulated by a $\mathcal{B}$-RA $\mathcal{R}'$, in the sense that $\mathcal{R}$ outputs $v$ for an input $t$ if and only if $\mathcal{R}'$ outputs $\phi(v)$ for the same input $t$. This gives a reduction from $\mathcal{A}$-RA equivalence to $\mathcal{B}$-RA equivalence.

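The second point is presented in the proof of Theorem 11.8 of [6]. If $\mathcal{R}_1$ and $\mathcal{R}_2$ are two functional $\mathbb{Q}$-RA with the same domain, a natural product construction yields a $\mathbb{Q}$-RA $\mathcal{R}$ that runs $\mathcal{R}_1$ and $\mathcal{R}_2$ in parallel, then computes the difference between the outputs of $\mathcal{R}_1$ and $\mathcal{R}_2$. Thus $\mathcal{R}_1$ and $\mathcal{R}_2$ are equivalent iff $\mathcal{R}$ only outputs 0.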
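A minimal sketch of this product construction follows, assuming deterministic automata encoded as dictionaries of rules (labels, states, and helper names are all hypothetical). The registers of the product are those of $\mathcal{R}_1$ followed by those of $\mathcal{R}_2$, each component is updated by its own polynomial, and the output is the difference of the two outputs.

```python
from fractions import Fraction

def run(rules, tree):
    """Deterministic bottom-up run; returns (state, registers) or None."""
    label, children = tree[0], tree[1:]
    configs = [run(rules, c) for c in children]
    if any(c is None for c in configs):
        return None
    key = (label, tuple(s for s, _ in configs))
    if key not in rules:
        return None
    target, update = rules[key]
    return target, update(*(r for _, r in configs))

def product_difference(rules1, out1, k1, rules2, out2):
    """Product automaton: registers of R1 (the first k1) then those of R2;
    the output in a pair of final states is out1 - out2."""
    rules = {}
    for (label, states1), (t1, upd1) in rules1.items():
        for (label2, states2), (t2, upd2) in rules2.items():
            if label != label2 or len(states1) != len(states2):
                continue
            def update(*regs, upd1=upd1, upd2=upd2):
                # Split each child's registers back into the R1 part and the R2 part.
                left = upd1(*(r[:k1] for r in regs))
                right = upd2(*(r[k1:] for r in regs))
                return tuple(left) + tuple(right)
            rules[(label, tuple(zip(states1, states2)))] = ((t1, t2), update)
    out = {}
    for q1, o1 in out1.items():
        for q2, o2 in out2.items():
            out[(q1, q2)] = lambda regs, o1=o1, o2=o2: o1(regs[:k1]) - o2(regs[k1:])
    return rules, out

# R1 counts nodes; R2 counts leaves and internal nodes separately.
rules1 = {("leaf", ()): ("q", lambda: (Fraction(1),)),
          ("node", ("q", "q")): ("q", lambda l, r: (l[0] + r[0] + 1,))}
out1 = {"q": lambda regs: regs[0]}
rules2 = {("leaf", ()): ("p", lambda: (Fraction(1), Fraction(0))),
          ("node", ("p", "p")): ("p", lambda l, r: (l[0] + r[0], l[1] + r[1] + 1))}
out2 = {"p": lambda regs: regs[0] + regs[1]}

rules, out = product_difference(rules1, out1, 1, rules2, out2)
tree = ("node", ("leaf",), ("node", ("leaf",), ("leaf",)))
state, regs = run(rules, tree)
print(out[state](regs))  # 0, since the two automata agree on this input
```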

The third point can be found in the proof of Theorem 11.8 of [6]. The idea is to express $\mathbb{Q}$-RA zeroness as a set problem (with polynomial grammars as an intermediary in [6]). For each state $q$, we want to find the set $S_q \subseteq \mathbb{Q}^k$ of register values that $\mathcal{R}$ can hold in state $q$. These sets obey some inclusion constraints: if $a(q_1, \dots, q_n) \to (q, p)$ is a rule of $\mathcal{R}$, then $p(S_{q_1} \times \dots \times S_{q_n}) \subseteq S_q$. Furthermore, if zeroness holds for $\mathcal{R}$, then for every state $q$ such that the final output function $\mathit{out}(q)$ is defined, $\mathit{out}(q)(S_q) \subseteq \{0\}$. Interestingly, if a family of sets $(S_q)_{q \in Q}$ satisfying those inclusions exists, then there also exists a family of sets defined by ideals of $\overline{\mathbb{Q}}[x_1, \dots, x_k]$ that satisfies them (Lemma 11.5 of [6]).

The fourth point uses classical algebra results to find such a family of ideals. The proof, as it is presented in Theorem 11.3 of [6], works as follows: Hilbert's Basis Theorem ensures that every ideal is finitely generated, so all families of ideals can be enumerated through their finite sets of generators. For each of these families, one can check using Gröbner bases whether it satisfies the inclusions or not. Eventually, if a solution exists, it will be found, making this ideal problem, and thus $\mathbb{Q}$-RA zeroness, semi-decidable.

For this paper, we point out a few natural extensions to those methods and establish Theorem 4 and Corollary 5 as a basis for our work.

The first remark is that one can consider more problems than functional equivalence. Functionality itself can be studied with these methods. It is preserved by simulations, and $\mathbb{Q}$-RA functionality can be reduced to zeroness: instead of comparing two functional $\mathbb{Q}$-RA as in the second point, a $\mathbb{Q}$-RA $\mathcal{R}'$ can run two copies of the same $\mathbb{Q}$-RA $\mathcal{R}$ and compute the difference of their outputs. Then $\mathcal{R}$ is functional iff $\mathcal{R}'$ only outputs 0.

The second remark is that the classical algebra results (Hilbert's Basis Theorem, Gröbner bases, algebraic closure of a field…) used in the fourth point extend to any computable field $\mathbb{K}$. In consequence, Theorem 11.8 of [6] holds for any $\mathbb{K}$-RA. Since the polynomial ring $\mathbb{Q}[x]$ is a subring of a computable field (the field $\mathbb{Q}(x)$ of rational functions over $\mathbb{Q}$), it holds for $\mathbb{Q}[x]$-RA as well. We therefore state the following theorem.

Theorem 4.

Let $\mathcal{B}$ be either a computable field or the ring $\mathbb{Q}[x]$. Functionality of $\mathcal{B}$-RA and equivalence of functional $\mathcal{B}$-RA are decidable.

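The result of Theorem 4 can be extended to other algebras using simulations from an algebra $\mathcal{A}$ to an algebra $\mathcal{B}$. Indeed, if $\mathcal{A} \preceq \mathcal{B}$, then any $\mathcal{A}$-RA can be simulated by a $\mathcal{B}$-RA, and the problems of functionality and equivalence reduce from $\mathcal{A}$-RA to $\mathcal{B}$-RA.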

Corollary 5.

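Let $\mathcal{A}$ be an algebra. If $\mathcal{A} \preceq \mathcal{B}$ for some $\mathcal{B}$ as in Theorem 4, then functionality of $\mathcal{A}$-RA and equivalence of functional $\mathcal{A}$-RA are decidable.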

3 Unordered forests are simulated by polynomials

In this section we will show that unordered tree forests (and more generally the unordered forest algebra [7], which contains both forests and contexts with one hole) can be simulated, in the sense of Definition 1, by polynomials with rational coefficients over a variable $x$ (noted $\mathbb{Q}[x]$) with the operations $+$ and $\times$. This, combined with Corollary 5, implies the decidability of functionality and equivalence for a class of Forests-RA. We then prove that this class can express all MSO-transformations on unordered forests.

An unordered tree on a finite signature $\Sigma$ is an unranked tree (i.e. every node can have arbitrarily many children) in which the children of a node form an unordered multiset, rather than an ordered list. For example, Figure 1 displays two representations of the same unordered tree. An unordered forest is a multiset of unordered trees.

Figure 1: Two representations of the same unordered tree

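Unordered forests can thus be defined as an algebra $\mathrm{UF}$:

  1. its universe $F$ is the set of unordered forests, including the empty forest $\emptyset$;

  2. the operations are:

    • binary operation $+$ is the multiset addition,

    • for each letter $a \in \Sigma$, unary operation $a$: if $f \in F$, then $a(f)$ is the forest consisting of a single tree whose root is labelled $a$ and whose children are the trees of $f$.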
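The algebra UF can be sketched with a canonical representation in which the order of siblings is forgotten. In this sketch (an assumption, not the paper's representation) a forest is a sorted tuple of trees and a tree is a pair (letter, forest of children); the names `plus` and `node` are hypothetical. Sorting after every multiset addition makes two representations of the same unordered forest, as in Figure 1, literally equal.

```python
# Canonical representation: a forest is a sorted tuple of trees,
# and a tree is a pair (letter, forest of its children).
EMPTY = ()

def plus(f: tuple, g: tuple) -> tuple:
    """Multiset addition of forests: concatenate, then sort to forget the order."""
    return tuple(sorted(f + g))

def node(letter: str, f: tuple) -> tuple:
    """The unary operation `letter`: a single root labelled `letter` above the trees of f."""
    return ((letter, f),)

# The same unordered tree, built with the children listed in two different orders.
t1 = node("a", plus(node("b", EMPTY), node("c", node("b", EMPTY))))
t2 = node("a", plus(node("c", node("b", EMPTY)), node("b", EMPTY)))
print(t1 == t2)  # True
```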

In the rest of this paper, we will reason with a unary signature (and thus a unique operation). This is done without loss of generality, as unordered forests on a finite signature can easily be encoded by forests on a unary signature. To express it as a polynomial simulation, we can say that , and that for all

3.1 Encoding forests into polynomials

This subsection’s aim is to prove the following result:

Proposition 6.

$\mathrm{UF}$ is simulated by $\mathbb{Q}[x]$.

To this end we construct an injective homomorphism $\phi : \mathrm{UF} \to \mathbb{Q}[x]$, which associates to each forest $f$ a rational polynomial $\phi(f)$. It is important to check that two identical forests with different representations (as in Figure 1) do not obtain different values under $\phi$. Furthermore, the operations must be encoded as two polynomial functions $p_+$ and $p_a$ on $\mathbb{Q}[x]$, such that $\phi(f + g) = p_+(\phi(f), \phi(g))$ and $\phi(a(f)) = p_a(\phi(f))$.

Note that the term “polynomial” suffers here from semantic overload. We will take care to differentiate, on the one hand, rational polynomials (i.e. the elements of $\mathbb{Q}[x]$, such as $x + 2$) and, on the other hand, polynomial functions on the algebra $\mathbb{Q}[x]$ (such as $(r, s) \mapsto r \cdot s$), the latter being denoted by variants on the letter $p$.

Since $+$ in UF is both associative and commutative, we choose $p_+$ to be multiplication between rational polynomials: $p_+(r, s) = r \cdot s$. This leaves $p_a$ to encode. To ensure that $\phi$ is injective, we would like to pick $p_a$ so that $\phi$ sends all trees to pairwise different irreducible polynomials. This is done by a suitable choice of $p_a$ together with Eisenstein's criterion for the prime number 2: if a monic polynomial has all its nonleading coefficients divisible by 2, and its constant coefficient not divisible by 4, then this polynomial is irreducible over $\mathbb{Q}$. From there we define $\phi$ inductively: $\phi(\emptyset) = 1$, $\phi(f + g) = \phi(f) \cdot \phi(g)$, and $\phi(a(f)) = p_a(\phi(f))$. It is clear that $\phi$ respects the condition of polynomial simulation that any operation of UF must be encoded as a polynomial operation in $\mathbb{Q}[x]$. Injectivity relies on $\mathbb{Q}[x]$ being a unique factorization domain: the image of a forest is the product of the monic irreducible images of its trees, so it determines the multiset of these images.

This leads directly to the proof of Proposition 6: $\phi$ is a simulation from UF to $\mathbb{Q}[x]$ as defined in Definition 1. It is injective, the operation $+$ is encoded by the polynomial function $p_+$, and the operation $a$ is encoded by the polynomial function $p_a$.
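The encoding can be checked experimentally on small forests. The sketch assumes a unary signature and the particular choice $p_a(r) = x \cdot r + 2$, which is one choice compatible with the Eisenstein argument above: if $r$ is monic with even nonleading coefficients, then $x \cdot r + 2$ is monic, has even nonleading coefficients, and has constant term 2. This choice and all helper names are our assumptions, not necessarily the paper's. Polynomials are integer coefficient lists, constant term first, and $+$ is encoded by multiplication.

```python
def poly_mul(p, q):
    """Product of integer polynomials given as coefficient lists (constant term first)."""
    res = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            res[i + j] += a * b
    return res

def p_a(r):
    """Assumed encoding of the unary operation: r -> x * r + 2."""
    shifted = [0] + r  # multiply by x
    shifted[0] += 2    # add the constant 2
    return shifted

def phi(forest):
    """phi(empty) = 1, phi(f + g) = phi(f) * phi(g), phi(a(f)) = p_a(phi(f)).
    A forest is a sorted tuple of trees; a tree is the forest of its children."""
    result = [1]
    for tree in forest:
        result = poly_mul(result, p_a(phi(tree)))
    return result

def eisenstein_at_2(p):
    """Monic, all nonleading coefficients even, constant term not divisible by 4."""
    return p[-1] == 1 and all(c % 2 == 0 for c in p[:-1]) and p[0] % 4 != 0

def size(forest):
    return sum(1 + size(tree) for tree in forest)

def forests_up_to(n):
    """All canonical forests over a unary signature with at most n nodes."""
    found = {()}
    changed = True
    while changed:
        changed = False
        for f in list(found):
            for g in list(found):
                for h in ((f,), tuple(sorted(f + g))):
                    if size(h) <= n and h not in found:
                        found.add(h)
                        changed = True
    return found

print(phi(((), ())))  # [4, 4, 1], i.e. (x + 2)^2
```

On the 37 forests with at most 5 nodes, the images are pairwise distinct and every tree image satisfies the Eisenstein condition, which matches the injectivity argument.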

3.2 Extension to contexts

The combination of Corollary 5 and Proposition 6 gives decidability results on the class of UF-RA. The transducers of this class read a ranked input and manipulate registers with values in UF. As an example, a UF-RA can read a binary input and output the unordered forest that it encodes in a “First Child Next Sibling” manner, that is to say the left child in the input corresponds to a child in the output, and the right child in the input corresponds to a sibling in the output. Note that this is an adaptation of the classical FCNS encoding of unranked ordered trees as binary trees, but where the order is forgotten.

Figure 2: “FCNS” decoding

This can be described by a one-state one-register UF-RA that uses rules of form
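The rules of this automaton did not survive in this version of the text. The sketch below assumes the natural ones: a binary node labelled $a$ with register values $r_1$ (left) and $r_2$ (right) produces $a(r_1) + r_2$, and a nullary symbol, written `"#"` here as a hypothetical name, produces the empty forest. Forests use the sorted-tuple representation, so sibling order is forgotten.

```python
EMPTY = ()

def plus(f, g):
    """Multiset addition of canonical forests (sorted tuples of trees)."""
    return tuple(sorted(f + g))

def node(letter, f):
    """Single tree with root `letter` and children f."""
    return ((letter, f),)

def fcns_decode(t):
    """Bottom-up, one-state, one-register UF-RA for the FCNS decoding.
    Binary input nodes are (letter, left, right); ("#",) encodes the empty forest."""
    if t == ("#",):
        return EMPTY
    letter, left, right = t
    # The left subtree yields the children, the right subtree the remaining siblings.
    return plus(node(letter, fcns_decode(left)), fcns_decode(right))

leaf = ("#",)
t1 = ("a", ("b", leaf, ("c", leaf, leaf)), leaf)
t2 = ("a", ("c", leaf, ("b", leaf, leaf)), leaf)
print(fcns_decode(t1) == fcns_decode(t2))  # True: sibling order is forgotten
```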

However, UF-RA have their limitations: since $+$ and $a$ are the only two operations allowed, registers can only store subtrees to be placed at the bottom of the output. This leaves the class without the ability to combine subtrees of its output as freely as MSO logic does. As an example, it is impossible to create a UF-RA that, given an input containing two unary subtrees $t_1$ and $t_2$, outputs the subtree $t_1$ placed above the subtree $t_2$, as shown in Figure 3.

Figure 3: Subtree concatenation

To get a more general class of register automata that can perform such superpositions, we need to allow registers to store contexts rather than forests. While the use of the Hilbert Method for algebras of general contexts remains a difficult and interesting open problem, we will show that forest contexts with at most one hole are simulated by polynomials of $\mathbb{Q}[x]$.

We use the unordered version of the 2-sorted Forest Algebra [7], consisting of unordered forests and contexts with at most one hole. Since the previous subsection deals with an algebra of forests, to avoid confusion, we will call this the Unordered Context and Forest algebra (noted UCF). Using the definition of composition algebras, UCF is a subset of the composition algebra of polynomials over UF with a single variable $\square$, where we impose that the replaceable variable $\square$ occurs at most once.

On this algebra, we will show the following results:

Theorem 7.

UCF is simulated by $\mathbb{Q}[x]$.

Corollary 8.

Functionality of UCF-RA and equivalence of functional UCF-RA are decidable.

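Lemma 2 ensures that, since $\mathrm{UF} \preceq \mathbb{Q}[x]$, we have $\mathrm{UF}[\square]^{\circ} \preceq \mathbb{Q}[x][y]^{\circ}$, where only $\square$ (resp. $y$) can be substituted. UCF is the restriction of $\mathrm{UF}[\square]^{\circ}$ to its elements with at most one occurrence of $\square$. This forms a 2-sorted algebra. We consider its natural match in $\mathbb{Q}[x][y]^{\circ}$: let $\mathcal{P}$ be the 2-sorted algebra:

  • The universe is the set of polynomials of $\mathbb{Q}[x][y]$ of degree at most 1 in $y$, i.e. of the form $r + s \cdot y$ with $r, s \in \mathbb{Q}[x]$.

  • The types are the forest type (degree 0 in $y$) and the context type (degree exactly 1 in $y$).

  • The operations are:

    • the ring operations of $\mathbb{Q}[x][y]$, restricted so that the degree in $y$ stays at most 1; in particular, multiplication is defined only on the pairs of types forest × forest, forest × context and context × forest,

    • the substitution $(C_1, C_2) \mapsto C_1[y := C_2]$.

Lemma 9.

UCF is simulated by $\mathcal{P}$.

Proof.

We call $\psi$ the homomorphism obtained by extending the previous subsection's $\phi$ by mapping the substitution variable $\square$ to $y$. We restrict $\psi$ to terms with at most one occurrence of $\square$. The image $\psi(t)$ of such a term $t$ is then a term of $\mathbb{Q}[x][y]$ with at most one occurrence of $y$. If $\square$ never appears in $t$, then $y$ never appears in $\psi(t)$, thus $\psi(t)$ is of forest type. If $\square$ appears once in $t$, then $y$ appears once in $\psi(t)$, thus $\psi(t)$ is of context type. ∎

We now prove that $\mathcal{P}$ is simulated by $\mathbb{Q}[x]$ without substitution.

Lemma 10.

$\mathcal{P}$ is simulated by $\mathbb{Q}[x]$.

Proof.

We encode each element $r + s \cdot y$ of $\mathcal{P}$ as the pair $(r, s) \in \mathbb{Q}[x]^2$. Given this encoding, the operations are encoded in a straightforward manner. For example, for the composition operation in $\mathcal{P}$, we see that $(r_1 + s_1 y)[y := r_2 + s_2 y]$ is equal to $(r_1 + s_1 r_2) + s_1 s_2 \cdot y$. Hence, in pairs of $\mathbb{Q}[x]$, composition is encoded by the polynomial function $((r_1, s_1), (r_2, s_2)) \mapsto (r_1 + s_1 r_2, \; s_1 s_2)$. ∎

Since $\preceq$ is a transitive relation, Lemma 10 and Lemma 9 give Theorem 7. Once Theorem 7 is proven, Corollary 5 gives Corollary 8.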
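The pair encoding of one-hole contexts can be sanity-checked numerically. In this sketch a context $r + s \cdot y$ is stored as a pair of integer coefficient lists $(r, s)$, and composition is computed on pairs with the formula $(r_1 + s_1 r_2, s_1 s_2)$. The result is compared with the actual substitution, evaluated at sample points $(x_0, y_0)$. The example context `A` assumes the affine choice $p_a(r) = x \cdot r + 2$ for the unary operation, so that $a(\square)$ becomes $x \cdot y + 2$. That choice and all helper names are hypothetical.

```python
def add(p, q):
    """Sum of polynomials in x given as coefficient lists (constant term first)."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]

def mul(p, q):
    """Product of polynomials in x given as coefficient lists."""
    res = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            res[i + j] += a * b
    return res

def evaluate(p, x0):
    return sum(c * x0 ** i for i, c in enumerate(p))

def compose(c1, c2):
    """Composition on pairs: (r1 + s1*y)[y := r2 + s2*y] = (r1 + s1*r2) + (s1*s2)*y."""
    (r1, s1), (r2, s2) = c1, c2
    return (add(r1, mul(s1, r2)), mul(s1, s2))

def value(c, x0, y0):
    """Value of the context r + s*y at the point (x0, y0)."""
    r, s = c
    return evaluate(r, x0) + evaluate(s, x0) * y0

A = ([2], [0, 1])        # a(hole), assumed to be x*y + 2
print(compose(A, A))     # ([2, 2], [0, 0, 1]): a(a(hole)) = 2 + 2x + x^2 y
```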

Note that this proof extends to contexts with a bounded number of holes. We can add substitution variables to UF. Lemma 2 gives a homomorphism that ensures . One could then define contexts with at most occurrences of variables . In a manner similar to Lemma 9, we can find a finitely-sorted algebra that contains , i.e. an algebra of all polynomials of with a degree regarding the variables . Then, in a manner similar to Lemma 10, we can show that finite degree composition can be encoded in .

Corollary 11.

is simulated by . Functionality of -RA and equivalence of functional -RA are decidable.

3.3 Encompassing of MSO

Corollary 8 gives decidability results on the class of UCF-RA. We motivated this class as a relevant extension of UF-RA by exhibiting a transformation (see Figure 3) that requires contexts to be expressed. However, it is not immediately clear that this class is relevant in terms of its properties or expressiveness. In this subsection, we prove that UCF-RA can express all MSO-definable transformations on unordered trees, and strictly more. Note that UCF-RA define functions from binary ordered trees to UCF, not from UF to UF. We say that a UCF-RA expresses a function $f$ if, for every binary tree $t$ that is the “FCNS” encoding of a forest $u$, its image for the tree $t$ is $f(u)$.

We briefly present a definition of MSO formulae and transformations. More complete definitions exist elsewhere in the literature (e.g. [9]).

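The syntax of monadic second order logic (MSO) is:

$\varphi ::= x \in X \mid x = y \mid \neg \varphi \mid \varphi \wedge \varphi \mid \exists x.\, \varphi \mid \exists X.\, \varphi$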

where lower cases are node variables, and upper cases are set variables. This syntax is enriched by different relations to describe the structure of the objects we consider:

  • For binary trees (BT), we add two relations and that express that is the left child (resp. right child) of .

  • For unranked ordered forests (OF), we add , that expresses that is the first child of , and that express that is the brother directly to the right of .

  • For unranked unordered forests (UF), we only add the relation , that expresses that is a child of . The relation “Sibling” would only be syntactic sugar.

An MSO-definable transformation with $k$ copies is a transformation that, for each input node $x$, makes $k$ output nodes $x^1, \dots, x^k$. The presence or absence of an edge in the output is dictated by formulae defining the transformation. An MSO-definable transformation is characterized by its formulae for each pair of copies $i, j \leq k$ and each structure relation (e.g. the first-child and next-sibling relations if the output is ordered forests).

For example, if one wanted to reverse left and right children in binary trees, this would be a transformation definable in MSO with one copy, where the formula for the left-child relation of the output is the right-child relation of the input, i.e. $y$ is $x$'s left child in the output iff $y$ was $x$'s right child in the input, and conversely.

We note that this definition can express transformations between any two tree algebras. For example, the “FCNS” decoding of Figure 2 can be encoded in MSO from binary trees to unordered forests. Since we will use different combinations of input and output types in this part, we introduce the notation $\mathrm{MSO}_{\mathcal{X} \to \mathcal{Y}}$ to denote MSO from one type of trees $\mathcal{X}$ to another type $\mathcal{Y}$. For instance, $\mathrm{MSO}_{\mathrm{BT} \to \mathrm{OF}}$ denotes MSO-definable functions from binary trees to ordered forests, and $\mathrm{MSO}_{\mathrm{UF} \to \mathrm{UF}}$ denotes MSO-definable functions from unordered forests to unordered forests.

Proposition 12.

Every function of can be described by a UCF-RA.

The proof we provide to show this Proposition has three arguments:

  1. can be represented by functions of .

  2. Bottom-Up Streaming Tree Transducers (STT) [3] describe all functions of .

  3. Bottom-Up STT can be expressed as register automata.

From unordered forests to binary trees and ordered forests. We say that a binary tree $t$ represents an unordered forest $u$ if the “FCNS” decoding of $t$, as represented in Figure 2, is $u$. Note that $t$ is not unique for $u$, but every $t$ represents a unique $u$. Similarly, we say that an unranked ordered forest $o$ represents an unordered forest $u$ if forgetting the siblings' order in $o$ yields $u$. Once again such an $o$ is not unique for $u$, but every $o$ represents a unique $u$. We can extend this notion to MSO transformations.

Definition 13.

A function represents a function if:

  • For every unordered forest such that is defined over , then there exists at least one binary tree such that represents , and is defined over

  • For every binary tree such that is defined over , is defined over the unordered forests such that represents , and represents .

Once again such a representing function is not unique, but every representing function represents a unique function. Furthermore, it is always possible to find a representative for an MSO-definable function.

Lemma 14.

If , then there exists that represents .

Proof.

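We start by encoding the input, transforming the function into one whose input is a binary tree. To modify it so that it transforms trees that represent the input forests, one has to replace every occurrence of the child relation in the formulae by its “FCNS” encoding, i.e. $y$ is reachable from the left child of $x$ by a (possibly empty) sequence of right-child steps.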

Encoding the output requires changing the child relation into two relations, first child and next sibling, i.e. artificially ordering the siblings of the output forest. To that effect, we note that from the BT-formula defining the child relation of the output, one can describe in BT-MSO the set of input nodes whose copies are children of a given output node. Since this is a set of input nodes of a binary tree, it is totally ordered by their occurrence in the infix traversal. This order can be expressed as a BT-MSO relation. We can then decide to order all the children of the output node accordingly.

To find the first child of a node in the output, we say that if is the first index where and is its first element. Similarly, to find the next sibling of a node in the output, we say that if , and either and are consecutive elements of , or is the last element of , is the first element of , and is the first index bigger than such that . ∎

From MSO to STT to RA. The next step is to use an existing result from the literature [3] that gives a model of transducers capturing all . The formalism in question is Streaming Tree Transducers (STT). An STT is an automaton on nested words (words representing trees) that maintains a stack of register configurations. The nesting of the words dictates how this stack behaves: each opening letter pushes the current register values onto the stack and starts with fresh ones, then each closing letter uses the current register values and the top of the stack to generate new values for the registers.

In [3], STT are limited to linear functions for the update of their values. Furthermore, the paper proves that, without loss of expressiveness, one can consider Bottom-Up STT, where reading an opening symbol resets the state as well as the registers. For such STT, the behavior of an STT reading the nested word of a subtree does not depend on what occurs before or after, and its computation behaves like a register automaton reading a tree in a bottom-up manner. We will consider the class of Bottom-Up STT that read nested representations of binary trees (Bottom-Up BT STT).

Proposition 15.

Every function of is described by a Bottom-Up BT STT.

Proposition 16.

Every function of a Bottom-Up BT STT is described by an OCF-RA.

Proposition 15 comes directly from Theorems 3.7 and 4.6 of [3]: Theorem 3.7 shows that Bottom-Up BT-STT are as expressive as general BT-STT, and Theorem 4.6 states that STT can describe any function of . Proposition 16 is not directly proven in [3], but their definition of Bottom-Up STT is made specifically to that end. We provide more details in the appendix.

End of Proof. To turn that OCF-RA into an UCF-RA, we just have to change the ordered concatenation of OCF to the unordered concatenation of UCF. By combining Lemma 14, Proposition 15 and Proposition 16, we conclude our proof of Proposition 12.

We note that every MSO-definable function can be described by a UCF-RA; however, the converse is not true. Consider a function that creates output of exponential size (whereas MSO can only describe functions of linear size increase). Consider unary input trees of form , and a one-register UCF-RA with rules , and . The image doubles in size each time a symbol is read. Unsurprisingly, this counterexample uses the copyful nature of UCF-RA, as copyless restrictions tend to limit the expressive power of register automata to MSO classes [2, 3].
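The exact rules of this counterexample were lost in this version of the text. The sketch below assumes the natural ones: the leaf sets the register to a single node, and each unary input symbol replaces the register $r$ by $r + r$, which uses $r$ twice (copyful). Output forests use the sorted-tuple representation over a unary output signature; all names are hypothetical. The output size is $2^n$ on an input with $n$ unary symbols, which no linear-size-increase (MSO) transformation can produce.

```python
EMPTY = ()

def plus(f, g):
    """Multiset addition of canonical forests (sorted tuples of trees)."""
    return tuple(sorted(f + g))

def node(f):
    """Unary output signature: a single root above the trees of f."""
    return (f,)

def doubling_ra(n):
    """Run the assumed one-register UCF-RA on a unary input with n unary symbols:
    the leaf sets the register to a single node, each unary symbol applies r -> r + r."""
    r = node(EMPTY)
    for _ in range(n):
        r = plus(r, r)  # copyful update: r is used twice
    return r

def size(forest):
    return sum(1 + size(tree) for tree in forest)

print([size(doubling_ra(n)) for n in range(6)])  # [1, 2, 4, 8, 16, 32]
```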

4 On decidability of MTT equivalence. Equivalence of polynomials-RA with composition is undecidable

In this section, we try to use the “Hilbert Method” to study the equivalence problem for Macro Tree Transducers (MTT) [11]. MTT have numerous definitions. For this paper, we will consider them to be register automata on an algebra of ranked trees with an operation of substitution on the leaves; writing $\mathcal{T}$ for the algebra of ranked trees, observe that these are exactly the $\mathcal{T}[X]^{\circ}$-RA. The algebra $\mathcal{T}$ (ranked trees without substitution on the leaves) can be simulated by words with concatenation (via nested word encoding). Words with concatenation can be encoded by $\mathbb{Q}$ (see, for example, the proof of Corollary 10.11 of [6]). Thus, $\mathcal{T} \preceq \mathbb{Q}$. Finally, by Lemma 2, we have that $\mathcal{T}[X]^{\circ} \preceq \mathbb{Q}[X]^{\circ}$. This means that if equivalence is decidable for $\mathbb{Q}[X]^{\circ}$-RA, then MTT equivalence is decidable. Unfortunately, we will show that even with a single variable $x$, register automata over $\mathbb{Q}[x]^{\circ}$ have undecidable functionality and equivalence:

Theorem 17.

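The functionality problem for $\mathbb{Q}[x]^{\circ}$-RA and the equivalence problem for functional $\mathbb{Q}[x]^{\circ}$-RA are undecidable (where $\mathbb{Q}[x]^{\circ}$ denotes the polynomials of $\mathbb{Q}[x]$ equipped with substitution).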

We prove this undecidability result by reducing the reachability problem for 2-counter machines to the equivalence problem for deterministic $\mathbb{Q}[x]^{\circ}$-RA with a monadic input (i.e. that read words rather than trees). This means that the theorem we actually prove is slightly stronger than Theorem 17. Its full extent is described in Theorem 20. We recall the definition of a 2-counter machine.

Definition 18.

A 2-counter machine (2CM) is a pair , where:

  • is a finite set of states,

  • is a total transition function.

A configuration of is a triplet of one state and two nonnegative integer values (or counters) . We describe how to use transitions between configurations: if there exists in such that for : and . Note that to ensure that no register wrongfully goes into the negative, we assume wlog that if there exists in , then (i.e. we can only decrease a non-zero counter).

The 2CM reachability problem can be expressed as such: starting from an initial configuration , can we access the state , i.e. is there a configuration such that . It is well known that 2CM reachability is undecidable.

Reduction of 2CM Reachability to -RA Equivalence. Let be a -states 2CM. We rename its states . We consider the 2CM reachability problem in from state to state .

We simulate this machine with a -RA . It will have only one state : the configurations of will be encoded in 3 registers of . It will work on a signature , where is of rank 0 and every transition of is a symbol of rank . Intuitively, reading a symbol in models executing this transition in . The automaton will have 6 registers: 3 to encode the configurations of and 3 containing auxiliary polynomials useful to test if the input sequence of transitions describes a valid computation in .

We encode the configurations in 3 registers as follows:

  • register holds the (number of the) current state.

  • registers , hold – the current values of the counters.

We now encode transitions in as register operations in ’. When reading a transition , the update of the configuration is natural. However, we must ensure that we are allowed to use this transition in the current configuration.

To this end, we keep in ’ a witness register. Its value will be 0 if and only if the sequence of transitions read as an input does not constitute a valid path in . To update such a register, when a transition is read, we need to check that and that .

For the state , we design , a polynomial such that and for every other value , : . This approach cannot work for counters, as there is no absolute bound on their value. To remedy this problem, we design for each a polynomial such that and for every other value , . Intuitively, works as a test for counters at the -th step of , since counters cannot exceed the value at that point. This means that has to be stored and updated in a register of its own. To this end, we introduce the last three registers of :

  • the register . After steps, .

  • the register . After steps, .

  • the witness register . After steps, we read a valid path in .
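The zero tests behind these registers can be checked numerically. The sketch below assumes concrete polynomials that satisfy the stated properties; the paper's exact choices are not reproduced here. We take P_q(X) as the product of (X − j) over the states j ≠ q, which vanishes on every state except q, and Z_k(X) as the product of (X − j) for j = 1..k, which vanishes on 1..k but not at 0. The sketch also illustrates why Z_k can be kept in a register: Z_{k+1} = Z_k · (X − (k+1)) only needs the previous value of Z_k and of the step counter.

```python
from math import prod

def P(q, n, x):
    """Assumed state test: nonzero at x == q, zero at every other state in 1..n."""
    return prod(x - j for j in range(1, n + 1) if j != q)

def Z(k, x):
    """Assumed counter test: nonzero at x == 0, zero on 1..k (Z_0 is the constant 1)."""
    return prod(x - j for j in range(1, k + 1))

n, k = 4, 5
for q in range(1, n + 1):
    # P_q detects state q among the states 1..n
    assert [x for x in range(1, n + 1) if P(q, n, x) != 0] == [q]
for c in range(0, k + 1):
    # Z_k correctly answers "is c zero?" for every counter value c <= k
    assert (Z(k, c) != 0) == (c == 0)
# Incremental update used by the register holding Z_k
assert all(Z(k + 1, x) == Z(k, x) * (x - (k + 1)) for x in range(-3, 10))
# Beyond the bound k the test can fail: Z_5(6) != 0 although 6 != 0
print(Z(k, k + 1) != 0)  # True
```

The last line shows why the bound on the counters after k steps matters: Z_k is only a reliable zero test on 0..k.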

We describe how to update the registers of when reading an input symbol . Note that, according to our definition of -RA, the new values are computed as a function of the old value of . This means that every value on the right-hand side of the assignment symbol is the value before reading .

  • , , ,

  • , ,

  • , where:

    • ,

    • for ,

This update strategy ensures that each register plays the role we assigned to it. The only register for which this is not immediate is . We show that if and only if we failed to read a proper path in .

We proceed by induction on the number of steps. The induction hypothesis is that a mistake happened before the -th step if and only if before reading the -th symbol. In that case, stays at zero for every subsequent step, as the new value of is always a multiple of its previous value. If the error occurs exactly at the -th step, then either the -th letter of the input was a transition but was not (and hence ), or applying this transition requires the counter to be when it was not (or, conversely, to be when it was ). This last case is caught by . By using , we have exactly when we were wrong. If , then we assume . We know that , where is the number of steps taken. By using , we have exactly when .

The final step of the reduction is to choose the output function of the only state of . We pick . The output can be nonzero only if ends in (i.e. we reached state ) and (i.e. we followed a valid path). In other words, the following lemma holds.

Lemma 19.

is the constant 0 function if and only if n is not reachable from 1 in
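The construction and Lemma 19 can be sanity-checked on short inputs. This sketch assumes a hypothetical transition encoding `(src, (z1, z2), dst, (d1, d2))`, with `z_i` true iff counter i must be zero and `d_i` added to counter i. It also assumes the polynomials P_q (nonzero exactly at q among 1..n) and Z_k (nonzero at 0, zero on 1..k). Following our reading of the update rules, the witness is multiplied by P_src(s). For each counter, it is also multiplied by Z_k(c_i) when the transition requires c_i = 0, and by c_i itself when it requires c_i ≠ 0. The output is taken to be w · P_n(s). The register for Z_k holds a real polynomial (a coefficient list), while the other registers hold integers. This is a simplification of the paper's setting. Exhaustive enumeration only checks the claims on bounded inputs; it is not a decision procedure.

```python
from itertools import product
from math import prod

def P(q, n, x):
    """Assumed state test: nonzero at x == q, zero at the other states 1..n."""
    return prod(x - j for j in range(1, n + 1) if j != q)

def mul_linear(poly, r):
    """Coefficients (lowest degree first) of poly * (X - r)."""
    out = [0] * (len(poly) + 1)
    for i, a in enumerate(poly):
        out[i + 1] += a
        out[i] -= r * a
    return out

def evaluate(poly, x):
    """Horner evaluation of a coefficient list at x."""
    acc = 0
    for a in reversed(poly):
        acc = acc * x + a
    return acc

def run_ra(n, word):
    """Register values (s, c1, c2, w) after reading word."""
    s, c1, c2, k, zk, w = 1, 0, 0, 0, [1], 1  # zk holds Z_0 = 1
    for src, (z1, z2), dst, (d1, d2) in word:
        t1 = evaluate(zk, c1) if z1 else c1  # zero test via Z_k, non-zero test via c1
        t2 = evaluate(zk, c2) if z2 else c2
        # Simultaneous assignment: every right-hand side uses the old values.
        s, c1, c2, k, zk, w = (dst, c1 + d1, c2 + d2, k + 1,
                               mul_linear(zk, k + 1), w * P(src, n, s) * t1 * t2)
    return s, c1, c2, w

def output(n, word):
    """Assumed output of the single state: witness times the test for state n."""
    s, _, _, w = run_ra(n, word)
    return w * P(n, n, s)

def is_valid_run(word):
    """Direct 2CM semantics from configuration (1, 0, 0)."""
    q, c1, c2 = 1, 0, 0
    for src, (z1, z2), dst, (d1, d2) in word:
        if q != src or (c1 == 0) != z1 or (c2 == 0) != z2:
            return False
        q, c1, c2 = dst, c1 + d1, c2 + d2
    return True

delta = [(1, (True, True), 1, (1, 0)), (1, (False, True), 2, (-1, 1)),
         (2, (True, False), 3, (0, 0))]
for length in range(5):
    for word in product(delta, repeat=length):
        # The witness is nonzero exactly on valid paths.
        assert (run_ra(3, word)[3] != 0) == is_valid_run(word)
print(output(3, delta) != 0)  # True: the valid path 1 -> 1 -> 2 -> 3 reaches state 3
```

In this small example, state 3 is reachable in three steps. Accordingly, the output is nonzero on that input and zero on every shorter input.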

By comparing to a -RA computing the constant 0 function, we get that deciding equivalence of functional -RA would allow us to decide 2CM reachability. Similarly, by nondeterministically running either or , we get that deciding functionality of -RA would allow us to decide 2CM reachability. This proves Theorem 17. In fact, we show a stronger result:

Theorem 20.

The equivalence problem for deterministic -RA is undecidable, even with a monadic input alphabet. The functionality problem for -RA is undecidable, even with a one-letter monadic alphabet.

The first claim follows directly from the construction above: is a deterministic -RA over a monadic alphabet , that is, the input of is a word whose letters are elements of .

For the second claim, consider a slight modification of this proof in which the input alphabet is , where is of rank 0 and of rank 1; that is, the input of the -RA is a word of the form . In this version, is no longer deterministic: at each step it guesses which transition of to simulate. When reads , it either guesses a correct path of length , or makes a mistake and returns 0. Hence is functional iff it cannot guess a run that produces a value other than 0, i.e. iff n is not reachable from 1 in .
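The nondeterministic variant can be explored on short inputs in the same way. This sketch assumes the same hypothetical transition encoding `(src, (z1, z2), dst, (d1, d2))` and the assumed polynomials P_q and Z_k. Here Z_k is evaluated directly rather than stored as a polynomial. On the input made of m letters of rank 1 followed by the end symbol, the automaton can produce the output of any guessed sequence of m transitions. It is functional on that input iff the set of possible outputs has at most one element. Since functionality is undecidable (Theorem 20), checking bounded input lengths can only refute functionality, never establish it.

```python
from itertools import product
from math import prod

def P(q, n, x):
    """Assumed state test: nonzero at x == q, zero at the other states 1..n."""
    return prod(x - j for j in range(1, n + 1) if j != q)

def Z(k, x):
    """Assumed counter test: nonzero at 0, zero on 1..k."""
    return prod(x - j for j in range(1, k + 1))

def guessed_output(n, guesses):
    """Output of one run of the nondeterministic automaton that guessed `guesses`."""
    s, c1, c2, w = 1, 0, 0, 1
    for k, (src, (z1, z2), dst, (d1, d2)) in enumerate(guesses):
        t1 = Z(k, c1) if z1 else c1  # k steps have been taken before this one
        t2 = Z(k, c2) if z2 else c2
        s, c1, c2, w = dst, c1 + d1, c2 + d2, w * P(src, n, s) * t1 * t2
    return w * P(n, n, s)

def outputs_on(n, delta, length):
    """All outputs on the input made of `length` rank-1 letters and the end symbol."""
    return {guessed_output(n, g) for g in product(delta, repeat=length)}

def functional_up_to(n, delta, max_length):
    """Bounded check only: True means no counterexample of length <= max_length."""
    return all(len(outputs_on(n, delta, m)) <= 1 for m in range(max_length + 1))

delta = [(1, (True, True), 1, (1, 0)), (1, (False, True), 2, (-1, 1)),
         (2, (True, False), 3, (0, 0))]
print(functional_up_to(3, delta, 2), functional_up_to(3, delta, 3))  # True False
```

On inputs of length 3, one guess follows the valid path to state 3 and yields a nonzero value, while wrong guesses yield 0. The automaton is therefore not functional, matching the fact that state 3 is reachable.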

5 Conclusion

We use “Hilbert Methods” to study equivalence problems for register automata. To apply these methods to register automata over contexts, we consider algebras with a substitution operation. To show the decidability of equivalence for UCF-RA, a class that subsumes MSO-definable transformations on unordered forests, we use the fact that bounded-degree substitution can be encoded into in . However, when applying the same method to Macro Tree Transducers, we are led to consider register automata over , whose equivalence we prove to be undecidable. In essence, for the “Hilbert Methods” we consider to yield positive results, it seems necessary to limit the use of composition.

Future work could consist in finding other acceptable restrictions on the use of composition in that still allow decidability results for register automata. Another possible avenue is to use the properties of to prove negative results: if , and register automata have undecidable problems in , then this negative result propagates to . Finally, “Hilbert Methods” can apply to a wide variety of algebras (e.g. UCF in this paper or in [5]). They provide decidability results for register automata over algebras with nontrivial structural properties, such as commutativity of operations (e.g. of children in UCF), which make the usual methods for deciding equivalence difficult to apply.

Acknowledgments We would like to thank Mikołaj Bojańczyk, from the University of Warsaw, for introducing us to the topic of using the Hilbert Method to decide equivalence of register automata, and for his active participation in obtaining this result and writing this paper.
This work was supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (ERC consolidator grant LIPA, agreement no. 683080) and partially supported by the NCN grant 2016/21/B/ST6/01505.

References

  • [1] Michael H. Albert and J. Lawrence. A proof of Ehrenfeucht’s conjecture. Theor. Comput. Sci., 41:121–123, 1985.
  • [2] Rajeev Alur and Pavol Černý. Expressiveness of streaming string transducers. In FSTTCS, volume 8 of LIPIcs, pages 1–12. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2010.
  • [3] Rajeev Alur and Loris D’Antoni. Streaming tree transducers. J. ACM, 64(5):31:1–31:55, 2017. URL: http://doi.acm.org/10.1145/3092842, doi:10.1145/3092842.
  • [4] Rajeev Alur, Loris D’Antoni, Jyotirmoy V. Deshmukh, Mukund Raghothaman, and Yifei Yuan. Regular functions and cost register automata. In LICS, pages 13–22. IEEE Computer Society, 2013.
  • [5] Michael Benedikt, Timothy Duff, Aditya Sharad, and James Worrell. Polynomial automata: Zeroness and applications. In LICS, pages 1–12. IEEE Computer Society, 2017.
  • [6] Mikołaj Bojańczyk. Automata toolbox. May 2018. URL: https://www.mimuw.edu.pl/~bojan/paper/automata-toolbox-book.
  • [7] Mikołaj Bojańczyk and Igor Walukiewicz. Forest algebras. In Logic and Automata, volume 2 of Texts in Logic and Games, pages 107–132. Amsterdam University Press, 2008.
  • [8] Christian Choffrut. Minimizing subsequential transducers: a survey. Theor. Comput. Sci., 292(1):131–143, 2003.
  • [9] Bruno Courcelle. Monadic second-order definable graph transductions: A survey. Theor. Comput. Sci., 126(1):53–75, 1994.
  • [10] Joost Engelfriet and Sebastian Maneth. Macro tree translations of linear size increase are MSO definable. SIAM J. Comput., 32(4):950–1006, 2003.
  • [11] Joost Engelfriet and Heiko Vogler. Macro tree transducers. J. Comput. Syst. Sci., 31(1):71–146, 1985.
  • [12] Juha Honkala. A short solution for the HDT0L sequence equivalence problem. Theor. Comput. Sci., 244(1-2):267–270, 2000.
  • [13] Karel Culik II and Juhani Karhumäki. The equivalence of finite valued transducers (on HDT0L languages) is decidable. Theor. Comput. Sci., 47(3):71–84, 1986.
  • [14] Sebastian Maneth. A survey on decidable equivalence problems for tree transducers. Int. J. Found. Comput. Sci., 26(8):1069–1100, 2015. URL: https://doi.org/10.1142/S0129054115400134, doi:10.1142/S0129054115400134.
  • [15] William C. Rounds. Mappings and grammars on trees. Mathematical Systems Theory, 4(3):257–287, 1970.
  • [16] Helmut Seidl, Sebastian Maneth, and Gregor Kemper. Equivalence of deterministic top-down tree-to-string transducers is decidable. In FOCS, pages 943–962. IEEE Computer Society, 2015.

Appendix

Bottom-Up Streaming Tree Transducers. We describe in more detail Streaming Tree Transducers (STT), which read and output nested words. This formalism is central to the proofs of Propositions 15 and 16.

Intuitively, an STT works with a configuration composed of a state, a finite number of typed variables (or registers) that contain nested words with at most one occurrence of a context symbol (this corresponds to the Ordered Forest Algebra (OCF) in the sense of [7]), and a stack containing pairs of stack symbols and variable valuations. The nesting of the words dictates how this stack behaves: each opening letter pushes the current variable values onto the stack and starts with fresh ones, and each closing letter combines the current variable values with the top of the stack to produce new values for the registers. The operations on nested words that can be performed in such cases correspond to polynomial operations on OCF: one can use concatenation, context application (which translates directly into OCF), or a constant nested word, which can be simulated using roots and concatenation: can be seen in forests as .
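The operations mentioned above can be sketched on a simple tree representation. The Python below is an illustration under our own assumptions, not the definitions of [3] or [7]. A forest is a tuple of trees `(label, children)`, and a context is a forest containing exactly one hole marker `HOLE` (a hypothetical name). Concatenation is tuple concatenation, and context application substitutes a forest for the hole. The hypothetical function `unorder` forgets sibling order by recursively sorting into a canonical form. The checks illustrate, on examples, that forgetting order is compatible with both operations. This is the intuition behind reading an ordered result as an unordered one.

```python
HOLE = "[]"  # hypothetical marker for the unique hole of a context

def concat(f, g):
    """Concatenation of two ordered forests (tuples of trees)."""
    return f + g

def apply_ctx(ctx, forest):
    """Context application: replace the hole of ctx by forest."""
    out = []
    for t in ctx:
        if t == HOLE:
            out.extend(forest)
        else:
            label, children = t
            out.append((label, apply_ctx(children, forest)))
    return tuple(out)

def unorder(forest):
    """Forget sibling order via a recursively sorted canonical form."""
    return tuple(sorted((label, unorder(children)) for label, children in forest))

a, b, d = ("a", ()), ("b", (("c", ()),)), ("d", ())
f, g = (a, b), (b, a)          # the same forest up to sibling order
ctx = (("r", (HOLE, d)),)      # the context r(hole, d)

# Forgetting order is compatible with both operations on these examples.
assert unorder(concat(f, (d,))) == unorder(concat((d,), g))
assert unorder(apply_ctx(ctx, f)) == unorder(apply_ctx(ctx, g))
print(apply_ctx(ctx, f))  # the tree r(a, b(c), d)
```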

The general definitions are available in [3]. We specifically use bottom-up STTs, in which reading an opening symbol resets the state as well as the registers. For such STTs, the behavior of the transducer on the nested word of a subtree does not depend on what occurs before or after it. The original paper also imposes a single-use restriction, ensuring that each operation uses each register at most once. We may keep this restriction, but will not need it. We add the following restrictions to this model:

  • We allow only nesting letters. In the terminology of [3], this means we ignore internal transitions.

  • The input domain is a language of nested words of binary trees.
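The stack discipline of a bottom-up STT can be illustrated with a hypothetical one-state, one-register example that computes the mirror image of a forest, reversing the order of siblings at every level. This sketch is our simplification of the model of [3]. A nested word is a list of `("open", label)` / `("close", label)` tokens, and the stack stores only register valuations. An opening letter pushes the current valuation and restarts from the empty word. A closing letter combines the popped valuation with the current one, using concatenation only. Each register value is used once per update, so the example also respects the single-use restriction. The function names `mirror_bottom_up` and `encode` are hypothetical.

```python
def mirror_bottom_up(word):
    """Run a one-register bottom-up transducer that mirrors a nested word."""
    stack = []
    x = []  # register: mirrored nested word of the siblings read so far at this level
    for kind, label in word:
        if kind == "open":
            stack.append(x)  # save the valuation of the enclosing level
            x = []           # bottom-up: the subtree starts from fresh registers
        else:
            saved = stack.pop()
            # The finished subtree is placed before its earlier siblings.
            x = [("open", label)] + x + [("close", label)] + saved
    return x

def encode(tree):
    """Nested word of a tree given as (label, [children])."""
    label, children = tree
    return [("open", label)] + [tok for c in children for tok in encode(c)] + [("close", label)]

t = ("a", [("b", []), ("c", [("d", []), ("e", [])])])
mirrored = mirror_bottom_up(encode(t))
assert mirrored == encode(("a", [("c", [("e", []), ("d", [])]), ("b", [])]))
```

Because every subtree is processed from fresh registers, the value computed for a subtree does not depend on its surroundings, as stated above for bottom-up STTs.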

We call this subclass Bottom-Up BT STT. The first result we need follows directly from [3], which states that single-use STTs (even restricted to bottom-up ones) describe exactly the MSO-definable functions on nested words:

Proposition 21.

Every function of is described by a Bottom-Up BT STT.

From STT to UCF-RA. To complete the proof, we show that if a Bottom-Up BT STT describes , then we can find a UCF-RA that describes the function that represents, by forgetting the order in the output.

Proposition 22.

Every function of a Bottom-Up BT STT is described by an OCF-RA.

Proof.

The figure below shows the run of a bottom-up STT on a tree . The subtrees and have roots and , respectively. The second line shows its configuration (state , register values ), while the third line keeps track of the top symbol of the STT’s stack. The state and register valuation are the initial state and register values, respectively. The symbol at the top of the stack when reaching is denoted .