Large Aperiodic Semigroups
This work was supported by the Natural Sciences and Engineering Research Council of Canada grant No. OGP000087 and by Polish NCN grant DEC-2013/09/N/ST6/01194.
The syntactic complexity of a regular language is the size of its syntactic semigroup. This semigroup is isomorphic to the transition semigroup of a minimal deterministic finite automaton accepting the language, that is, to the semigroup generated by transformations induced by non-empty words on the set of states of the automaton. In this paper we search for the largest syntactic semigroup of a star-free language having $n$ left quotients; equivalently, we look for the largest transition semigroup of an aperiodic finite automaton with $n$ states.
We introduce two new aperiodic transition semigroups. The first is generated by transformations that change only one state; we call such transformations and resulting semigroups unitary. In particular, we study complete unitary semigroups which have a special structure, and we show that each maximal unitary semigroup is complete. For there exists a complete unitary semigroup that is larger than any aperiodic semigroup known to date.
We then present even larger aperiodic semigroups, generated by transformations that map a non-empty subset of states to a single state; we call such transformations and semigroups semiconstant. In particular, we examine semiconstant tree semigroups which have a structure based on full binary trees. The semiconstant tree semigroups are at present the best candidates for largest aperiodic semigroups.
We also prove that is an upper bound on the state complexity of reversal of star-free languages, and resolve an open problem about a special case of state complexity of concatenation of star-free languages.
Keywords: aperiodic, monotonic, nearly monotonic, partially monotonic, semiconstant, transition semigroup, star-free language, syntactic complexity, unitary
The state complexity of a regular language is the number of states in a complete minimal deterministic finite automaton (DFA) accepting the language. An equivalent notion is that of quotient complexity, which is the number of left quotients of the language; we prefer quotient complexity since it is a language-theoretic notion. The usual measure of complexity of an operation on regular languages [1, 17] is the quotient complexity of the result of the operation as a function of the quotient complexities of the operands. This measure has some serious disadvantages, however. For example, it has been shown that in the class of star-free languages all common operations have the same quotient complexity as they do in the class of arbitrary regular languages (two small exceptions are discussed in Section 6). Thus quotient complexity fails to differentiate between the very special class of star-free languages and the class of all regular languages.
It has been suggested that other measures of complexity may also be useful, in particular the syntactic complexity of a regular language, which is the cardinality of its syntactic semigroup. This is the same as the cardinality of the transition semigroup of a minimal DFA accepting the language, and it is this latter representation that we use here. The transition semigroup is the set of all transformations induced by non-empty words on the set of states of the DFA. The syntactic complexity of a class of languages is the size of the largest syntactic semigroups of languages in that class as a function of the quotient complexities of the languages. Since the syntactic complexity of star-free languages is considerably smaller than that of regular languages, this measure succeeds in distinguishing the two classes.
The class of star-free languages is the smallest class obtained from finite languages using only boolean operations and concatenation, but no star. By Schützenberger’s theorem we know that a language is star-free if and only if the transition semigroup of its minimal DFA is aperiodic, meaning that it contains no non-trivial subgroups. Equivalently, a transition semigroup is aperiodic if and only if no word over the alphabet of the DFA can induce a non-trivial permutation of any subset of two or more states. Star-free languages and the DFAs that accept them were studied by McNaughton and Papert in 1971.
Two aperiodic semigroups, monotonic and partially monotonic, were studied by Gomes and Howie . Their results were adapted to finite automata in , where nearly monotonic semigroups were also introduced; they are larger than the partially monotonic ones and were the largest aperiodic semigroups known to date for . For the largest aperiodic semigroups known to date were those generated by DFAs accepting -trivial languages . The syntactic complexity of -trivial languages is . As to aperiodic semigroups, tight upper bounds on their size were known only for .
The following are the main contributions of this paper:
Using the method of , we have enumerated all aperiodic semigroups for , and we have shown that the maximal aperiodic semigroup has size 47, while the maximal nearly monotonic semigroup has size 41. Although this may seem like an insignificant result, it provided us with strong motivation to search for larger semigroups.
The number of aperiodic transformations is . For large the number of aperiodic semigroups is very large, and so it is difficult to check them all.
We studied semigroups generated by transformations that change only one state; we call such transformations and semigroups unitary. We characterized unitary semigroups and computed their maximal sizes up to . There are unitary transformations. For the maximal unitary semigroups are larger than the maximal nearly monotonic ones and also larger than any previously known aperiodic semigroup.
For each we found a set of DFAs whose inputs induce semiconstant tree transformations – transformations that send a non-empty subset of the set of all states to a single state, and have a structure based on full binary trees. For , there is a semiconstant tree semigroup larger than the largest complete unitary semigroup. We computed the maximal size of these transition semigroups up to . The total number of semiconstant transformations is .
We derived formulas for the sizes of complete unitary and semiconstant tree semigroups. We also provided recursive formulas characterizing the maximal complete unitary and semiconstant tree semigroups; these formulas lead to efficient algorithms for computing the forms and sizes of such semigroups.
We proved that the quotient complexity of the reverse of a star-free language with quotient complexity is at most .
We resolved an open problem about a special case of quotient complexity of product (catenation, concatenation) of star-free languages and , when the quotient complexities of and are and , respectively: we proved that is a tight upper bound.
Our results about aperiodic semigroups are summarized in Tables 1 and 2 for small values of . Transformation is the identity; it can be added to unitary and semiconstant transformations without affecting aperiodicity.
Additional information about the classes of semigroups in Tables 1 and 2 will be given later. The classes are listed in the order of increasing size when is large. The number in boldface shows the value of for which the size of a given semigroup exceeds the sizes of all of the preceding ones. For example, the largest semigroups of finite languages exceed the preceding semigroups for .
There are two more classes of syntactic semigroups that have the same complexity as the semigroups of finite languages: those of cofinite and reverse definite languages. The tight upper bound for -trivial languages () is also a lower bound for definite languages (). An upper bound of has been shown to hold  for definite and generalized definite languages , but it is not known whether this bound is tight.
The asymptotic behaviour of the size of partially monotonic semigroups is , where and are constants . For nearly monotonic semigroups the size is .
The remainder of the paper is structured as follows. Section 2 presents our terminology and notation. Our large aperiodic semigroups are defined in Section 3. The special case of unitary semigroups is then considered in Section 4, and semiconstant tree semigroups are the topic of Section 5. Section 6 contains the new results about reversal and product. Section 7 concludes the paper.
2 Terminology and Notation
Let be a finite alphabet. The elements of are letters and the elements of are words, where is the free monoid generated by . The empty word is denoted by , and the set of all non-empty words is , the free semigroup generated by . A language is any subset of .
Suppose . Without loss of generality we assume that our basic set under consideration is . A deterministic finite automaton (DFA) is a quintuple , where is a finite non-empty set of states, is a finite non-empty alphabet, is the transition function, is the initial state, and is the set of final states. We extend to and to in the usual way. A DFA accepts a word if . The language accepted by is .
By the language of a state of we mean the language accepted by the DFA . A state is empty (also called dead or a sink) if its language is empty. Two states and of are equivalent if . Otherwise, states and are distinguishable. A state is reachable if there exists a word such that . A DFA is minimal if all its states are reachable and pairwise distinguishable.
A transformation of is a mapping of into itself. Let be a transformation of ; then is the image of under . If is a subset of , then . An arbitrary transformation can be written in the form
where for . We also use as a simplified notation. The composition of two transformations and of is a transformation such that for all . We usually drop the composition operator “” and write .
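Let $\mathcal{T}_Q$ be the set of all transformations of $Q$; then $\mathcal{T}_Q$ is a monoid under composition. The identity transformation $\mathbf{1}$ maps each element to itself, that is, $q\mathbf{1} = q$ for all $q \in Q$. For $k \ge 2$, a transformation (permutation) $t$ of a set $P = \{q_0, q_1, \ldots, q_{k-1}\} \subseteq Q$ is a $k$-cycle if $q_0 t = q_1$, $q_1 t = q_2$, $\ldots$, $q_{k-2} t = q_{k-1}$, $q_{k-1} t = q_0$. A $k$-cycle is denoted by $(q_0, q_1, \ldots, q_{k-1})$. If a transformation $t$ of $Q$ acts like a $k$-cycle on some $P \subseteq Q$, we say that $t$ has a $k$-cycle. A transformation has a cycle if it has a $k$-cycle for some $k \ge 2$. For $p \ne q$, a transposition is the 2-cycle $(p, q)$. A permutation of $Q$ is a mapping of $Q$ onto itself. A transformation is aperiodic if it contains no cycles.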
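The aperiodicity condition for a single transformation can be checked mechanically. The following is a minimal sketch, assuming that a transformation of Q = {0, ..., n-1} is stored as a tuple `t` with `t[q]` the image of state `q`; the function name `is_aperiodic` is illustrative and does not come from the paper.

```python
def is_aperiodic(t):
    """Return True if t has no k-cycle with k >= 2.

    t is a tuple; t[q] is the image of state q (a representation assumed here).
    """
    n = len(t)
    for q in range(n):
        x = q
        # After n applications of t, x is guaranteed to lie on a cycle of t.
        for _ in range(n):
            x = t[x]
        # A cycle of length 1 is a fixed point; anything longer violates aperiodicity.
        if t[x] != x:
            return False
    return True


print(is_aperiodic((1, 1, 2)))  # maps 0 to 1 and fixes 1 and 2
print(is_aperiodic((1, 0, 2)))  # transposition of 0 and 1
```

Fixed points are allowed because a cycle requires at least two states, so the identity and all constant transformations pass the check.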
In any DFA , each word induces a transformation of defined by for all . The set of all transformations of induced in by non-empty words is the transition semigroup of . This semigroup is a subsemigroup of . If is minimal, its transition semigroup is isomorphic to the syntactic semigroup of the language [14, 15]. A language is regular if and only if its syntactic semigroup is finite. The size of the syntactic semigroup of a language is called its syntactic complexity. In this paper we deal only with transition semigroups; consequently, we view syntactic complexity as the size of the transition semigroup.
If is a set of transformations, then is the semigroup generated by . If is a DFA, the transformations induced by letters of are called generators of the transition semigroup of or simply generators of .
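For small examples the semigroup generated by a set of transformations can be computed by closing the generators under composition. The sketch below assumes tuples as the representation of transformations and left-to-right composition (apply `s` first, then `t`); `compose` and `generate_semigroup` are hypothetical helper names.

```python
def compose(s, t):
    """Left-to-right composition: state q is sent to t[s[q]]."""
    return tuple(t[s[q]] for q in range(len(s)))


def generate_semigroup(generators):
    """Return the set of all transformations induced by non-empty words."""
    gens = [tuple(g) for g in generators]
    elements = set(gens)
    frontier = list(elements)
    while frontier:
        s = frontier.pop()
        # Extending every known element by one more letter reaches every word.
        for g in gens:
            st = compose(s, g)
            if st not in elements:
                elements.add(st)
                frontier.append(st)
    return elements


# Two letters on three states: one maps 0 to 1, the other maps 1 to 2.
a = (1, 1, 2)
b = (0, 2, 2)
semigroup = generate_semigroup([a, b])
print(sorted(semigroup))
```

The identity is not added automatically, which matches the convention that only non-empty words contribute to the transition semigroup.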
3 Unitary and Semiconstant DFAs
We now define a new class of aperiodic DFAs among which are found the largest transition semigroups known to date. We also study several of its subclasses.
A unitary transformation , denoted by , has , and for all . A DFA is unitary if each of its generators is unitary. A semigroup is unitary if it has a set of unitary generators.
A constant transformation , denoted by , has for all . A transformation is semiconstant if it maps a non-empty subset of to a single element and leaves the remaining elements of unchanged. It is denoted by . A constant transformation is semiconstant with , and a unitary transformation is semiconstant with (or ). A DFA is semiconstant if each of its generators is semiconstant. A semigroup is semiconstant if it has a set of semiconstant generators.
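The three kinds of transformations defined so far are easy to build explicitly. This is a small sketch under the assumption that Q = {0, ..., n-1} and that transformations are tuples; the constructor names are illustrative only.

```python
def unitary(n, p, q):
    """Map p to q and leave every other state of {0, ..., n-1} unchanged."""
    if p == q:
        raise ValueError("a unitary transformation must change exactly one state")
    return tuple(q if x == p else x for x in range(n))


def constant(n, q):
    """Map every state to q."""
    return (q,) * n


def semiconstant(n, subset, q):
    """Map every state in the non-empty subset to q and fix the remaining states."""
    if not subset:
        raise ValueError("the subset must be non-empty")
    return tuple(q if x in subset else x for x in range(n))


print(unitary(4, 0, 3))
print(constant(4, 2))
print(semiconstant(4, {0, 1}, 3))
```

With these constructors the two special cases stated in the text can be checked directly: a constant transformation is the semiconstant one whose subset is all of Q, and a unitary transformation is the semiconstant one whose subset contains the changed state, with or without its target.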
For each we shall define several DFAs. Let , be positive natural numbers. Also, let , and for each , , define by . For , let ; thus the cardinality of is . Let ; the cardinality of is . The sequence is called the distribution of .
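The number of different distributions for each $n$ is $2^{n-1}$. This is easily verified by induction on $n$. For $n = 1$ there is only one distribution, namely $(1)$; hence the count is $2^0 = 1$. Suppose that the claim holds for all positive integers smaller than $n$. Each distribution of $n$ is either $(n)$ or it has $n_1 = i$, where $1 \le i \le n-1$, combined with any distribution of the integer $n - i$. Hence the number of distributions is $1 + \sum_{i=1}^{n-1} 2^{n-i-1} = 2^{n-1}$. For example, for $n = 3$ we have the distributions $(3)$, $(2,1)$, $(1,2)$, $(1,1,1)$.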
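The inductive argument translates directly into a recursive enumeration. The sketch below treats a distribution of n as an ordered sequence of positive integers summing to n; the generator name `distributions` is an assumption made for illustration.

```python
def distributions(n):
    """Yield every ordered sequence of positive integers that sums to n."""
    if n == 0:
        yield ()
        return
    # Choose the first part, then extend it with any distribution of the remainder.
    for first in range(1, n + 1):
        for rest in distributions(n - first):
            yield (first,) + rest


for n in range(1, 6):
    print(n, len(list(distributions(n))))
print(sorted(distributions(3)))
```

The recursion mirrors the case split in the proof: taking the first part equal to n yields the single-part distribution, and every smaller first part is combined with all distributions of what remains.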
A binary tree is full if every vertex has either two children or no children. There are $C_{m-1}$ full binary trees with $m$ leaves, where $C_m = \frac{1}{m+1}\binom{2m}{m}$ is the Catalan number (see http://en.wikipedia.org/wiki/).
Let be a full binary tree with leaves labeled from left to right. To each node , we assign the union of all the sets labeling the leaves in the subtree rooted at .
With each full binary tree we can associate different distributions. A full binary tree with a distribution attached is denoted by and is called the structure of . This structure will uniquely determine the transition function of the DFAs defined below. The number of possible structures for a given $n$ is given by the binomial transform of the Catalan numbers (http://oeis.org/A007317).
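The count of structures can be sketched numerically. The code assumes two standard counting facts: there are C(n-1, m-1) distributions of n with exactly m parts, and the number of full binary trees with m leaves is the Catalan number with index m-1. The function names are illustrative.

```python
from math import comb


def catalan(k):
    """Catalan number with index k."""
    return comb(2 * k, k) // (k + 1)


def full_binary_trees(m):
    """Number of full binary trees with m >= 1 leaves."""
    return catalan(m - 1)


def structures(n):
    """Number of (tree, distribution) pairs for n states, summed over the number m of leaves."""
    return sum(comb(n - 1, m - 1) * full_binary_trees(m) for m in range(1, n + 1))


print([structures(n) for n in range(1, 8)])
```

The resulting sequence is the binomial transform of the Catalan numbers, which is how the count is described in the text.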
We can denote the structure of as a binary expression. For example, the expression denotes the full binary tree in which the leaves are labeled , , , and , where , and the interior nodes are labeled by , and . On the other hand, the expression has interior nodes labeled , and .
Definition 1 (Transformations)
- Type 1:
Suppose and is a distribution of . For all and Type 1 transformations are the unitary transformations and .
- Type 2:
Suppose and is a distribution of . If and , for each and , is a Type 2 transformation.
- Type 3:
Suppose and is a structure of . For each internal node the semiconstant transformation is of Type 3.
- Type 4:
The identity transformation on is of Type 4.
For a fixed there are Type 1 transformations and Type 2 transformations. The number of Type 3 transformations is .
Note that the distribution affects transformations of Types 1, 2, and 3, whereas the binary tree affects only transformations of Type 3.
In the following DFAs the transition function is defined by a set of transformations, and the alphabet consists of letters inducing these transformations.
Definition 2 (DFAs)
If there is no such that , then any DFA of the form , where has all the transformations of Types 1 and 2, is a complete unitary DFA.
is with added.
Any DFA , where has all the transformations of Types 1, 2 and 3, is a semiconstant tree DFA.
is with added.
Using terminology analogous to that of , we define a bipath (bidirectional path) to be a graph , where for some , and for each there are two edges and . If , the graph is also considered a (trivial) bipath. If we ignore self-loops, each edge in the graph uniquely determines a unitary transformation, and the states in each in constitute a bipath. Also, the graph of is a sequence of bipaths, where there are transitions from every in to every in , if .
Figure 1 shows three examples of unitary DFAs. In Fig. 1 (a) we have DFA , where the letter induces the unitary transformation . In Fig. 1 (b) we present , where only the transitions between different states are included to simplify the figure. Also, the letter labels are deleted because they are easily deduced. Next, in Figs. 1 (c) and (d), we have the DFAs and , respectively. We shall return to these examples later.
All four DFAs of Definition 2 are minimal, as is easily verified. Hence the syntactic semigroup of the language of each DFA is isomorphic to the transition semigroup of the DFA.
4 Unitary Semigroups
We study unitary semigroups because their generators are the simplest. We begin with three previously studied special semigroups which are subsemigroups of unitary semigroups.
4.1 Monotonic Semigroups
Monotonic semigroups were previously studied in [6, 10, 11]. A transformation $t$ of $Q$ is monotonic if there exists a total order $\le$ on $Q$ such that, for all $p, q \in Q$, $p \le q$ implies $pt \le qt$. Note that the identity transformation is monotonic. A DFA is monotonic if each of its input transformations is monotonic. A semigroup is monotonic if it has a set of monotonic generators. From now on we assume that $\le$ is the usual order on integers.
The following result of  is somewhat modified for our purposes:
Proposition 1 (Gomes and Howie)
The set of all monotonic transformations other than is an aperiodic semigroup generated by
and no smaller set of unitary transformations generates .
The transition semigroup of is the semigroup of all monotonic transformations.
Note also that there are monotonic semigroups that do not have unitary generating sets; each monotonic semigroup, however, is a subsemigroup of the transition semigroup of consisting of all monotonic transformations.
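Monotonicity with respect to the usual order on integers is simple to test, and for small n the monotonic transformations can be counted by brute force. The sketch below assumes the tuple representation of transformations and relies on the standard count C(2n-1, n) of order-preserving maps of an n-element chain (identity included); this formula is supplied here as an assumption, since the corresponding expression is not legible in the text above.

```python
from itertools import product
from math import comb


def is_monotonic(t):
    """True if p <= q implies t[p] <= t[q] for the usual order on integers."""
    return all(t[i] <= t[i + 1] for i in range(len(t) - 1))


def count_monotonic(n):
    """Count monotonic transformations of {0, ..., n-1} by exhaustive search."""
    return sum(1 for t in product(range(n), repeat=n) if is_monotonic(t))


for n in range(1, 6):
    print(n, count_monotonic(n), comb(2 * n - 1, n))
```

Subtracting one from each count removes the identity, which gives the number of monotonic transformations other than the identity.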
4.2 Partially Monotonic Semigroups
A partial transformation of is a partial mapping of into itself. If is defined for , then is the image of under ; otherwise, we write . By convention, . The domain of is the set . A partial transformation is monotonic if there exists an order on such that for all , implies .
Semigroups of monotonic partial transformations were studied by Gomes and Howie . They were adapted to automata in . We follow  by starting with all partial transformations of and adding state for the undefined value . We call the resulting transformations partially monotonic. The following is an adaptation of the results of :
For , the DFA has the following properties:
Each of the transformations of is partially monotonic. Thus is partially monotonic, and hence aperiodic.
The transition semigroup of consists of all the partially monotonic transformations of , where
Each generator is idempotent, and is the smallest number of idempotent generators of . Moreover, each generator except is unitary, and is the smallest number of unitary generators of .
There are eight monotonic partial transformations of the set , namely: , , , , , , , . When we replace by state 2, the eight partial transformations become total transformations , , , , , , , . The generators of are: , , , and . The DFA of Figure 1 (c) is an example of .
For the semigroup of all partially monotonic transformations is larger than the semigroup of all monotonic transformations.
Note that there are partially monotonic semigroups that do not have unitary generating sets; each partially monotonic semigroup, however, is a subsemigroup of the transition semigroup of consisting of all partially monotonic transformations.
4.3 Other Previously Studied Aperiodic Semigroups
As we have mentioned in the introduction, the syntactic complexity of five other language classes was studied previously. Cofinite languages are complements of finite languages, and therefore their minimal DFAs have the same transition semigroup as the DFAs of finite languages.
The reverse of a word is spelled backwards and . The reverse of a language is . A language is definite if it has the form , where and are finite. It is reverse definite if its reverse is definite, that is, if it has the form , where and are finite. It was shown in  that the syntactic complexity of reverse definite languages is the same as that of finite languages. A lower bound of was proved for definite languages; it is an open question whether this is also an upper bound.
The well known Green relations define -trivial and -trivial monoids (semigroups with an identity). If is a monoid, the relation is defined by for . A monoid is -trivial if implies . The relation is defined by and is -trivial if implies . Languages whose minimal DFAs have -trivial (-trivial) transition monoids are also called -trivial (-trivial).
Syntactic complexities of -trivial and -trivial languages were studied by Brzozowski and Li . Consider the natural order on . We say that a transformation is non-decreasing if for all . Let be the set of all non-decreasing transformations. The size of is .
It was shown in  that is an -trivial language if and only if its minimal DFA is partially ordered, or equivalently, if its transition semigroup contains only non-decreasing transformations. Thus the largest semigroup generated by DFAs accepting -trivial languages is .
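The non-decreasing transformations can be enumerated directly for small n. In the sketch below a transformation is a tuple and is taken to be non-decreasing when no state is mapped to a smaller one; since state q then has n - q admissible images, the exhaustive count should equal n!. The helper names are illustrative.

```python
from itertools import product
from math import factorial


def is_non_decreasing(t):
    """True if every state q satisfies q <= t[q]."""
    return all(q <= t[q] for q in range(len(t)))


def count_non_decreasing(n):
    """Count non-decreasing transformations of {0, ..., n-1} exhaustively."""
    return sum(1 for t in product(range(n), repeat=n) if is_non_decreasing(t))


for n in range(1, 7):
    print(n, count_non_decreasing(n), factorial(n))
```

Because a composition of non-decreasing transformations is again non-decreasing, the whole set is closed under composition and therefore forms a semigroup.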
The transition semigroup of is the semigroup of all non-decreasing transformations.
DFA has only unitary transformations of Type 2. They generate only non-decreasing transformations, since each of them preserves the natural order. An arbitrary non-decreasing transformation has the form
where for . Since contains all unitary transformations of the form for , all transformations are present. One verifies that applying results in . Thus each non-decreasing transformation can be generated by at most unitary transformations. ∎
Note that there are semigroups with only non-decreasing transformations that do not have unitary generating sets; each such semigroup, however, is a subsemigroup of . Since every -trivial language is also -trivial, the transition semigroups of all minimal DFAs accepting -trivial languages are also subsemigroups of .
4.4 General Unitary Semigroups
A set of unitary transformations is -cyclic if it has the form , , where the are distinct.
Let be a set of unitary transformations.
If has a -cyclic subset with , then is not aperiodic.
If contains a subset where and , then is not aperiodic.
Without loss of generality, we can replace by in both claims.
Suppose that contains , where , for , and . Then maps and does not affect any other states. Thus the set is cyclically permuted, which shows that is not aperiodic.
If , then the transformation transposes 0 and 1; hence is not aperiodic. ∎
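The first claim of the lemma can be illustrated on three states. The sketch assumes that a 3-cyclic set consists of the unitary transformations sending 0 to 1, 1 to 2, and 2 to 0, and that composition is applied left to right; all helper names are hypothetical.

```python
def unitary(n, p, q):
    """Map p to q and fix every other state of {0, ..., n-1}."""
    return tuple(q if x == p else x for x in range(n))


def compose(s, t):
    """Left-to-right composition: apply s first, then t."""
    return tuple(t[s[q]] for q in range(len(s)))


def generate_semigroup(generators):
    """Close the generators under right multiplication by generators."""
    gens = list(generators)
    elements, frontier = set(gens), list(gens)
    while frontier:
        s = frontier.pop()
        for g in gens:
            st = compose(s, g)
            if st not in elements:
                elements.add(st)
                frontier.append(st)
    return elements


def is_aperiodic(t):
    """True if t has no cycle of length two or more."""
    for q in range(len(t)):
        x = q
        for _ in range(len(t)):
            x = t[x]
        if t[x] != x:
            return False
    return True


cyclic = [unitary(3, 0, 1), unitary(3, 1, 2), unitary(3, 2, 0)]
# Apply the letters moving 1 to 2, then 0 to 1, then 2 to 0.
witness = compose(compose(cyclic[1], cyclic[0]), cyclic[2])
print(witness, is_aperiodic(witness))

bipath = [unitary(3, 0, 1), unitary(3, 1, 0), unitary(3, 1, 2), unitary(3, 2, 1)]
print(all(is_aperiodic(s) for s in generate_semigroup(bipath)))
```

The witness swaps states 0 and 1, so the semigroup generated by the cyclic set is not aperiodic, whereas every element generated by the bipath on the same three states passes the check.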
If is unitary, the following are equivalent:
The set of generators of does not contain any -cyclic subsets with , and does not contain any sets of type .
Every strongly connected component of is a bipath.
This follows from Lemma 1.
Consider a strongly connected component . If , the claim holds. Otherwise, suppose and is a transition. Then there must also be a directed path from to . If the last transition in that path is , where , then the set of generators must contain a -cyclic subset with , which is a contradiction. Hence the transition must be present.
Next, suppose that there are transitions , , and . By the argument above there must also be transitions , , and . But then the set of generators contains a subset of type , which is again a contradiction.
It follows that every strongly connected component is a bipath, and the graph of the transitions of is a loop-free connection of such bipaths.
Since a bipath is monotonic, it is aperiodic by Proposition 1. By Schützenberger’s theorem , the language of all words taking any state of the bipath to any other state of that bipath is star-free. Since the graph of is a loop-free connection of bipaths, the language of all words taking any state of to any other state of is star-free. Hence is aperiodic. ∎
A unitary DFA is complete if the addition of any unitary transition results in a DFA that is not aperiodic.
A maximal aperiodic unitary semigroup is isomorphic to the transition semigroup of a complete unitary DFA , where is some distribution of .
We know that an aperiodic unitary DFA is a loop-free connection of bipaths. Let be the bipaths of . There exists a linear ordering of them, such that there is no transformation for . If all possible transformations for are present, then is isomorphic to . Otherwise we can add more unitary transformations of Type 2 and obtain a larger semigroup. ∎
For each distribution , we calculate the size of the transition semigroup of .
The cardinality of the transition semigroup of is
As above is a loop-free connection of bipaths, and its generators are the transformations within each bipath, all transformations of the form where , , , and .
In the transition semigroup of , consider the transformation that (a) does not affect any states in for , (b) maps some number of states of to , and (c) maps the remaining states of to some states in . It is convenient to temporarily consider a partial transformation which for all has the property , if and , otherwise. In other words, the images of the states mapped to the outside of are all lumped together into the undefined value . The number of such partial transformations generated by the transitions in the bipath is ; these are all the partially monotonic transformations of that map exactly states of to .
Returning now to , consider first the case ; then is a total transformation equal to , and there are such transformations. Otherwise, maps states of to arbitrary states in . If is the number of states in the bipaths below , then for each there are transformations . Altogether, for a fixed bipath , the number of transformations is
If is any transformation of , then it can be represented by , where maps into . Since the domains of are disjoint, there is a bijection between transformations and the sets . Hence we can multiply the numbers of different transformations for each , and the formula in the theorem results. ∎
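For small distributions the size of the transition semigroup can be cross-checked by brute force. The sketch below rests on an assumed reading of the definitions: the states are split into consecutive blocks whose sizes are given by the distribution, Type 1 generators move a state to a neighbour inside its block, and Type 2 generators move a state of one block to any state of a later block. The function names are illustrative, and the identity is not included.

```python
from itertools import accumulate


def unitary(n, p, q):
    """Map p to q and fix every other state of {0, ..., n-1}."""
    return tuple(q if x == p else x for x in range(n))


def compose(s, t):
    """Left-to-right composition: apply s first, then t."""
    return tuple(t[s[q]] for q in range(len(s)))


def complete_unitary_generators(dist):
    """Assumed Type 1 and Type 2 generators for the given distribution."""
    n = sum(dist)
    starts = [0] + list(accumulate(dist))[:-1]
    blocks = [list(range(r, r + k)) for r, k in zip(starts, dist)]
    gens = set()
    for block in blocks:  # Type 1: both directions between neighbours in a block
        for q in block[:-1]:
            gens.add(unitary(n, q, q + 1))
            gens.add(unitary(n, q + 1, q))
    for i, earlier in enumerate(blocks):  # Type 2: from a block to any later block
        for later in blocks[i + 1:]:
            for q in earlier:
                for p in later:
                    gens.add(unitary(n, q, p))
    return gens


def semigroup_size(dist):
    """Size of the semigroup generated by the assumed generators."""
    gens = list(complete_unitary_generators(dist))
    elements, frontier = set(gens), list(gens)
    while frontier:
        s = frontier.pop()
        for g in gens:
            st = compose(s, g)
            if st not in elements:
                elements.add(st)
                frontier.append(st)
    return len(elements)


for dist in [(3,), (2, 1), (1, 2), (1, 1, 1)]:
    print(dist, semigroup_size(dist))
```

The search is exponential in the number of states, so it is only useful as a sanity check on small cases; adding the identity would increase each count by one.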
Note that each factor of the product in Theorem 4.3 depends only on and on the sum . Hence if is maximal, then is also maximal and so on. Consequently, we have
Let be the cardinality of the largest transition semigroup of DFA with states. If we define , then for
This leads directly to a dynamic programming algorithm taking time for computing and the distributions yielding the maximal unitary semigroups. This holds assuming constant time for computing the internal terms in the summation and summing them, where, however, the numbers can be very large (). The precise complexity depends on the algorithms used for multiplication, exponentiation and calculation of binomial coefficients.
We were able to compute the maximal up to . Here is an example of the maximal one for :
its syntactic semigroup size exceeds . Compare this to the previously known largest semigroup of an -trivial language; its size is which is approximately . On the other hand, the maximal possible syntactic semigroup of any regular language for is .
4.5 Asymptotic Lower Bound
We were not able to compute the tight asymptotic bound on the maximal size of unitary semigroups. However, we computed a lower bound which is larger than , the previously known lower bound for the size of aperiodic semigroups.
For even the size of the maximal unitary semigroup is at least
Let be even and consider consisting of bipaths. From Theorem 4.3 we have:
By using the equality we obtain:
For the bound exceeds . Larger lower bounds can also be found using increasing values of in , but the complexity of the calculations increases, and such bounds are not tight.
5 Semiconstant Semigroups
We now consider our largest aperiodic semigroups, the semiconstant ones.
5.1 Nearly Monotonic Semigroups
Let be the set of all constant transformations of , and . We call the transformations in nearly monotonic with respect to the usual order on integers. The next result follows from Proposition 2 and .
Let and . Then
Each of the transformations of of Types 1, 2, and 4 is partially monotonic, and there is one constant transformation . Thus DFA is nearly monotonic, and hence aperiodic.
The transition semigroup of consists of all the nearly monotonic transformations of , where
Each generator, other than the constant and , is unitary, and is the smallest number of unitary, constant and identity generators of .
For the semigroup of all nearly monotonic transformations is larger than the semigroup of all partially monotonic transformations. Note that there are nearly monotonic semigroups that do not have semiconstant generating sets; each nearly monotonic semigroup, however, is a subsemigroup of the transition semigroup of .
5.2 Semiconstant Tree Semigroups
An example of a maximal semiconstant tree DFA for is ; its transition semigroup has 1,849 elements. For , the maximal semiconstant tree semigroup is the largest aperiodic semigroup known.
First we define a new operation on DFAs.
Let and be DFAs. Let . The semiconstant sum of and is the DFA . For each transition in , we have a transition in such that for and otherwise. Dually, we have transitions defined by in . Moreover we have a unitary transformation for each , and a constant transformation .
For , each is a semiconstant sum of two smaller semiconstant tree DFAs , defined by the left subtree of the root of , and , defined by the right subtree.
The semiconstant sum is minimal if and only if every state of is reachable from , the states of are pairwise distinguishable, and is non-empty.
If is minimal, then every state of is reachable from in . Since any transformation mapping a state to a state from is composed from the constant transformation , every state of is reachable from . Now consider two distinct states . Since is minimal, and are distinguishable by some word , and no letter of can induce the constant transformation . Hence every letter of induces a transformation that acts on either as the identity or as some . If we omit the letters that act as the identity, we obtain a word that distinguishes and in .
Conversely, distinct states , are distinguishable as follows. Apply a unitary transformation that takes to a state in . Since is not changed by , and are distinguishable. If and then and are already distinguished (by the empty word). If and then they are distinguishable by assumption. Every state of is reachable from by assumption. Also, any state in is reachable from