Syntactic Complexity of Finite/Cofinite, Definite, and Reverse Definite Languages
We study the syntactic complexity of finite/cofinite, definite and reverse definite languages. The syntactic complexity of a class of languages is defined as the maximal size of syntactic semigroups of languages from the class, taken as a function of the state complexity $n$ of the languages. We prove that $(n-1)!$ is a tight upper bound for finite/cofinite languages and that it can be reached only if the alphabet size is greater than or equal to $(n-1)!-(n-2)!$. We prove that the bound is also $(n-1)!$ for reverse definite languages, but the minimal alphabet size is $(n-1)!-2(n-2)!$. We show that $\lfloor e\cdot(n-1)!\rfloor$ is a lower bound on the syntactic complexity of definite languages, and conjecture that this is also an upper bound, and that the alphabet size required to meet this bound is $\lfloor e\cdot(n-1)!\rfloor-\lfloor e\cdot(n-2)!\rfloor$. We prove the conjecture for $n\le 4$.
Keywords: definite, finite automaton, finite/cofinite, regular language, reverse definite, syntactic complexity, syntactic semigroup
A language $L$ is definite if it can be decided whether a word $w$ belongs to it simply by examining the suffix of $w$ of some fixed length. The class of definite languages was the very first subclass of regular languages to be considered: it was introduced in 1954 in the classic paper by Kleene [10]. It was then studied in 1963 by Perles, Rabin, and Shamir [15], and Brzozowski [2], in 1966 by Ginzburg [8], and later by several others. Definite languages were revisited in 2009 by Bordihn, Holzer and Kutrib [1] in connection with state complexity.
Reverse definite languages were first studied by Brzozowski [2]. Here membership of $w$ can be determined by its prefix of some fixed length. The class of finite and cofinite languages is the intersection of the definite and reverse definite classes. Here testing for membership can be done by checking all words shorter than some fixed length.
These three classes appear at the bottom of the dot-depth hierarchy of star-free languages, below generalized definite languages and locally testable languages. All three classes are boolean algebras. The semigroup of a finite/cofinite language is nilpotent: it has a single idempotent $e$, which is a zero, and is characterized by the equations $es = se = e$. For definite (reverse definite) languages every idempotent $e$ is a right zero, that is, $se = e$ (respectively, a left zero, that is, $es = e$).
We study the sizes of syntactic semigroups of finite/cofinite, definite, and reverse definite languages. If $L$ is a regular language over alphabet $\Sigma$, its syntactic semigroup is defined by the Myhill congruence [14] $\approx_L$: for $x, y \in \Sigma^+$,
$$x \approx_L y \quad\text{if and only if}\quad uxv \in L \Leftrightarrow uyv \in L \ \text{ for all } u, v \in \Sigma^*.$$
The set $\Sigma^+/{\approx_L}$ of equivalence classes of the relation $\approx_L$ is the syntactic semigroup of $L$. It is well known that this semigroup is isomorphic to the semigroup of transformations performed by non-empty words in the minimal deterministic finite automaton (DFA) recognizing $L$, and it is usually convenient to deal with the latter semigroup. It is obvious that the transformation semigroup of the minimal DFA of $L$ is identical to that of the minimal DFA of $\overline{L}$, the complement of $L$.
The syntactic complexity of a language is the size of its syntactic semigroup, and , where denotes the cardinality of a set . Syntactic complexity can vary significantly among languages with the same state complexity , where the state complexity of a language is the number of states in its minimal DFA.
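As an illustration of working with the transformation semigroup rather than the congruence classes, the following is a minimal sketch that closes a set of letter transformations under composition. It assumes states are numbered 1..n and that a transformation is stored as a tuple whose entry at position i-1 is the image of state i; the function names and the three-state example are hypothetical and not taken from the paper.

```python
def compose(s, t):
    """Return the transformation 'apply s, then t' (right action on states 1..n)."""
    return tuple(t[image - 1] for image in s)


def transformation_semigroup(letter_actions):
    """Collect the transformations induced by all non-empty words."""
    letters = list(letter_actions)
    semigroup = set(letters)
    frontier = list(semigroup)
    while frontier:
        current = frontier.pop()
        for letter in letters:
            extended = compose(current, letter)  # append one more letter
            if extended not in semigroup:
                semigroup.add(extended)
                frontier.append(extended)
    return semigroup


# Hypothetical letter actions on 3 states: a 3-cycle, a transposition,
# and a rank-2 map. These are illustrative and not from the paper.
actions = [(2, 3, 1), (2, 1, 3), (1, 1, 3)]
print(len(transformation_semigroup(actions)))
```

When the letter actions come from a minimal DFA, the size of the returned set is the syntactic complexity of the accepted language. The three sample actions generate every transformation of a three-element set.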
The observation that $n^n$ is a tight upper bound on the size of the transformation semigroup of a DFA with $n$ states was first made by Maslov [12] in 1970, although this follows immediately from a 1935 result of Piccard [16], who showed that three generators suffice to produce all transformations of a set of $n$ elements. The interest in syntactic complexity of subclasses of regular languages is new. In 2003–2004 Holzer and König [9], and Krawetz, Lawrence and Shallit [11] studied unary and binary languages. In 2011 Brzozowski and Ye [6] showed the following bounds: right ideals: tight upper bound ; left ideals: lower bound ; two-sided ideals: lower bound . In 2012 Brzozowski, Li and Ye [4] found the following bounds: prefix-free languages: tight upper bound ; suffix-free languages: lower bound ; bifix-free languages: lower bound ; factor-free languages: lower bound . Also in 2012 tight upper bounds were found for three subclasses of star-free languages by Brzozowski and Li [3]: monotonic languages: ; partially monotonic languages: ; nearly monotonic languages: , where is the binomial coefficient choose . It was conjectured in [3] that the bound for nearly monotonic languages is also a tight upper bound for star-free languages. That bound is asymptotically .
We prove that $(n-1)!$ is a tight upper bound for finite/cofinite languages, and that a growing alphabet of size at least $(n-1)!-(n-2)!$ is required to reach the bound. For reverse definite languages the bound is also $(n-1)!$, but the alphabet size is now $(n-1)!-2(n-2)!$. We show that $\lfloor e\cdot(n-1)!\rfloor$ is a lower bound for definite languages, and that it can be reached with an alphabet of size $\lfloor e\cdot(n-1)!\rfloor-\lfloor e\cdot(n-2)!\rfloor$. We conjecture that this is also an upper bound, and prove the conjecture for $n\le 4$.
There is a lack of left-right symmetry in several results for syntactic complexity in spite of the fact that the syntactic congruence is symmetric. Thus, in the case of ideals [6], it was easy to find a tight upper bound for right ideals, but no tight upper bound is known for left ideals. It was easy to find a tight upper bound for prefix-free languages, but no tight upper bound is known for suffix-free languages [4]. This happens again here: we have a tight upper bound for reverse definite languages, but no tight upper bound for definite languages.
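A transformation of a set $Q$ is a mapping of $Q$ into itself. We consider only transformations of finite sets, and assume without loss of generality that $Q=\{1,2,\ldots,n\}$. If $t$ is a transformation of $Q$, and $i\in Q$, then $it$ is the image of $i$ under $t$. An arbitrary transformation can be written in the form
$$t=\begin{pmatrix}1&2&\cdots&n-1&n\\ i_1&i_2&\cdots&i_{n-1}&i_n\end{pmatrix},$$
where $i_k=kt$, $1\le k\le n$, and $i_k\in Q$. We also use the notation $t=[i_1,i_2,\ldots,i_n]$ for the transformation above.
If $X$ is a subset of $Q$, then $Xt=\{it\mid i\in X\}$, and the restriction of $t$ to $X$, denoted by $t|_X$, is a mapping from $X$ to $Xt$ such that $it|_X=it$ for all $i\in X$.
A permutation of $Q$ is a mapping of $Q$ onto itself. A transformation $t$ is permutational if there exists some $X\subseteq Q$ with $|X|\ge 2$ such that $t|_X$ is a permutation of $X$. Otherwise, $t$ is non-permutational.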
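The definition of a permutational transformation can be tested mechanically. The sketch below assumes that the subset in the definition must contain at least two states, and it uses the observation that repeatedly taking images shrinks the state set down to exactly the states lying on cycles. The tuple encoding (entry i-1 holds the image of state i) and the function name are choices made here for illustration.

```python
def is_permutational(t):
    """True if t permutes some subset containing at least two states."""
    points = set(range(1, len(t) + 1))
    while True:
        image = {t[i - 1] for i in points}
        if image == points:
            # 'points' is now the set of states on cycles (fixed points included);
            # t restricted to it is a permutation.
            return len(points) >= 2
        points = image


print(is_permutational((2, 1, 3)))  # swaps states 1 and 2
print(is_permutational((1, 2, 2)))  # has two fixed points
print(is_permutational((2, 3, 3)))  # every state eventually reaches state 3
```

Under this reading a non-permutational transformation has exactly one fixed point and no longer cycles.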
A constant transformation, denoted by , has for all .
The composition of two transformations $t_1$ and $t_2$ of $Q$ is a transformation $t_1\circ t_2$ such that $i(t_1\circ t_2)=(it_1)t_2$ for all $i\in Q$. We usually omit the composition operator.
A deterministic finite automaton (DFA) is a quintuple $\mathcal{D}=(Q,\Sigma,\delta,q_0,F)$, where $Q$ is a finite, non-empty set of states, $\Sigma$ is a finite non-empty alphabet, $\delta\colon Q\times\Sigma\to Q$ is the transition function, $q_0\in Q$ is the initial state, and $F\subseteq Q$ is the set of final states. We extend $\delta$ to $Q\times\Sigma^*$ in the usual way. The DFA $\mathcal{D}$ accepts a word $w\in\Sigma^*$ if $\delta(q_0,w)\in F$. The set of all words accepted by $\mathcal{D}$ is the language of $\mathcal{D}$. Two states of a DFA are distinguishable if there exists a word which is accepted from one of the states and rejected from the other. Otherwise, the two states are equivalent. A DFA is minimal if all of its states are reachable from the initial state and no two states are equivalent. All minimal DFAs of a given language are isomorphic.
The notion of a DFA connects transformations to regular languages. Given a regular language $L$, its minimal DFA $\mathcal{D}=(Q,\Sigma,\delta,q_0,F)$, and a word $w\in\Sigma^+$, the transition function $\delta(\cdot,w)$ is a transformation of $Q$, the transformation caused by $w$. When convenient, we identify a word with its corresponding transformation.
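The (left) quotient of a language $L$ by a word $w$ is the language $L_w=\{x\in\Sigma^*\mid wx\in L\}$. Note that $L_\varepsilon=L$, where $\varepsilon$ is the empty word. The quotient DFA of a regular language $L$ is $\mathcal{D}=(Q,\Sigma,\delta,q_0,F)$, where $Q=\{L_w\mid w\in\Sigma^*\}$, $\delta(L_w,a)=L_{wa}$, $q_0=L_\varepsilon=L$, and $F=\{L_w\mid \varepsilon\in L_w\}$. The quotient DFA is isomorphic to the minimal DFA accepting $L$.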
3 Finite/Cofinite Languages
One of the simplest classes of regular languages is the class of finite and cofinite languages, where a language is cofinite if its complement is finite. Since the syntactic complexity bounds for finite and cofinite languages are identical, we restrict our analysis here to finite languages.
Let be a regular language and be its minimal DFA. It is well-known that is finite/cofinite if and only if there exists a numbering on so that for all , implies that or . We define the set of transformations on with these properties:
It is clear that is a semigroup under composition of size .
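The size and closure claims can be checked by enumeration for small state counts. This sketch assumes the numbering condition reads "every state other than the last moves to a strictly larger state, and the last state n is fixed"; that reading, the tuple encoding, and the function names are assumptions made for illustration.

```python
from itertools import product
from math import factorial


def compose(s, t):
    """Apply s first, then t (states are 1..n, tuples are 0-indexed)."""
    return tuple(t[image - 1] for image in s)


def forward_transformations(n):
    """All t on {1..n} with it > i for i < n and nt = n (assumed condition)."""
    choices = [range(i + 1, n + 1) for i in range(1, n)] + [[n]]
    return set(product(*choices))


for n in range(1, 7):
    family = forward_transformations(n)
    closed = all(compose(s, t) in family for s in family for t in family)
    # Columns: n, size of the family, (n-1)!, closed under composition?
    print(n, len(family), factorial(n - 1), closed)
```

Under the stated assumption, state i has n - i possible images, which is where the factorial count comes from.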
Let be a finite or cofinite language with state complexity . Then the syntactic complexity of satisfies and this bound is tight.
Let be the minimal DFA of . The above discussion implies that we may label the states so that is a subsemigroup of . Therefore the bound holds.
Let and . Let be a with states numbered , initial state , sink state , and a final state . For each transformation , assign a letter in whose input transformation on is exactly . To show that is minimal, note that state is reached from the initial state by the transformation . Also, if and are two states and , then the transformation that has , and for all other , distinguishes the two states. Hence is minimal and accepts a finite language. Therefore the bound is tight. ∎
A natural question is the minimal size of the alphabet required to achieve the upper bound. Let be the minimal DFA of a finite language with . For any state and , it is clear that or . It follows that if an input transformation satisfies for some , then any word corresponding to must have length 1, that is, must be in .
Let be a finite or cofinite language with state complexity , and suppose that . Then
and this bound is tight.
By Theorem 3.1, we may assume that The preceding discussion implies that is at least the number of transformations which satisfy for some . Let be the set of these transformations. If we place the restriction for all then there are choices for these , and hence a total of such transformations. Therefore Now let be arbitrary. Let
and Then and . Thus generates , and the bound is tight. ∎
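The alphabet argument can also be explored numerically. The sketch below assumes the same reading of the semigroup as above (each state below n moves strictly upward, state n is fixed) and assumes that the transformations forced to be single letters are those sending some state i with i ≤ n-2 to i+1. It then checks, for a few small n, how many such transformations there are and whether they generate the whole family. All names are illustrative.

```python
from itertools import product
from math import factorial


def compose(s, t):
    """Apply s first, then t."""
    return tuple(t[image - 1] for image in s)


def forward_transformations(n):
    """Assumed family: it > i for i < n, and nt = n."""
    choices = [range(i + 1, n + 1) for i in range(1, n)] + [[n]]
    return set(product(*choices))


def forced_letters(n):
    """Members that send some state i <= n-2 to i+1 (assumed letter condition)."""
    return {t for t in forward_transformations(n)
            if any(t[i - 1] == i + 1 for i in range(1, n - 1))}


def generated(generators):
    """Closure of a generating set under composition."""
    gens = list(generators)
    closure = set(gens)
    frontier = list(gens)
    while frontier:
        current = frontier.pop()
        for g in gens:
            nxt = compose(current, g)
            if nxt not in closure:
                closure.add(nxt)
                frontier.append(nxt)
    return closure


for n in range(3, 7):
    letters = forced_letters(n)
    expected = factorial(n - 1) - factorial(n - 2)
    print(n, len(letters), expected,
          generated(letters) == forward_transformations(n))
```

A transformation that skips at least one state everywhere factors as the shift i ↦ i+1 followed by a second member of the forced set, which is why the closure recovers the full family for n ≥ 3.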
For , the largest semigroup is
and its minimal generating set is shown in boldface.
4 Reverse Definite Languages
A reverse definite language is a language of the form , where and are finite languages. Because reverse definite languages are characterized by prefixes of a fixed length, their minimal DFAs (and hence syntactic complexity bounds) are very similar to those of finite/cofinite languages. If has state complexity 1, then either or . Since both these languages are in the finite/cofinite class, the bound of Theorem 3.1 applies. For state complexities , we note first that if is not a quotient of , then is cofinite. Otherwise, and are both quotients of . Let be the minimal DFA of , and label the states corresponding to and with and , respectively. One can number the other states in so that for all words , if then with equality if and only if .
The syntactic complexity results for reverse definite languages now follow directly from the finite/cofinite results.
Let be a reverse definite language with state complexity . Then and this bound is tight. Moreover, if this language achieves this upper bound and , then , and this bound is tight.
First, if is not a quotient of , then is cofinite and hence has the same bounds as in the previous section. To find a cofinite witness meeting the bound , first find a finite witness as in the proofs of Theorems 3.1 and 3.2, and then interchange its final and non-final states.
Otherwise, let be the minimal DFA recognizing , and let the states be totally ordered as in the preceding discussion. Define the set of transformations analogous to the finite case:
Then , which a straightforward calculation shows to be a semigroup. Clearly, , thus proving the bound.
For the minimal size of the alphabet, we define to be the set of transformations in satisfying for some . As in Section 3, these transformations must correspond to individual letters in , hence proving the bound. The same indirect counting argument shows that for , A similar argument also shows that generates (using the transformations and in place of ). Therefore the alphabet size bound is tight. ∎
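The reverse definite case can be checked the same way. This sketch assumes the two states corresponding to the full language and the empty language are numbered n-1 and n and are both fixed, that every other state moves to a strictly larger state, and that the forced letters are the transformations sending some state i with i ≤ n-3 to i+1. These conditions and all names are assumptions for illustration.

```python
from itertools import product
from math import factorial


def compose(s, t):
    """Apply s first, then t."""
    return tuple(t[image - 1] for image in s)


def reverse_definite_transformations(n):
    """Assumed family: it > i for i <= n-2, with states n-1 and n fixed."""
    choices = [range(i + 1, n + 1) for i in range(1, n - 1)] + [[n - 1], [n]]
    return set(product(*choices))


def forced_letters(n):
    """Members sending some state i <= n-3 to i+1 (assumed letter condition)."""
    return {t for t in reverse_definite_transformations(n)
            if any(t[i - 1] == i + 1 for i in range(1, n - 2))}


def generated(generators):
    """Closure of a generating set under composition."""
    gens = list(generators)
    closure = set(gens)
    frontier = list(gens)
    while frontier:
        current = frontier.pop()
        for g in gens:
            nxt = compose(current, g)
            if nxt not in closure:
                closure.add(nxt)
                frontier.append(nxt)
    return closure


for n in range(4, 7):
    family = reverse_definite_transformations(n)
    letters = forced_letters(n)
    print(n, len(family), len(letters), generated(letters) == family)
```

Compared with the finite case, state n-2 now has two possible sink targets that no longer word can distinguish from a single letter, which is why fewer letters are forced.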
For , the finite witness meeting the bound has the transformation set given in Example 1. We modify this set by making the sink state, thus obtaining
where the generators are in boldface, and state 4 is final.
5 Definite Languages
A definite language is a language of the form , where and are finite languages. Like finite/cofinite and reverse definite languages, definite languages are characterized by their transformation semigroups. In this case, every transformation of the minimal DFA of a regular language must be non-permutational. Conversely, if the transformation semigroup of a minimal DFA contains only non-permutational transformations, then it accepts a definite language.
Our goal for this section is to find the maximal size of a non-permutational transformation semigroup, that is, one which contains only non-permutational transformations. There is a straightforward bijection between such transformations on and simple labeled forests on nodes. This can be seen by constructing the graph on nodes with edges representing , and then removing the unique node for which . Then Cayley’s Theorem [7, 17] shows that there are non-permutational transformations of .
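The count obtained from the forest bijection can be confirmed by brute force for small n. The sketch below enumerates every transformation of {1..n}, keeps the non-permutational ones (assuming a permutational transformation must permute a subset of size at least two), and compares the count with n^(n-1), the number of rooted labeled trees on n nodes. The helper names are illustrative.

```python
from itertools import product


def is_permutational(t):
    """True if t permutes some subset containing at least two states."""
    points = set(range(1, len(t) + 1))
    while True:
        image = {t[i - 1] for i in points}
        if image == points:
            return len(points) >= 2
        points = image


def count_non_permutational(n):
    """Count transformations of {1..n} with one fixed point and no other cycle."""
    states = range(1, n + 1)
    return sum(1 for t in product(states, repeat=n) if not is_permutational(t))


for n in range(1, 6):
    print(n, count_non_permutational(n), n ** (n - 1))
```

Each non-permutational transformation is a rooted tree whose root is the unique fixed point, with every other state pointing to its parent.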
Identifying non-permutational transformations is not sufficient to find a syntactic complexity bound, as the set of such transformations does not form a semigroup for . For example, the composition of and is , which is permutational. Two transformations conflict if there exists a permutational transformation in the semigroup that they generate.
We exhibit the following sets of non-permutational transformations which do not conflict; they are similar to the semigroup from Section 3.
Let , and define the following sets of transformations:
Then the set of transformations is a maximal non-permutational semigroup of size .
One can check that each is a semigroup. Let and , with . A direct computation shows that , and ; hence is a semigroup. Moreover, for all , , and so all of the transformations are non-permutational.
A simple counting argument shows that
Since the are disjoint,
For the maximality of , we show that adding any other non-permutational transformation creates a conflict. Let be non-permutational, with .
First suppose that there exists a with . Since is non-permutational, we may assume . Then there exists a with ; then and , and so and conflict.
If no such exists, then there must exist a with . Consider the sequence defined by , . If there exists an such that , let be the minimal one. Let with and . Then , , and so is permutational. Now suppose all . Since is non-permutational, must appear in the sequence; moreover, since , we can pick so that . Since , we may find a transformation with and . Then , , and is permutational. ∎
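For very small n the notion of conflict and the maximal size can be explored by exhaustive search. The sketch below first shows one pair of non-permutational transformations whose product is permutational (an example chosen here, not the one in the text), and then searches all subsets of non-permutational transformations for the largest one closed under composition. It is only feasible for n ≤ 3, since the number of subsets grows doubly exponentially.

```python
from itertools import combinations, product
from math import e, factorial, floor


def compose(s, t):
    """Apply s first, then t."""
    return tuple(t[image - 1] for image in s)


def is_permutational(t):
    """True if t permutes some subset containing at least two states."""
    points = set(range(1, len(t) + 1))
    while True:
        image = {t[i - 1] for i in points}
        if image == points:
            return len(points) >= 2
        points = image


def largest_non_permutational_semigroup(n):
    """Size of the largest composition-closed set of non-permutational maps."""
    states = range(1, n + 1)
    candidates = [t for t in product(states, repeat=n) if not is_permutational(t)]
    for size in range(len(candidates), 0, -1):
        for subset in combinations(candidates, size):
            chosen = set(subset)
            if all(compose(s, t) in chosen for s in chosen for t in chosen):
                return size
    return 0


# Two non-permutational maps whose product fixes both 1 and 2.
s, t = (2, 3, 3), (1, 1, 2)
print(compose(s, t), is_permutational(compose(s, t)))

for n in (2, 3):
    print(n, largest_non_permutational_semigroup(n), floor(e * factorial(n - 1)))
```

For n = 2 and n = 3 the search agrees with the floor of e times (n-1) factorial; it does not replace the maximality argument for general n.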
To compute the generators of , we require the following definition. Let be the set of all transformations with all . Define the function by , and also
Clearly, is a bijection between and .
Let . Then
is the minimum set of generators for .
For (1), note that . For any , we can write with and , as in the proof of Theorem 3.2. Therefore generates .
Now let and , with . We consider , and use the fact that each transformation satisfies . There are two cases:
If , then , hence .
If , then , hence .
It follows that ; a similar argument shows that . Consequently, no transformation in is a composition of two others in , and so is the minimum generating set of .
For (2), we calculate , or equivalently because is a bijection. A counting argument shows that . Therefore
The following corollary establishes a direct connection with definite languages.
For all , there exists a definite language with state complexity , syntactic complexity and alphabet size
Let be a DFA with , , , and with each letter representing a different transformation in , so that the transformation semigroup of is . We claim that this is a minimal DFA of a definite language. First, all the states are reachable by the constant transformations . Also, any two states with are distinguishable by the transformation which acts as for , and for . State is distinguishable from every other state because it is the only final state. Hence is minimal. Then, by the characterization of definite languages in terms of non-permutational transformations given at the beginning of this section, the DFA accepts a definite language. ∎
Let be a definite language with state complexity . Then and if equality holds then
For we have the following transformations in :
The generators are shown in boldface.
6 Conclusions and Future Work
Though we have found tight upper bounds on the syntactic complexity of finite/cofinite and reverse definite languages, we have only conjectured the bounds on the syntactic complexity and the corresponding alphabet size for definite languages. The conjecture has been verified through computational enumeration for $n\le 4$, but remains unproven for $n>4$. Also, syntactic complexity bounds have yet to be found for the related higher classes in the dot-depth hierarchy of star-free languages, namely the generalized definite and locally testable languages. It is possible that the technique used in this paper, namely characterizing the allowable transformations in the syntactic semigroup and applying combinatorial arguments to count them, can be used to find bounds for these languages as well.
- thanks: This work was supported by the Natural Sciences and Engineering Research Council of Canada under grant No. OGP0000871
- [1] Bordihn, H., Holzer, M., Kutrib, M.: Determination of finite automata accepting subregular languages. Theoret. Comput. Sci. 410 (2009) 3209–3249
- [2] Brzozowski, J.: Canonical regular expressions and minimal state graphs for definite events. In: Proceedings of the Symposium on Mathematical Theory of Automata. Volume 12 of MRI Symposia Series, Polytechnic Press, Polytechnic Institute of Brooklyn, N.Y. (1963) 529–561
- [3] Brzozowski, J., Li, B.: Syntactic complexities of some classes of star-free languages. In Kutrib, M., Moreira, N., Reis, R., eds.: Proceedings of the 14th International Workshop on Descriptional Complexity of Formal Systems DCFS. Volume 7386 of LNCS, Springer (2012) 117–129
- [4] Brzozowski, J., Li, B., Ye, Y.: Syntactic complexity of prefix-, suffix-, bifix, and factor-free languages. Theoret. Comput. Sci. (2012) In press.
- [5] Brzozowski, J., Simon, I.: Characterizations of locally testable events. Discrete Math. 4(3) (1973) 243–271
- [6] Brzozowski, J., Ye, Y.: Syntactic complexity of ideal and closed languages. In Mauri, G., Leporati, A., eds.: 15th International Conference on Developments in Language Theory, DLT 2011. Volume 6795 of LNCS, Springer (2011) 117–128
- [7] Cayley, A.: A theorem on trees. Quart. J. Math. 23 (1889) 376–378
- [8] Ginzburg, A.: About some properties of definite, reverse definite and related automata. IEEE Trans. Electronic Comput. EC–15 (1966) 806–810
- [9] Holzer, M., König, B.: On deterministic finite automata and syntactic monoid size. Theoret. Comput. Sci. 327(3) (2004) 319–347
- [10] Kleene, S.C.: Representation of events in nerve nets and finite automata. In Shannon, C.E., McCarthy, J., eds.: Automata Studies. Princeton University Press, Princeton, NJ (1954) 3–41
- [11] Krawetz, B., Lawrence, J., Shallit, J.: State complexity and the monoid of transformations of a finite set (2003) http://arxiv.org/abs/math/0306416v1.
- [12] Maslov, A.N.: Estimates of the number of states of finite automata. Dokl. Akad. Nauk SSSR 194 (1970) 1266–1268 (Russian). English translation: Soviet Math. Dokl. 11 (1970), 1373–1375.
- [13] McNaughton, R., Papert, S.A.: Counter-Free Automata. Volume 65 of M.I.T. Research Monographs. The MIT Press (1971)
- [14] Myhill, J.: Finite automata and representation of events. Wright Air Development Center Technical Report 57–624 (1957)
- [15] Perles, M., Rabin, M.O., Shamir, E.: The theory of definite automata. IEEE Trans. Electronic Comput. EC–12 (1963) 233–243
- [16] Piccard, S.: Sur les fonctions définies dans les ensembles finis quelconques. Fund. Math. 24 (1935) 298–301
- [17] Shor, P.W.: A new proof of Cayley’s formula for counting labeled trees. Journal of Combinatorial Theory, Series A 71(1) (1995) 154–158