Operations on Automata with All States Final

Abstract

We study the complexity of basic regular operations on languages represented by incomplete deterministic or nondeterministic automata, in which all states are final. Such languages are known to be prefix-closed. We get tight bounds on both incomplete and nondeterministic state complexity of complement, intersection, union, concatenation, star, and reversal on prefix-closed languages.

1 Introduction

A language $L$ is prefix-closed if $w \in L$ implies that every prefix of $w$ is in $L$. It is known that a regular language is prefix-closed if and only if it is accepted by a nondeterministic finite automaton (NFA) with all states final [19]. In the minimal incomplete deterministic finite automaton (DFA) for a prefix-closed language, all the states are final as well.

The authors of [19] examined several questions concerning NFAs with all states final. They proved that the inequivalence problem for NFAs with all states final is PSPACE-complete in the binary case, but polynomially solvable in the unary case. Next, they showed that minimizing a binary NFA with all states final is PSPACE-hard, and that deciding whether a given NFA accepts a language that is not prefix-closed is PSPACE-complete, while the same problem for DFAs can be solved in polynomial time. The NFA-to-DFA conversion and complementation of NFAs with all states final have also been considered in [19], where a tight bound for the first problem and a lower bound for the second one have been obtained.

The quotient complexity of prefix-closed languages has been studied in [6]. The quotient of a language $L$ by the string $w$ is the set $L_w = \{x \mid wx \in L\}$. The quotient complexity of a language $L$, $\kappa(L)$, is the number of distinct quotients of $L$. Quotient complexity is defined for any language, and it is finite if and only if the language is regular. The quotient automaton of a regular language $L$ is the DFA $(\{L_w \mid w \in \Sigma^*\}, \Sigma, \delta, L_\varepsilon, F)$, where $\delta(L_w, a) = L_{wa}$, and a quotient $L_w$ is final if it contains the empty string. The quotient automaton of $L$ is a minimal complete DFA for $L$, so quotient complexity is the same as the state complexity of $L$, which is defined as the number of states in the minimal DFA for $L$. In [6], tight bounds on the quotient complexity of basic regular operations have been obtained, and to prove upper bounds, the properties of quotients have been used rather than automata constructions.

Automata with all states final represent systems, for example, production lines, and their intersection or parallel composition represents the composition of these systems [22]. A question that arises here is whether the complexity of intersection of automata with all states final is the same as in the general case of arbitrary DFAs or NFAs. At first glance, it seems that this complexity could be smaller. Our first result shows that this is not the case. We show that both incomplete and nondeterministic state complexity of intersection on prefix-closed languages is given by the function $mn$, which is the same as in the general case of regular languages.

In the deterministic case, to have all the states final, we have to consider incomplete deterministic automata because otherwise, the complete automaton with all states final would accept the language consisting of all the strings over an input alphabet. Notice that the model of incomplete deterministic automata has been considered already by Maslov [21]. The same model has been used in the study of the complexity of the shuffle operation [7]; here, the complexity on complete DFAs is not known yet.

We next study the complexity of complement, union, concatenation, square, star, and reversal on languages represented by incomplete DFAs or NFAs with all states final. We get tight bounds in both nondeterministic and incomplete deterministic cases. In the nondeterministic case, all the bounds are the same as in the general case of regular languages, except for the bound for star that is $n$ instead of $n+1$. However, to prove the tightness of these bounds, we usually use larger alphabets than in the general case of regular languages where all the upper bounds can be met by binary languages [11, 13].

To get lower bounds, we use a fooling-set lower-bound method [2, 3, 4, 9, 12]. In the case of union and reversal, the method does not work since it provides a lower bound on the size of NFAs with multiple initial states. Since the nondeterministic state complexity of a regular language is defined using a model of NFAs with a single initial state [11], we have to use a modified fooling-set technique to get the tight bounds $m+n+1$ and $n+1$ for union and reversal, respectively.

In the case of incomplete deterministic finite automata, the tight bounds for complement, union, concatenation, star, and reversal are $n+1$, $mn+m+n$, $m \cdot 2^{n-1} + 2^n - 1$, $2^{n-1}$, and $2^n - 1$, respectively. To define worst-case examples, we use a binary alphabet for union, star, and reversal, and a ternary alphabet for concatenation.

The paper is organized as follows. In the next section, we give some basic definitions and preliminary results. In Sections 3 and 4, we study boolean operations. Concatenation is discussed in Section 5, and star and reversal in Section 6. The last section contains some concluding remarks.

2 Preliminaries

In this section, we recall some basic definitions and preliminary results. For details and all unexplained notions, the reader may refer to [25].

A nondeterministic finite automaton (NFA) is a quintuple $A = (Q, \Sigma, \delta, I, F)$, where $Q$ is a finite set of states, $\Sigma$ is a finite alphabet, $\delta \colon Q \times \Sigma \to 2^Q$ is the transition function which is extended to the domain $2^Q \times \Sigma^*$ in the natural way, $I \subseteq Q$ is the set of initial states, and $F \subseteq Q$ is the set of final states. The language accepted by $A$ is the set $L(A) = \{w \in \Sigma^* \mid \delta(I, w) \cap F \neq \emptyset\}$.

The nondeterministic state complexity of a regular language $L$, $\mathrm{nsc}(L)$, is the smallest number of states in any NFA with a single initial state recognizing $L$.

An NFA $A$ is incomplete deterministic (DFA) if $|I| = 1$ and $|\delta(q, a)| \le 1$ for each $q$ in $Q$ and each $a$ in $\Sigma$. In such a case, we write $\delta(q, a) = q'$ instead of $\delta(q, a) = \{q'\}$. A non-final state $q$ of a DFA is called a dead state if $\delta(q, a) = q$ for each symbol $a$ in $\Sigma$.

The incomplete state complexity of a regular language $L$, $\mathrm{isc}(L)$, is the smallest number of states in any incomplete DFA recognizing $L$. An incomplete DFA is minimal (with respect to the number of states) if it does not have any dead state, all its states are reachable, and no two distinct states are equivalent.

Every NFA $A = (Q, \Sigma, \delta, I, F)$ can be converted to an equivalent DFA $A' = (2^Q, \Sigma, \delta', I, F')$, where $\delta'(R, a) = \delta(R, a)$ and $F' = \{R \in 2^Q \mid R \cap F \neq \emptyset\}$. The DFA $A'$ is called the subset automaton of the NFA $A$. The subset automaton need not be minimal since some of its states may be unreachable or equivalent. However, if for each state $q$ of an NFA $A$, there exists a string $w_q$ that is accepted by $A$ only from the state $q$, then the subset automaton of the NFA $A$ does not have equivalent states: if two subsets of the subset automaton differ in a state $q$, then they are distinguishable by $w_q$.
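The subset construction described above can be sketched in a few lines of Python. This is only an illustration: the dictionary encoding of the transition function and the 3-state NFA below are assumptions made for the example, not automata from the paper.

```python
from collections import deque


def subset_automaton(delta, initial, alphabet):
    """Return the reachable states and transitions of the subset automaton.

    delta maps (state, symbol) to a set of states; a missing key means that
    the transition is undefined.  initial is an iterable of initial states.
    """
    start = frozenset(initial)
    seen = {start}
    queue = deque([start])
    trans = {}
    while queue:
        subset = queue.popleft()
        for a in alphabet:
            # The image of a subset is the union of the images of its states.
            target = frozenset(q for p in subset for q in delta.get((p, a), ()))
            trans[(subset, a)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen, trans


# Hypothetical 3-state NFA with all states final (not a witness from the paper).
delta = {
    (0, "a"): {0, 1},
    (1, "a"): {2},
    (1, "b"): {0},
    (2, "b"): {2},
}
subsets, transitions = subset_automaton(delta, {0}, "ab")
print(sorted(sorted(s) for s in subsets))
```

Only reachable subsets are generated, so the number of subsets returned is a direct upper bound on the size of the minimal DFA for this particular NFA; equivalent subsets would still have to be merged.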

To prove the minimality of NFAs, we use a fooling set lower-bound technique, see [2, 3, 4, 9, 12].

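  • A set of pairs of strings $\{(x_i, y_i) \mid i = 1, 2, \ldots, n\}$ is called a fooling set for a language $L$ if for all $i, j$ in $\{1, 2, \ldots, n\}$, the following two conditions hold:
    (F1) $x_i y_i \in L$, and
    (F2) if $i \neq j$, then $x_i y_j \notin L$ or $x_j y_i \notin L$.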

It is well known that the size of a fooling set for a regular language provides a lower bound on the number of states in any NFA (with multiple initial states) for the language. The argument is simple. Fix the accepting computations of any NFA on strings $x_i y_i$ and $x_j y_j$. Then, the states on these computations reached after reading $x_i$ and $x_j$ must be distinct, otherwise the NFA accepts both $x_i y_j$ and $x_j y_i$ for two distinct pairs. Hence we get the following observation.

Lemma 1 ([4, 9, 12]).

Let $\mathcal{F}$ be a fooling set for a language $L$. Then every NFA (with multiple initial states) for the language $L$ has at least $|\mathcal{F}|$ states. ∎
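Conditions (F1) and (F2) are easy to check mechanically when membership in the language can be decided. The following Python sketch does this; the function names and the small unary prefix-closed language $\{a^i \mid 0 \le i \le 3\}$ are chosen here for illustration and do not come from the paper.

```python
def is_fooling_set(pairs, in_language):
    """Check conditions (F1) and (F2) for a list of pairs of strings."""
    for i, (x_i, y_i) in enumerate(pairs):
        # (F1): every pair concatenates to a string of the language.
        if not in_language(x_i + y_i):
            return False
        for x_j, y_j in pairs[i + 1:]:
            # (F2): at least one of the two mixed strings is outside the language.
            if in_language(x_i + y_j) and in_language(x_j + y_i):
                return False
    return True


def in_language(word, bound=3):
    """Membership in the illustrative language {a^i | 0 <= i <= bound}."""
    return set(word) <= {"a"} and len(word) <= bound


# Pairs (a^i, a^(3-i)) for i = 0, ..., 3.
pairs = [("a" * i, "a" * (3 - i)) for i in range(4)]
print(is_fooling_set(pairs, in_language))
```

For this language the four pairs form a fooling set, so by Lemma 1 every NFA for it has at least four states.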

The next lemma shows that sometimes, if we insist on having a single initial state in an NFA, one more state is necessary. It can be used in the case of union, reversal, cyclic shift [16], and AFA-to-NFA conversion [14]. In each of these cases, NFAs with a single initial state require one more state than NFAs with multiple initial states. For the sake of completeness, we recall the proof of the lemma here.

Lemma 2 ([15]).

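Let $\mathcal{A}$ and $\mathcal{B}$ be sets of pairs of strings and let $u$ and $v$ be two strings such that $\mathcal{A} \cup \mathcal{B}$, $\mathcal{A} \cup \{(\varepsilon, u)\}$, and $\mathcal{B} \cup \{(\varepsilon, v)\}$ are fooling sets for a language $L$. Then every NFA with a single initial state for the language $L$ has at least $|\mathcal{A}| + |\mathcal{B}| + 1$ states.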

Proof.

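Consider an NFA for a language $L$, and let $\mathcal{A} = \{(x_i, y_i) \mid i = 1, 2, \ldots, m\}$ and $\mathcal{B} = \{(x_{m+j}, y_{m+j}) \mid j = 1, 2, \ldots, n\}$. Since the strings $x_k y_k$ are in $L$, we fix an accepting computation of the NFA on each string $x_k y_k$. Let $p_k$ be the state on this computation that is reached after reading $x_k$. Since $\mathcal{A} \cup \mathcal{B}$ is a fooling set for $L$, the states $p_1$, $p_2$, …, $p_{m+n}$ are pairwise distinct. Since $\mathcal{A} \cup \{(\varepsilon, u)\}$ is a fooling set, the initial state is distinct from all the states $p_1$, $p_2$, …, $p_m$. Since $\mathcal{B} \cup \{(\varepsilon, v)\}$ is a fooling set, the (single) initial state is also distinct from all the states $p_{m+1}$, $p_{m+2}$, …, $p_{m+n}$. Thus the NFA has at least $m+n+1$ states. ∎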

  • Let and . Then and , and the language is accepted by a 6-state NFA with two initial states. Therefore, we cannot expect that we will be able to find a fooling set for of size . However, every NFA with a single initial state for the language requires at least states since Lemma 2 is satisfied for the language with

If $w = uv$ for strings $u$ and $v$, then $u$ is a prefix of $w$. A language $L$ is prefix-closed if $w \in L$ implies that every prefix of $w$ is in $L$. The following observations are easy to prove.

Proposition 3 ([19]).

A regular language is prefix-closed if and only if it is accepted by some NFA with all states final. ∎

Proposition 4.

Let $A$ be a minimal incomplete DFA for a language $L$. Then the language $L$ is prefix-closed if and only if all the states of the DFA $A$ are final. ∎

3 Complementation

If $L$ is a language over an alphabet $\Sigma$, then the complement of $L$ is the language $L^c = \Sigma^* \setminus L$. If $L$ is accepted by a minimal complete DFA $A$, then we can get a minimal DFA for $L^c$ from the DFA $A$ by interchanging the final and non-final states. In the case of incomplete DFAs, we first have to add a dead state, that is, a non-final state which goes to itself on each input, and let all the undefined transitions go to the dead state. After that, we can interchange the final and non-final states to get a (complete) DFA for the complement. This gives the following result.

Theorem 5.

Let . Let $L$ be a prefix-closed regular language over an alphabet $\Sigma$ with $\mathrm{isc}(L) = n$. Then $\mathrm{isc}(L^c) \le n+1$, and the bound is tight if $|\Sigma| \ge 1$.

Proof.

For tightness, we can consider the unary prefix-closed language $\{a^i \mid 0 \le i \le n-1\}$. ∎
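The complementation of an incomplete DFA described before Theorem 5 can be sketched as follows. The dictionary encoding and the name of the added dead state are assumptions made for this illustration; the example automaton is the $n$-state path accepting the unary strings of length at most $n-1$.

```python
def complement_dfa(states, alphabet, delta, initial, finals):
    """Complete an incomplete DFA with a dead state, then swap final states."""
    dead = "dead"  # hypothetical name for the added dead state
    assert dead not in states
    full_states = set(states) | {dead}
    full_delta = {}
    for q in full_states:
        for a in alphabet:
            # Undefined transitions, and all transitions of the dead state,
            # go to the dead state.
            full_delta[(q, a)] = delta.get((q, a), dead)
    new_finals = full_states - set(finals)
    return full_states, full_delta, initial, new_finals


def accepts(delta, initial, finals, word):
    """Run a complete DFA on a word."""
    q = initial
    for a in word:
        q = delta[(q, a)]
    return q in finals


# Unary path with n states, all final: accepts a^i for 0 <= i <= n - 1.
n = 4
states = set(range(n))
delta = {(i, "a"): i + 1 for i in range(n - 1)}
c_states, c_delta, c_init, c_finals = complement_dfa(states, "a", delta, 0, states)
print(len(c_states), accepts(c_delta, c_init, c_finals, "a" * n))
```

In the complement, the former dead state is the only final state and cannot be removed, which is why one extra state is needed.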

If a language $L$ is represented by an $n$-state NFA, then we first construct the corresponding subset automaton, and then interchange the final and non-final states to get a DFA for the language $L^c$ of at most $2^n$ states. This upper bound on the nondeterministic state complexity of complement on regular languages is known to be tight in the binary case [13].

For prefix-closed languages, we get the same bound, however, to prove tightness, we use a ternary alphabet. Whether or not the bound can be met by a binary language remains open.

Theorem 6.

Let . Let $L$ be a prefix-closed regular language over an alphabet $\Sigma$ with $\mathrm{nsc}(L) = n$. Then $\mathrm{nsc}(L^c) \le 2^n$, and the bound is tight if $|\Sigma| \ge 3$.

Proof.

The upper bound is the same as in the general case of regular languages [11]. To prove tightness, consider the language accepted by the NFA shown in Figure 1, in which state goes to the empty set on both and , and to on . Each other state goes to on both and , and to on . Our aim is to describe a fooling set of size for .

Figure 1: The NFA of a prefix-closed language with .

First, let us show that each subset of is reachable in the subset automaton of the NFA . The initial state is , and each singleton set is reached from by . The empty set is reached from by . The set of size , where and , is reached from the set of size by the string . This proves reachability by induction. Now, define as the string, by which the initial state of the NFA goes to the set .

Next, for a subset of , define the string as the string of length , where

We claim that the string is rejected by the NFA from each state in and accepted from each state that is not in . Indeed, if is a state in , then and with and . Hence , which means that the state goes to by since both and move each state to state . However, in state the NFA cannot read , and therefore the string is rejected from . On the other hand, if , then , and the string with and is accepted from through the computation .

Now, we are ready to prove that the set of pairs of strings is a fooling set for the language .

(F1) By , the initial state 1 goes to the set . The string is rejected by from each state in . It follows that the NFA rejects the string . Thus the string is in .

(F2) Let . Then without loss of generality, there is a state such that and . By , the initial state goes to , so it also goes to the state . Since , the string is accepted by from . Therefore, the NFA accepts the string , and so this string is not in .

Hence is a fooling set for of size . By Lemma 1, we have . ∎

4 Intersection and Union

In this section, we study the incomplete and nondeterministic state complexity of intersection and union of prefix-closed languages. If regular languages $K$ and $L$ are accepted by $m$-state and $n$-state NFAs, respectively, then the language $K \cap L$ is accepted by an NFA of at most $mn$ states, and this bound is known to be tight in the binary case [11]. Our first result shows that the bound can be met by binary prefix-closed languages. Then, using this result, we get the same bound on the incomplete state complexity of intersection on prefix-closed languages.

Theorem 7.

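Let $K$ and $L$ be prefix-closed languages over an alphabet $\Sigma$ with $\mathrm{nsc}(K) = m$ and $\mathrm{nsc}(L) = n$. Then $\mathrm{nsc}(K \cap L) \le mn$, and the bound is tight if $|\Sigma| \ge 2$.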

Proof.

The upper bound is the same as for regular languages [11]. For tightness, consider prefix-closed binary languages and that are accepted by an -state and an -state incomplete DFAs and , respectively, shown in Figure 2.

Figure 2: The incomplete DFAs and of prefix-closed languages and with .

Consider the set of pairs of strings of size . Let us show that is a fooling set for the language .

(F1) The string has exactly ’s and ’s. It follows that it is in .

(F2) Let . If , then the string contains ’s, and therefore it is not in . The case of is symmetric.

Hence is a fooling set for , and the theorem follows. ∎

Theorem 8.

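Let $K$ and $L$ be prefix-closed languages over an alphabet $\Sigma$ with $\mathrm{isc}(K) = m$ and $\mathrm{isc}(L) = n$. Then $\mathrm{isc}(K \cap L) \le mn$, and the bound is tight if $|\Sigma| \ge 2$.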

Proof.

Let and be incomplete DFAs for and , respectively. Define an incomplete product automaton , where

The product automaton accepts the language $K \cap L$. This gives the upper bound $mn$. For tightness, consider the same languages $K$ and $L$ as in the proof of the previous theorem. Notice that $K$ and $L$ are accepted by $m$-state and $n$-state incomplete DFAs, respectively. We have shown that the nondeterministic state complexity of their intersection is $mn$. Since every incomplete DFA is also an NFA, it follows that the incomplete state complexity is also at least $mn$. ∎
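The incomplete product construction can be illustrated with a short Python sketch. The two automata used here are an assumption: one counts the symbol a up to $m-1$ and loops on b, the other counts b up to $n-1$ and loops on a. They are a plausible shape for binary witnesses, but they are not taken from Figure 2.

```python
def counter_dfa(size, counted, other):
    """Incomplete DFA with all states final that counts `counted` up to size - 1.

    ASSUMPTION: illustrative automaton only, not the witness of Figure 2.
    """
    delta = {(i, other): i for i in range(size)}
    delta.update({(i, counted): i + 1 for i in range(size - 1)})
    return delta


def product_states(delta_a, init_a, delta_b, init_b, alphabet):
    """Reachable states of the incomplete product automaton."""
    start = (init_a, init_b)
    seen = {start}
    stack = [start]
    while stack:
        p, q = stack.pop()
        for a in alphabet:
            # A product transition exists only if both components define it.
            if (p, a) in delta_a and (q, a) in delta_b:
                target = (delta_a[(p, a)], delta_b[(q, a)])
                if target not in seen:
                    seen.add(target)
                    stack.append(target)
    return seen


m, n = 3, 4
dfa_a = counter_dfa(m, "a", "b")
dfa_b = counter_dfa(n, "b", "a")
reachable = product_states(dfa_a, 0, dfa_b, 0, "ab")
print(len(reachable), m * n)
```

The sketch only confirms that all $mn$ pairs are reachable for these automata; the lower bound itself rests on the fooling-set argument of Theorem 7.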

Our next result on the incomplete state complexity of union on prefix-closed languages can be derived from the result on the quotient complexity of union in [6]. For the sake of completeness, we restate it in terms of incomplete complexities, and recall the proof.

Theorem 9.

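Let $K$ and $L$ be prefix-closed languages over an alphabet $\Sigma$ with $\mathrm{isc}(K) = m$ and $\mathrm{isc}(L) = n$. Then $\mathrm{isc}(K \cup L) \le mn+m+n$, and the bound is tight if $|\Sigma| \ge 2$.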

Proof.

Let $A$ and $B$ be incomplete DFAs with $m$ and $n$ states for the languages $K$ and $L$, respectively. To construct a DFA for the language $K \cup L$, we first add a dead state to each of the DFAs $A$ and $B$, and let all the undefined transitions go to the dead states. Now we construct the classic product automaton from the resulting complete DFAs; its state set is the Cartesian product of the two extended state sets, so it has $(m+1)(n+1)$ states. All its states are final, except for the pair of the two dead states, which is dead, and we do not count it. Hence we get the upper bound $(m+1)(n+1) - 1 = mn+m+n$ on the incomplete state complexity of union.

Figure 3: The product automaton for incomplete DFAs and from Figure 2; and .

For tightness, we again consider the languages described in the proof of Theorem 7. We add the dead states and and construct the product automaton. The product automaton in the case of and is shown in Figure 3.

Each state of the product automaton is reached from the initial state by the string . Let and be two distinct states of the product automaton. If , then the string is rejected from and accepted from . If , then the string is rejected from and accepted from . Thus all the states in the product-automaton are reachable and pairwise distinguishable, and the lower bound follows. ∎
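The counting in the upper bound for union can be replayed on small automata. In the Python sketch below, the name of the dead state and the two counting automata are assumptions made for illustration (they are not read off Figures 2 and 3); only reachability is checked, not distinguishability.

```python
DEAD = "dead"  # hypothetical name of the dead state added to each DFA


def counter_dfa(size, counted, other):
    """Illustrative incomplete DFA: count `counted` up to size - 1, loop on `other`."""
    delta = {(i, other): i for i in range(size)}
    delta.update({(i, counted): i + 1 for i in range(size - 1)})
    return delta


def union_states(delta_a, init_a, delta_b, init_b, alphabet):
    """Reachable states of the completed product, without the dead pair."""
    start = (init_a, init_b)
    seen = {start}
    stack = [start]
    while stack:
        p, q = stack.pop()
        for a in alphabet:
            # Undefined transitions (also those of DEAD itself) go to DEAD.
            target = (delta_a.get((p, a), DEAD), delta_b.get((q, a), DEAD))
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen - {(DEAD, DEAD)}


m, n = 3, 4
live = union_states(counter_dfa(m, "a", "b"), 0, counter_dfa(n, "b", "a"), 0, "ab")
print(len(live), m * n + m + n)
```

Every pair with at least one live component accepts some string, so all of the counted states are final in the automaton for the union.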

In the nondeterministic case, the upper bound for union on regular languages is $m+n+1$, and it is tight in the binary case [11]. We get the same bound for union on prefix-closed languages; however, to define witness languages, we use a four-letter alphabet.

Theorem 10.

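Let $K$ and $L$ be prefix-closed languages over an alphabet $\Sigma$ with $\mathrm{nsc}(K) = m$ and $\mathrm{nsc}(L) = n$. Then $\mathrm{nsc}(K \cup L) \le m+n+1$, and the bound is tight if $|\Sigma| \ge 4$.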

Proof.

Figure 4: The NFAs and of prefix-closed languages and with .

The upper bound is the same as for regular languages [11]. To prove tightness, let and be the prefix-closed languages accepted by the NFAs and , respectively, shown in Figure 4. Let

Let us show that is a fooling set for the language .

(F1) We have and . Both these strings are in . The strings and are in as well.

(F2) If , then the string is not in since . Next, if , then is not in . The argumentation for two pairs from is similar. If we concatenate the first part of a pair in with the second part of a pair in , then we get a string that either contains all three symbols , or contains both symbols and . No such string is in .

Thus is a fooling set for the language . Moreover, the sets and are fooling sets for as well. By Lemma 2, we have . ∎

5 Concatenation

In this section, we deal with the concatenation operation on prefix-closed languages. We start with incomplete state complexity. We use a slightly different ternary witness language than in [6], and prove the upper bound using automata constructions.

Theorem 11.

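Let . Let $K$ and $L$ be prefix-closed languages over an alphabet $\Sigma$ with $\mathrm{isc}(K) = m$ and $\mathrm{isc}(L) = n$. Then $\mathrm{isc}(KL) \le m \cdot 2^{n-1} + 2^n - 1$, and the bound is tight if $|\Sigma| \ge 3$.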

Proof.

Let and be incomplete DFAs with all states final accepting the languages and , respectively. Construct an NFA for the language from the DFAs and by adding the transition on a symbol from a state in to the initial state of whenever the transition on in state is defined in . The initial states of the NFA are and , and the set of final states is . Each reachable subset of the subset automaton of the NFA contains at most one state of , and several states of . Moreover, if a state of is in a reachable subset , then must contain the state . This gives the upper bound on since the empty set is not counted.

For tightness, consider the prefix-closed languages and accepted by incomplete DFAs and , respectively, shown in Figure 5, in which the transitions are as follows:

on , state goes to itself, and each state goes to ;

on , each state goes to state , state goes to itself, and state with goes to ;

on , each state with goes to , and each state goes to itself;
and all the remaining transitions are undefined.

Construct an NFA for the language as described above. Let us show that the subset automaton of the NFA has reachable and pairwise distinguishable non-empty subsets.

Figure 5: The incomplete DFAs and of languages and with .

(1) First, let us show that each set is reachable, where and . The proof is by induction on the size of subsets. The set is the initial subset. The set with is reached from the set by the string , and the latter set is reachable by induction.

(2) Now, let us show that each set , is reachable, where , and . The set is reached from by , and the latter set is reachable as shown in (1).

(3) Next, we show that each set with and is reachable. The set is reached from by , and the latter set is reachable as shown in case (2).

(4) Finally, we show that each non-empty set with and is reachable. If with , then is reached from the set by , and the latter set is reachable as shown in case (3).

This proves the reachability of non-empty subsets.

To prove distinguishability, notice that the string is accepted by the DFA only from the state 0, and the string is accepted only from the state (). If and are two distinct subsets of , then and differ in a state . If , then distinguishes and , and if , then distinguishes and .

Next, the sets and , where and are distinct subsets of , go to and , respectively, by . Since and are distinguishable, the sets and are distinguishable as well.

Finally, notice that the string is accepted by the NFA from each state , but rejected from each state in . Hence the sets and , where and are subsets of , are distinguishable. Now let . Then and go to and , respectively, by . Since and are distinguishable, the sets and are distinguishable as well. This proves the distinguishability of all the reachable subsets, and completes the proof. ∎
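The upper bound in this proof comes from a counting argument: a reachable subset contains at most one state of the first DFA, and if it contains one, it must also contain the initial state of the second DFA. The Python sketch below enumerates all non-empty subsets of that shape for small $m$ and $n$ and compares the count with $m \cdot 2^{n-1} + 2^n - 1$. The state names are placeholders introduced for the illustration.

```python
from itertools import combinations


def count_candidate_subsets(m, n):
    """Count non-empty subsets allowed by the structural restrictions."""
    b_states = [("B", j) for j in range(n)]
    b_init = ("B", 0)  # placeholder for the initial state of the second DFA
    count = 0
    for k in range(n + 1):
        for b_part in combinations(b_states, k):
            # Subsets without any state of the first DFA: must be non-empty.
            if b_part:
                count += 1
            # Subsets with exactly one of the m states of the first DFA:
            # allowed only if the initial state of the second DFA is present.
            if b_init in b_part:
                count += m
    return count


def concatenation_bound(m, n):
    return m * 2 ** (n - 1) + 2 ** n - 1


for m, n in [(1, 1), (2, 3), (3, 4)]:
    print(m, n, count_candidate_subsets(m, n), concatenation_bound(m, n))
```

The enumeration gives the number of candidate subsets only; showing that all of them are reachable and pairwise distinguishable for the witness automata is the content of the proof.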

In the next theorem, we consider the nondeterministic case. For regular languages, the upper bound on the nondeterministic state complexity of concatenation is $m+n$, and it is tight in the binary case [11]. For prefix-closed languages, we get the same bound for concatenation. However, we define witness languages over a ternary alphabet.

Theorem 12.

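Let . Let $K$ and $L$ be prefix-closed languages over an alphabet $\Sigma$ with $\mathrm{nsc}(K) = m$ and $\mathrm{nsc}(L) = n$. Then $\mathrm{nsc}(KL) \le m+n$, and the bound is tight if $|\Sigma| \ge 3$.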

Proof.

The upper bound is the same as for regular languages [11]. For tightness, consider the ternary prefix-closed languages and accepted by incomplete DFAs and , respectively, shown in Figure 6. Notice that if a string is in , then is in the language , and the number of ’s in is at most .

Figure 6: The incomplete DFAs of prefix-closed languages and with .

For , define the pair as follows:

Let us show that the set of pairs is a fooling set for the language .

(F1) For each , we have . Thus is in since is in and is in .

(F2) Let and . Then the number of ’s in the string is greater than , and therefore the string is not in . If , then . Thus is not in , and therefore it is not in .

Hence the set is a fooling set for the language , so . ∎

6 Star and Reversal

We conclude our paper with the star and reversal operations on prefix-closed languages. The star of a language $L$ is the language $L^* = \bigcup_{i \ge 0} L^i$, where $L^0 = \{\varepsilon\}$ and $L^{i+1} = L^i L$.

If a regular language $L$ is accepted by a complete $n$-state DFA, then the language $L^*$ is accepted by a DFA of at most $2^{n-1} + 2^{n-2}$ states, and the bound is tight in the binary case [21, 26].

For prefix-closed languages, the upper bound on the quotient complexity for star is $2^{n-2} + 1$, and it has been shown to be tight in the ternary case [6]. In the case of incomplete state complexity, we get the bound $2^{n-1}$. For the sake of completeness, we give a simple proof of the upper bound using automata constructions. Moreover, we are able to define a witness language over a binary alphabet.

Theorem 13.

Let . Let $L$ be a prefix-closed regular language over an alphabet $\Sigma$ with $\mathrm{isc}(L) = n$. Then $\mathrm{isc}(L^*) \le 2^{n-1}$, and the bound is tight if $|\Sigma| \ge 2$.

Proof.

Let $A$ be an incomplete DFA with all states final for $L$. Construct an NFA $N$ for $L^*$ from the DFA $A$ by adding the transition on a symbol $a$ from a state $q$ to the initial state whenever the transition on $a$ in state $q$ is defined. In the subset automaton of the NFA $N$, each reachable set is either empty, or it contains the initial state. It follows that $\mathrm{isc}(L^*) \le 2^{n-1}$.

For tightness, consider the binary incomplete DFA with the state set , the initial state and with all states final. The transitions are as follows. By , the transitions in states 1 and 2 are undefined, each odd state with goes to , and each even state with goes to . By , there is a cycle , each odd state with goes to , and each even state with goes to . If is odd, then goes to itself by , otherwise it goes to itself by . The DFA for is shown in Figure 7.

Figure 7: The incomplete DFA of a prefix-closed language with ; .

Notice that each state with has exactly one in-transition on and on . Denote by the state that goes to on , and by the state that goes to on .

Construct an NFA as described above. Let us show that in the subset automaton of the NFA , all subsets of containing state are reachable and pairwise distinguishable.

We prove reachability by induction on the size of subsets. The basis is , and the set is reachable since it is the initial state of the subset automaton. Assume that every set containing with where is reachable. Let where be a set of size . Consider three cases:

  1. Take . Then , and therefore is reachable by the induction hypothesis. Since we have the set is reachable.

  2. Take . Then and contains states and . Therefore, the set is reachable as shown in case . Since we have the set is reachable.

  3. Let , and assume that each set is reachable. Let us show that then also each set is reachable. If is odd, then the set is reached from the set by . If is even, then the set is reached from the set by .

This proves reachability. To prove distinguishability, notice that the string is accepted by the NFA from state since state goes to the initial state by through the computation

if is odd, and through a similar computation if is even. On the other hand, the string cannot be read from any other state with since we have

thus goes to the empty set by , so also by . If is odd, then we have

thus goes to the empty set by , , and so also by . For even, the argument is similar. The string is not accepted from states 1 and 2. Hence the NFA accepts the string only from the state 3. Since there is exactly one in-transition on in state , and it goes from state , the string is accepted by only from state . Similarly, the string is accepted by only from state . Next, for similar reasons, the string is accepted only from , the string is accepted only from , and in the general case, the string is accepted only from (), and the string is accepted only from (). Hence for each state of the NFA , there exists a string that is accepted by only from the state . It follows that all the subsets of the subset automaton of the NFA are pairwise distinguishable since two distinct subsets differ in a state , and the string distinguishes the two subsets. This completes the proof. ∎
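The construction used for the upper bound, and the invariant that every non-empty reachable subset contains the initial state, can be checked on a small example. In the Python sketch below, the 4-state incomplete DFA is a made-up example rather than the witness of Figure 7, and the helper names are introduced here for illustration only.

```python
def star_nfa(delta, initial):
    """NFA for the star of a language given by an incomplete DFA, all states final.

    Every defined transition q --a--> r gets a companion q --a--> initial.
    """
    return {(q, a): {r, initial} for (q, a), r in delta.items()}


def reachable_subsets(nfa, initial, alphabet):
    """Reachable subsets of the subset automaton of the NFA."""
    start = frozenset({initial})
    seen = {start}
    stack = [start]
    while stack:
        subset = stack.pop()
        for a in alphabet:
            target = frozenset(r for q in subset for r in nfa.get((q, a), ()))
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


# Hypothetical 4-state incomplete DFA over {a, b}; state 0 is initial.
n = 4
dfa = {
    (0, "a"): 1,
    (1, "a"): 2,
    (1, "b"): 3,
    (2, "b"): 0,
    (3, "a"): 3,
    (3, "b"): 1,
}
subsets = reachable_subsets(star_nfa(dfa, 0), 0, "ab")
nonempty = [s for s in subsets if s]
print(len(nonempty), 2 ** (n - 1))
```

Since the empty set is not counted in the incomplete model, the number of non-empty reachable subsets is what has to stay below $2^{n-1}$.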

We did some computations in the binary case. Having the files of $n$-state minimal binary pairwise non-isomorphic complete DFAs with a dead state and all the remaining states final, we computed the state complexity of the star of languages accepted by DFAs on the lists; here the state complexity of a regular language $L$, $\mathrm{sc}(L)$, is defined as the smallest number of states in any complete DFA for the language $L$. We computed the frequencies of the resulting complexities, and the average complexity of star. Our results are summarized in Table 1. Notice that for $n = 3, 4, 5$, there is just one language with $\mathrm{sc}(L) = n$ and $\mathrm{sc}(L^*) = 2$. Let us show that this holds for every $n$ with $n \ge 3$.

n \ sc(L*)      1     2     3     4     5     6     7     8     9   average
2               -     2     -     -     -     -     -     -     -   2
3               8     1     6     -     -     -     -     -     -   1.866
4             161     1    48    30     6     -     -     -     -   1.857
5            4177     1   771   275   350    84    84     -    26   1.849

Table 1: The frequencies of the complexities and the average complexity of star on prefix-closed languages in the binary case; rows are indexed by the number $n$ of states of the complete DFA, columns by the state complexity of the star.

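The averages in the last column of Table 1 can be recomputed from the frequencies as weighted means. The short Python check below does this; the dictionary layout is just one convenient encoding of the table, and the comparison allows for the truncation to three decimals used in the table.

```python
# Frequencies from Table 1: frequencies[n][k] is the number of languages
# with sc(L) = n whose star has state complexity k.
frequencies = {
    2: {2: 2},
    3: {1: 8, 2: 1, 3: 6},
    4: {1: 161, 2: 1, 3: 48, 4: 30, 5: 6},
    5: {1: 4177, 2: 1, 3: 771, 4: 275, 5: 350, 6: 84, 7: 84, 9: 26},
}

# Averages as printed in the table.
reported = {2: 2.0, 3: 1.866, 4: 1.857, 5: 1.849}


def average(row):
    """Weighted mean of the complexities in one row of the table."""
    total = sum(row.values())
    return sum(k * count for k, count in row.items()) / total


for n, row in frequencies.items():
    print(n, sum(row.values()), round(average(row), 4), reported[n])
```

The recomputed values agree with the table up to the third decimal place.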
Proposition 14.

Let $n \ge 3$. There exists exactly one (up to renaming of alphabet symbols) binary prefix-closed regular language $L$ with $\mathrm{sc}(L) = n$ and $\mathrm{sc}(L^*) = 2$.

Proof.

Let be a minimal two-state DFA for the language . Since is prefix-closed, the language is prefix-closed as well. It follows that state 0 is final, and state 1 is dead, thus and .

Without loss of generality, state 1 is reached from the initial state 0 by , thus .

Since , the language contains a non-empty string. This means that the language contains a non-empty string as well. Therefore, we must have , and so .

Figure 8: The only binary -state complete DFA of a prefix-closed language with .

Now let be the minimal -state DFA for . Then all the states of are final, except for the dead state. Since , no may occur in any string of . Hence each non-dead state of must go to the dead state on . Since all states must be reachable, we must have a path labeled by and going through all the final states. The last final state must go to the dead state on because otherwise all final states would be equivalent. The resulting -state DFA is shown in Figure 8. ∎

The reverse $w^R$ of a string $w$ is defined by $\varepsilon^R = \varepsilon$, and $(wa)^R = a w^R$ for $a$ in $\Sigma$ and $w$ in $\Sigma^*$. The reverse of a language $L$ is the language $L^R = \{w^R \mid w \in L\}$. If a regular language $L$ is accepted by a complete $n$-state DFA, then the language $L^R$ is accepted by a complete DFA of at most $2^n$ states [23, 26], and the bound is tight in the binary case [17, 20].

For prefix-closed languages, the quotient complexity of reversal is $2^{n-1}$ [6], and it follows from the results on ideal languages [5] since reversal commutes with complementation, and the complement of a prefix-closed language is a right ideal; here a language $L$ is a right ideal if $L = L\Sigma^*$.

We restate the result for reversal in terms of incomplete state complexity, and prove tightness using a slightly different witness language.

Theorem 15.

Let . Let be a prefix-closed regular language over an alphabet with . Then