# Synchronizing non-deterministic finite automata

Henk Don, Department of Mathematics, Vrije Universiteit Amsterdam, De Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands, email: h.don@vu.nl

Hans Zantema

###### Abstract

In this paper, we show that every D3-directing CNFA can be mapped uniquely to a DFA with the same synchronizing word length. This implies that Černý’s conjecture generalizes to CNFAs and that the general upper bound for the length of a shortest D3-directing word is equal to the Pin-Frankl bound for DFAs. As a second consequence, for several classes of CNFAs sharper bounds are established. Finally, our results allow us to detect all critical CNFAs on at most 6 states. It turns out that only very few critical CNFAs exist.

## 1 Introduction and preliminaries

In this paper we study synchronization of non-deterministic finite automata (NFAs). As is the case for deterministic finite automata (DFAs), symbols define functions on the state set $Q$. However, in an NFA symbols are allowed to send a state to a subset of $Q$, rather than to a single state. An NFA is called complete if these subsets are non-empty; in other words, in every state every symbol has at least one outgoing edge. Formally, a complete non-deterministic finite automaton (CNFA) $\mathcal{A}$ over a finite alphabet $\Sigma$ consists of a finite set $Q$ of states and a map $\delta : Q \times \Sigma \to 2^Q \setminus \{\emptyset\}$. We denote the number of states by $|\mathcal{A}|$ or by $n$.

A DFA is called synchronizing if there exists a word that sends every state to the same fixed state. In 1964 Černý [4] conjectured that a synchronizing DFA on $n$ states always admits a synchronizing (or directing, reset) word of length at most $(n-1)^2$. He gave a sequence of DFAs in which the shortest synchronizing word attains this bound. In this paper, we denote the maximal length of a shortest synchronizing word in an $n$-state DFA by $d(n)$. The best known bounds for $d(n)$ are

$$(n-1)^2 \le d(n) \le \frac{n^3-n}{6}. \tag{1}$$

For a proof of the upper bound, we refer to [14]. A DFA on $n$ states is critical if its shortest synchronizing word has length $(n-1)^2$; it is super-critical if its shortest synchronizing word has length greater than $(n-1)^2$. So Černý’s conjecture states that no super-critical DFAs exist. It turns out that there are not too many critical DFAs. Investigation of all critical DFAs with fewer than 7 states but unrestricted alphabet was recently completed [8, 6]. DFAs without copies of the same symbol and without the identity are called basic. For , only 31 basic critical DFAs exist up to isomorphism. So critical DFAs are very infrequent, as the total number of basic DFAs on $n$ states is , including isomorphisms. For , the only known examples are from Černý’s sequence.

For $S \subseteq Q$ and $w \in \Sigma^*$, let $Sw$ be the set of all states where one can end when starting in some state $q \in S$ and reading the symbols in $w$ consecutively. Write $qw$ for $\{q\}w$. Formal definitions will be given in Section 1.1. A DFA is synchronizing if there exist $w \in \Sigma^*$ and $q_s \in Q$ such that $qw = q_s$ for all $q \in Q$. There are several ways to generalize this concept of synchronization to CNFAs, see [12]. In this paper, we study CNFAs known in the literature as D3-directing. This notion is defined as follows:

###### Definition 1.

A CNFA $(Q,\Sigma)$ is called D3-directing if there exist a word $w \in \Sigma^*$ and a state $q_s \in Q$ such that $q_s \in qw$ for all $q \in Q$. The word $w$ is called a D3-directing word.

An example of a D3-directing CNFA is depicted below. There exist several D3-directing words of length four, but no shorter ones. An example is , which gives , and . The synchronizing state for this word is 1. Another D3-directing word is , for which and . Here the synchronizing state is 2.

A word $w$ is D3-directing if, starting in any state $q$, there exists a path labelled by $w$ that ends in $q_s$. For DFAs this notion coincides with a synchronizing word. If a CNFA $\mathcal{A}$ is D3-directing, a natural question is to find the length of a shortest D3-directing word. We denote this length by $d_3(\mathcal{A})$. Furthermore, we denote by $cd_3(n)$ the worst case, i.e. we let CDir(3) be the collection of all D3-directing CNFAs and define

$$cd_3(n) = \max\{d_3(\mathcal{A}) : \mathcal{A} \in \mathrm{CDir}(3),\ |\mathcal{A}| = n\}. \tag{2}$$
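The quantity $d_3(\mathcal{A})$ can be computed for small automata by a breadth-first search. The sketch below is an illustration and not the authors' method. It relies on the observation that $q_s \in qw$ holds exactly when $q$ lies in the backward image of $\{q_s\}$ under $w$, so it searches backwards from singletons until the full state set is reached. The encoding (states `0..n-1`, a symbol as a list of sets) and the names `shortest_d3_directing_word` and `cerny` are assumptions made for this example.

```python
from collections import deque

def shortest_d3_directing_word(n, symbols):
    """Return a shortest D3-directing word as a list of symbol names, or None.

    n: number of states, labelled 0..n-1 (assumed encoding).
    symbols: dict mapping a symbol name to a list of n non-empty sets,
             where symbols[a][q] is the set qa.
    """
    full = frozenset(range(n))

    def preimage(a, target):
        # States q from which reading a can lead into `target`.
        return frozenset(q for q in range(n) if symbols[a][q] & target)

    # Each visited set S equals {q : s in qw} for some state s and word w.
    start = [frozenset([s]) for s in range(n)]
    word_of = {S: [] for S in start}
    queue = deque(start)
    while queue:
        S = queue.popleft()
        if S == full:
            return word_of[S]
        for a in symbols:
            T = preimage(a, S)
            if T and T not in word_of:
                # Reading a first and then word_of[S] leads from T into s.
                word_of[T] = [a] + word_of[S]
                queue.append(T)
    return None

def cerny(n):
    """Hypothetical helper: the n-state DFA of Cerny's sequence, as a CNFA."""
    a = [{(q + 1) % n} for q in range(n)]   # cyclic shift
    b = [{q} for q in range(n)]             # identity except on state n-1
    b[n - 1] = {0}
    return {"a": a, "b": b}

if __name__ == "__main__":
    for n in (3, 4):
        print(n, len(shortest_d3_directing_word(n, cerny(n))))
```

The search space has at most $2^n$ sets, so this is only practical for small $n$; for a DFA it reduces to the usual backward search for a synchronizing word.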

In [12] it is shown that for all ,

$$(n-1)^2 \le cd_3(n) \le \tfrac{1}{2}n(n-1)(n-2)+1. \tag{3}$$

The lower bound follows from the fact that every DFA is also a CNFA and that for DFAs the notions of synchronization and D3-directability coincide. As far as we are aware, these bounds are still the sharpest known for CNFAs, although sharper results were recently obtained for the essentially equivalent problem of bounding lengths of column-primitive products of matrices [5]. Analogous to DFAs, a D3-directing CNFA on $n$ states is called critical if its shortest D3-directing word has length $(n-1)^2$, and super-critical if it has length greater than $(n-1)^2$. In the current paper, we will prove that in fact $cd_3(n) = d(n)$, which immediately sharpens the upper bound for $cd_3(n)$ to $\frac{n^3-n}{6}$. Our result also implies that Černý’s conjecture is equivalent to the following:

###### Conjecture 1.

Every D3-directing CNFA with $n$ states admits a D3-directing word of length at most $(n-1)^2$.

The main ingredient to prove that $cd_3(n) = d(n)$ is a splitting transformation Split that maps a CNFA to a DFA. Every D3-directing CNFA $\mathcal{A}$ is transformed into a synchronizing DFA $\mathrm{Split}(\mathcal{A})$, preserving the shortest D3-directing word length. For several classes of DFAs, the Černý conjecture has been established, or sharper bounds than the general bounds have been proven, see for example [1, 2, 7, 9, 10, 13, 16]. If $\mathrm{Split}(\mathcal{A})$ satisfies the properties for one of these classes, then the sharper results for $\mathrm{Split}(\mathcal{A})$ also apply to the CNFA $\mathcal{A}$. This observation motivates generalizing several properties of DFAs to notions for CNFAs and checking whether these generalized properties are preserved under Split. In this way, we derive sharper upper bounds on the maximal D3-directing word length for several classes of CNFAs.

Finally, in this paper we search for examples of critical D3-directing CNFAs. Note that the number of CNFAs without identical symbols on $n$ states is huge, namely when we include isomorphisms. Therefore an exhaustive search is problematic, even for small $n$. However, since we know that every critical CNFA can be transformed into a critical DFA, we can try to find critical examples by reversing the transformation Split. Since all critical DFAs on $n \le 6$ states are known, this approach allows us to identify all critical CNFAs on $n \le 6$ states. Applying this strategy to the other known critical DFAs, the only critical CNFAs we find are small modifications of Černý’s sequence.

### 1.1 Preliminaries

In this section we present our formal definitions and notation, which will be slightly different from the traditional notation, as we avoid the use of the transition function. A symbol (or letter, label) in a CNFA will be a function $a : Q \to 2^Q \setminus \{\emptyset\}$, and we denote $a(q)$ by $qa$. A symbol extends (denoting the extension by $a$ as well) to a function $a : 2^Q \to 2^Q$ by $Sa = \bigcup_{q \in S} qa$. The set of all letters on $Q$ that can be obtained in this way is denoted $T(Q)$, which is a strict subset of the set of all functions from $2^Q$ to $2^Q$. The set $T_d(Q)$ of all possible symbols in a DFA is in turn a subset of $T(Q)$:

$$T_d(Q) = \{a \in T(Q) : \forall q \in Q\ \ |qa| = 1\}.$$

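A CNFA is defined to be a pair $(Q,\Sigma)$, where $\Sigma \subseteq T(Q)$. Similarly a DFA is a pair $(Q,\Sigma)$ with $\Sigma \subseteq T_d(Q)$. Note that these definitions do not allow for two symbols that act in exactly the same way: if $a$ is a possible symbol, then either $a \in \Sigma$ or $a \notin \Sigma$.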

A symbol induces a directed graph with vertex set and can therefore be viewed as a subset of :

$$a = \{(q,p) : q \in Q,\ p \in qa\},$$

so is identified with the set of all edges in . This point of view is used to define set relations and operations like inclusion and union on . For example, if , then

$$a \cup b = \{(q,p) : q \in Q,\ p \in qa \cup qb\}.$$

Suppose is a CNFA and are such that . If is D3-directing, then the automaton is D3-directing as well with the same shortest synchronizing word length. Also the identity symbol has no influence on synchronization. Therefore, a CNFA is called basic if it has no identity symbol and no symbol is contained in another one. For DFAs this coincides with the existing notion of basic.

If and are CNFAs, we say that is contained in and write if for all there exists such that . Alternatively, we say that is an extension of . If and , we say that strictly contains (or is a strict extension of) . A critical CNFA is minimal if it is not the strict extension of another critical CNFA; it is maximal if it does not admit a basic critical strict extension.

Finally, for and , define inductively by and . So a word also is a function on , being the composition of the transformations by each of its letters. Therefore also the transition monoid is contained in .

## 2 Transforming a CNFA into a DFA, preserving D3-directing word length

In this section we present the transformation Split and explore some of its properties. We note that similar but less explicit ideas were recently used in [3, 11] to give bounds on the length of a positive product in a primitive set of matrices. We start by introducing a parametrized version of our transformation:

###### Definition 2.

Let be a CNFA. Fix and denote the set by . Define new symbols on as follows:

$$q_{split}a_i := q_i, \quad\text{and}\quad qa_i := qa \ \text{ for } q \neq q_{split},$$

and a new alphabet The CNFA will be denoted .

The idea of this transformation is that we want to make a CNFA ‘more deterministic’. If , then multiple outgoing edges in the state are labelled by the symbol . So we could say that offers a choice in . For each possible choice, we introduce a new symbol that is deterministic in and behaves as in all other states. If , then is not changed by the transformation. This definition immediately implies the following properties:

###### Lemma 1.

Let be a CNFA. Fix and let be . Let be an arbitrary deterministic symbol. Then

1. for all , there exists such that ,

2. (there exists such that ) (there exists such that ).

###### Proof.

Let , we will find with the property claimed in the lemma. If , take . If , then is one of the new symbols . In this case take . Then for and for . This proves the first statement.

Suppose such that , so for all . Then . By Definition 2, there exists such that and for . This means . Now suppose such that . By statement 1 of the lemma, there exists such that , which implies . ∎
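A single parametrized split step can be sketched in code as follows. This is a minimal illustration under assumed conventions, not part of the original construction: states are `0..n-1`, a symbol is a tuple of frozensets (entry `q` is the set $qa$), and an alphabet is a Python set of such tuples, so that symbols acting identically are merged automatically. The names `split_at` and `is_contained` are hypothetical.

```python
def split_at(alphabet, a, q_split):
    """Replace symbol `a` by one new symbol per choice in state `q_split`.

    Each new symbol sends q_split to a single state of a[q_split] and
    behaves like `a` in every other state (compare Definition 2).
    """
    new_symbols = set()
    for q_i in a[q_split]:
        b = list(a)
        b[q_split] = frozenset([q_i])   # deterministic in q_split
        new_symbols.add(tuple(b))
    return (alphabet - {a}) | new_symbols

def is_contained(b, a):
    """True if symbol b is contained in symbol a, i.e. qb is a subset of qa for all q."""
    return all(qb <= qa for qb, qa in zip(b, a))

if __name__ == "__main__":
    fs = frozenset
    a = (fs({0, 1}), fs({2}), fs({0}))
    after = split_at({a}, a, 0)
    # First statement of Lemma 1: every new symbol sits inside an old one.
    print(all(is_contained(b, a) for b in after))
```

If $a$ is already deterministic in the chosen state, the only "new" symbol is $a$ itself and the alphabet is returned unchanged, matching the remark after Definition 2.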

The parametrized Split preserves synchronization properties, as is shown in the next lemma.

###### Lemma 2.

Let be a CNFA and let for some . Then

1. is D3-directing if and only if is D3-directing,

2. If and are D3-directing, then .

###### Proof.

Let be the alphabet of and denote the new labels by . First assume that is D3-directing. There exist and such that for all . From each state there exists a path labelled by that ends in :

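$$
\begin{array}{l}
q_1 = q_1^0 \stackrel{w_1}{\longrightarrow} q_1^1 \stackrel{w_2}{\longrightarrow} \dots \stackrel{w_{|w|}}{\longrightarrow} q_1^{|w|} = q_s \\
q_2 = q_2^0 \stackrel{w_1}{\longrightarrow} q_2^1 \stackrel{w_2}{\longrightarrow} \dots \stackrel{w_{|w|}}{\longrightarrow} q_2^{|w|} = q_s \\
\qquad\vdots \\
q_n = q_n^0 \stackrel{w_1}{\longrightarrow} q_n^1 \stackrel{w_2}{\longrightarrow} \dots \stackrel{w_{|w|}}{\longrightarrow} q_n^{|w|} = q_s
\end{array}
$$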

We will construct a word which follows the same paths. We may assume that paths do not diverge again once they have met, i.e. if , then .

Let and suppose for some . If , define to be . Then for all . If , then for some . This means , so there exists such that . Define to be . Finally, for all , let . Then for all , so is a D3-directing word for .

Now assume that is D3-directing with D3-directing word and synchronizing state . By repeated application of Lemma 1 (replacing every symbol of that is not in by ), it follows that there exists such that for all . Therefore and the word is D3-directing for .

The above arguments prove the first statement of the lemma. Clearly, rewriting a D3-directing word from to and vice versa preserves the length. This implies the second statement. ∎

Next we investigate the result of applying consecutive parametrized Split transformations to all non-deterministic symbols in a CNFA . We will show that this terminates and that the result is a uniquely defined DFA.

###### Lemma 3.

Let be a CNFA, and repeat the following. If is not a DFA, choose and for which . Let . Then

1. There exists where this process ends such that is a DFA.

2. The resulting DFA does not depend on the choices of and .

###### Proof.

If is a DFA, then for all and . If is not a DFA, then

$$\prod_{q \in Q}\prod_{a \in \Sigma_k} |qa| \;>\; \prod_{q \in Q}\prod_{a \in \Sigma_{k+1}} |qa| \;\ge\; 1.$$

As this integer sequence is strictly decreasing, it ends in 1, i.e. there exists such that the th term is equal to 1. This is equivalent to the first claim of the lemma.

For the second claim, choose such that is a DFA. Let be an arbitrary deterministic symbol. By repeated application of the second statement of Lemma 1, it follows that if and only if there exists for which . Therefore the DFA does not depend on the splitting choices, proving the second statement. ∎

###### Definition 3.

Let be a CNFA. The unique DFA that is produced by repeated application of the parametrized Split will be called .

Lemma 3 guarantees that is well-defined. Extension of Lemma 1 leads to the following characterization:

###### Lemma 4.

Let be a CNFA. Denote the DFA by . Let be an arbitrary deterministic symbol. Then if and only if there exists such that .

###### Proof.

If , repeatedly apply the first statement of Lemma 1. If there exists such that , then repeated application of the second statement of Lemma 1 proves existence of such that . Since both and are deterministic symbols, , so . ∎
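Lemma 4 suggests a direct way to compute the result of Split without iterating: collect every deterministic symbol that is contained in some symbol of the CNFA. The sketch below does this and, as a sanity check in the spirit of Lemma 3, compares it with one particular order of repeated splitting. The encoding (a symbol as a tuple of frozensets, a deterministic symbol as a tuple of states) and the names `split` and `split_iteratively` are assumptions for illustration only.

```python
from itertools import product

def split(alphabet):
    """All deterministic symbols contained in some symbol of the CNFA.

    A deterministic symbol is returned as a tuple of states:
    entry q is the single state that q is sent to.
    """
    dfa = set()
    for a in alphabet:
        # One deterministic symbol per way of choosing a target in every state.
        for choice in product(*[sorted(qa) for qa in a]):
            dfa.add(choice)
    return dfa

def split_iteratively(alphabet):
    """Repeat the parametrized split until no non-deterministic symbol is left."""
    alphabet = set(alphabet)
    while True:
        pick = next(((a, q) for a in alphabet
                     for q in range(len(a)) if len(a[q]) > 1), None)
        if pick is None:
            return {tuple(next(iter(qa)) for qa in a) for a in alphabet}
        a, q = pick
        new = {a[:q] + (frozenset([p]),) + a[q + 1:] for p in a[q]}
        alphabet = (alphabet - {a}) | new

if __name__ == "__main__":
    fs = frozenset
    cnfa = {(fs({0, 1}), fs({2}), fs({0, 2}))}
    print(split(cnfa) == split_iteratively(cnfa))
```

The product construction makes the possible blow-up visible: a single symbol $a$ contributes $\prod_{q} |qa|$ deterministic symbols.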

Moreover, we have the following:

###### Corollary 1.

If is a D3-directing CNFA, then .

###### Proof.

This is an immediate consequence of Lemma 2. ∎
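The statement of Corollary 1 can be checked numerically on a small example. The following sketch is an illustration under assumed encodings (states `0..2`, a symbol as a tuple of frozensets); the functions `d3` and `split` are hypothetical helpers written for this check, and the example automaton is our own, not one taken from the text.

```python
from collections import deque
from itertools import product

def d3(alphabet, n):
    """Length of a shortest D3-directing word, or None if none exists."""
    full = frozenset(range(n))
    dist = {frozenset([s]): 0 for s in range(n)}
    queue = deque(dist)
    while queue:
        S = queue.popleft()
        if S == full:
            return dist[S]
        for a in alphabet:
            # Backward step: states that can reach S by reading a.
            T = frozenset(q for q in range(n) if a[q] & S)
            if T and T not in dist:
                dist[T] = dist[S] + 1
                queue.append(T)
    return None

def split(alphabet):
    """Deterministic symbols contained in some symbol (Lemma 4), same encoding."""
    return {tuple(frozenset([p]) for p in choice)
            for a in alphabet for choice in product(*a)}

if __name__ == "__main__":
    fs = frozenset
    a = (fs({1}), fs({2}), fs({0}))      # cyclic shift
    c = (fs({0}), fs({1}), fs({0, 1}))   # non-deterministic in state 2
    cnfa = {a, c}
    print(d3(cnfa, 3), d3(split(cnfa), 3))
```

Both printed values should agree; the word `c a c` is one D3-directing word of that common length for this example.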

Now also the main result of this section is straightforward:

###### Theorem 1.

The maximal shortest D3-directing word length for DFAs is the same as for CNFAs, i.e.

$$d(n) = cd_3(n).$$

###### Proof.

Since every DFA is also a CNFA and the notions of synchronization and D3-directedness coincide for DFAs, it follows that . By Corollary 1 every CNFA has a corresponding DFA with the same shortest D3-directing word length. Therefore . ∎
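To make the effect of Theorem 1 concrete, the following worked comparison evaluates, for a few small $n$, the lower bound $(n-1)^2$, the upper bound from (1), and the previous upper bound from (3). The numbers are plain arithmetic on those three formulas and are added here only as an illustration.

$$
\begin{array}{c|c|c|c}
n & (n-1)^2 & \dfrac{n^3-n}{6} & \tfrac{1}{2}n(n-1)(n-2)+1 \\ \hline
4 & 9 & 10 & 13 \\
5 & 16 & 20 & 31 \\
6 & 25 & 35 & 61
\end{array}
$$

For instance, for $n = 5$ we get $\frac{125-5}{6} = 20$ and $\tfrac{1}{2}\cdot 5 \cdot 4 \cdot 3 + 1 = 31$.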

This theorem establishes the equivalence of Černý’s conjecture and Conjecture 1. Combined with (1), it also implies the following sharpening of the upper bound in (3): $cd_3(n) \le \frac{n^3-n}{6}$.

## 3 Sharper bounds for several classes of CNFAs

For several classes of DFAs the Černý conjecture has been settled, or at least better upper bounds than the cubic one for the general case have been obtained. If the Split transform reduces a CNFA to a DFA that belongs to one of these classes, then as a direct consequence we obtain improved bounds for the D3-directing length in the CNFA. In this section we present a couple of results of this type.

The general pattern of the arguments in this section is as follows. First we give the definition of a property for DFAs, together with references to the best known upper bound for synchronization lengths in DFAs satisfying . Then we give a natural extension of to the class of CNFAs. Finally, we show that every CNFA satisfying is reduced to a DFA satisfying . Corollary 1 then guarantees that the length of the shortest D3-directing word in is at most .

### 3.1 Cyclic automata

A DFA is cyclic if one of the letters in acts as a cyclic permutation on .

###### Definition 4.

A DFA with is called cyclic if there exists such that for all

$$qa^n = q \quad\text{and}\quad qa^k \neq q \ \text{ for } 1 \le k \le n-1.$$

Equivalently, the states can be indexed in such a way that for , and . Examples of cyclic automata include the well-known sequence discovered by Černý.

Dubuc [9] proved that a synchronizing cyclic DFA on $n$ states has a synchronizing word of length at most $(n-1)^2$, as predicted by Černý’s conjecture. We define non-deterministic cyclic automata as follows.

###### Definition 5.

A CNFA is called cyclic if there exists and an indexing of the states such that

$$q_{i+1} \in q_i a \ \text{ for } 1 \le i \le n-1, \quad\text{and}\quad q_1 \in q_n a.$$

Note that with this definition, a CNFA is cyclic if and only if it is the extension of a cyclic DFA.
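Definition 5 asks for a symbol whose graph contains a cycle through all states. For small automata this can be tested by brute force, as in the sketch below. The encoding (states `0..n-1`, a symbol as a tuple of frozensets) and the name `is_cyclic` are assumptions for this illustration; trying all orderings is exponential in $n$ and is only meant for small examples.

```python
from itertools import permutations

def is_cyclic(alphabet, n):
    """True if some symbol admits an indexing q_1..q_n as in Definition 5."""
    for a in alphabet:
        # A cycle can be started anywhere, so fix state 0 as the first state.
        for rest in permutations(range(1, n)):
            order = (0,) + rest
            if all(order[(i + 1) % n] in a[order[i]] for i in range(n)):
                return True
    return False

if __name__ == "__main__":
    fs = frozenset
    shift = (fs({1}), fs({2}), fs({0}))
    extended = (fs({0, 1}), fs({2}), fs({0}))   # an extension of `shift`
    identity = (fs({0}), fs({1}), fs({2}))
    print(is_cyclic({shift}, 3), is_cyclic({extended}, 3), is_cyclic({identity}, 3))
```

The second example illustrates the remark above: adding edges to a cyclic DFA symbol keeps the CNFA cyclic.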

###### Proposition 1.

If $\mathcal{A}$ is a D3-directing cyclic CNFA on $n$ states, then $\mathcal{A}$ has a shortest D3-directing word of length at most $(n-1)^2$.

###### Proof.

Denote by . Choose and an indexing of the states as in Definition 5. Define such that for , and . Then so Lemma 4 gives . Therefore is a cyclic DFA and the result follows.∎

### 3.2 One-cluster automata

A DFA is called one-cluster if for some letter , there is only one cycle (possibly a self-loop) labelled . For every , the path eventually ends in this cycle. One way to formally define this is:

###### Definition 6.

A DFA is called one-cluster if there exists and such that for all

$$qa^k = p \quad\text{for some } k \in \mathbb{N}.$$

Note that cyclic DFAs are contained in the class of one-cluster automata. Béal, Berlinkov and Perrin [2] proved that in a synchronizing one-cluster DFA, the length of the shortest synchronizing word is at most . We define one-cluster CNFAs in the following way.

###### Definition 7.

A CNFA is called one-cluster if there exists and such that for all

$$p \in qa^k \quad\text{for some } k \in \mathbb{N}.$$

With this definition, cyclic CNFAs are a special case of one-cluster CNFAs. As in the cyclic case, a CNFA is one-cluster if and only if it is an extension of a one-cluster DFA.

###### Proposition 2.

If is a D3-directing one-cluster CNFA, then has a shortest D3-directing word of length at most .

###### Proof.

Let . Choose and as in Definition 7 and denote the states by . For , let be the smallest integer such that (note that this can also be done if ). Choose such that . Define a symbol by for all . Then and . By Lemma 4, and therefore is a one-cluster DFA. ∎

###### Remark 1.

To see if a CNFA is one-cluster, it is sufficient to check pairs of states. A CNFA is one-cluster if and only if there exists such that for any there exist for which .
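The one-cluster property of Definition 7 can be tested with a backward reachability computation per symbol and candidate state $p$. The sketch below is an illustration under stated assumptions: states are `0..n-1`, a symbol is a tuple of frozensets, the name `is_one_cluster` is hypothetical, and the exponent $k = 0$ is permitted. The last choice should not affect which CNFAs qualify, since in a complete automaton a state on a cycle reachable from $p$ can be used in place of $p$.

```python
def is_one_cluster(alphabet, n):
    """True if for some symbol a and state p, every state can reach p via a-edges."""
    for a in alphabet:
        for p in range(n):
            reach = {p}          # states known to reach p (k = 0 allowed)
            changed = True
            while changed:
                changed = False
                for q in range(n):
                    if q not in reach and a[q] & reach:
                        reach.add(q)
                        changed = True
            if len(reach) == n:
                return True
    return False

if __name__ == "__main__":
    fs = frozenset
    funnel = (fs({1}), fs({1}), fs({1}))   # everything goes to state 1
    two_loops = (fs({0}), fs({1}))         # two separate self-loops
    choice = (fs({0}), fs({0, 1}))         # state 1 may stay or move to 0
    print(is_one_cluster({funnel}, 3),
          is_one_cluster({two_loops}, 2),
          is_one_cluster({choice}, 2))
```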

### 3.3 Monotonic automata

###### Definition 8.

A DFA is called monotonic if admits a linear order such that for each the map preserves the order , i.e.

$$qa \le q'a \quad\text{whenever } q \le q'.$$

Ananichev and Volkov [1] proved that the length of the shortest synchronizing word in a synchronizing monotonic DFA is at most . The following definition extends the notion of monotonicity to non-deterministic automata:

###### Definition 9.

A CNFA is called monotonic if admits a linear order such that for each

$$\max\{qa\} \le \min\{q'a\} \quad\text{whenever } q \le q'.$$

Every DFA contained in a monotonic CNFA is monotonic. But for a CNFA to be monotonic, it is not sufficient that it contains a monotonic DFA. One can easily extend a monotonic DFA to a CNFA for which is not monotonic. Just add to the DFA a symbol that sends every state to the full state set .

###### Proposition 3.

If is a D3-directing monotonic CNFA, then has a shortest D3-directing word of length at most .

###### Proof.

Let and . Choose . By Lemma 4 there exists such that , i.e. for all . The monotonicity of implies

$$qb \le \max\{qa\} \le \min\{q'a\} \le q'b \quad\text{whenever } q \le q',$$

demonstrating that is a monotonic DFA. This implies the result. ∎

### 3.4 Orientable automata

A DFA is called orientable if admits a cyclic order

$$q_1 \prec q_2 \prec \dots \prec q_n \prec q_1$$

on , such that for each letter the sequence is (after removal of duplicates) a subsequence of a cyclic shift of . One can think of this as the order of the states on the circle being preserved under . This can equivalently be formalized in terms of a linear order:

###### Definition 10.

A DFA is called orientable if admits a strict linear order such that for each at most one of the following inequalities is violated:

$$q_1a \le q_2a \le \dots \le q_na \le q_1a.$$

Eppstein [10] proved that orientable automata satisfy the conjecture of Černý: if an orientable DFA on $n$ states is synchronizing, then the shortest synchronizing word has length at most $(n-1)^2$. Černý’s own examples are orientable. We extend the notion of orientability to the non-deterministic case as follows:

###### Definition 11.

A CNFA is called orientable if admits a strict linear order on such that for each at most one of the following inequalities is violated:

As is the case for DFAs, the class of orientable CNFAs contains the monotonic CNFAs. Like in the monotonic case, every DFA contained in an orientable CNFA is orientable, but not every extension of an orientable DFA is an orientable CNFA.

###### Proposition 4.

If $\mathcal{A}$ is a D3-directing orientable CNFA on $n$ states, then $\mathcal{A}$ has a shortest D3-directing word of length at most $(n-1)^2$.

###### Proof.

Suppose is an orientable CNFA. Let and choose . Then by Lemma 4 there exists such that , i.e. for all . It follows that at most one of the inequalities

$$q_1b \le q_2b \le \dots \le q_nb \le q_1b$$

is violated so that is an orientable DFA. ∎

### 3.5 Automata with underlying Eulerian digraph

Every DFA has an underlying directed graph (digraph) with vertex set and with edges corresponding to actions of elements of . Formally, with and , where we consider to be a multiset, which means that each edge has a multiplicity.

A digraph is called Eulerian if it is strongly connected and all indegrees and outdegrees are the same. Kari [13] proved that a synchronizing DFA for which the underlying graph is Eulerian admits a synchronizing word of length at most .

For a CNFA with underlying Eulerian digraph it is not necessarily the case that the underlying digraph of the DFA is Eulerian as well. However, we can decompose into directed graphs for each of the symbols in : define with and . The following stronger property is preserved under Split:

###### Definition 12.

A CNFA is called strongly Eulerian if for all the corresponding digraph is Eulerian.

###### Proposition 5.

If is a D3-directing strongly Eulerian CNFA, then has a shortest D3-directing word of length at most .

###### Proof.

Let and denote its underlying digraph by . Let’s first assume that is a singleton and that all in- and outdegrees of are equal to . Suppose are such that . Since all have -outdegree , there will be exactly letters for which . Doing this for all and using that we obtain that has outdegree in . An analogous argument gives the same for the indegrees.

Now suppose is not a singleton, i.e. . Assume that all degrees in are equal to . Then by repeatedly applying the above argument, we obtain that all degrees in are equal to . Clearly is also strongly connected. Therefore is Eulerian which implies the result.∎

### 3.6 Aperiodic automata

One way to define aperiodic DFAs is the following:

###### Definition 13.

A DFA is called aperiodic if for all and there exists such that .

An aperiodic DFA with strongly connected underlying digraph is synchronizing and has a synchronizing word of length at most , see [16].

The definition could also be written down for CNFAs, but this property is not preserved under the Split transformation. For example, let with and , where is defined by for all . Then clearly for all and . We will show that admits periodic words. Let and let be such that and . Then (as contains every possible symbol), so by Lemma 4 it follows that . However, if even and if odd. Therefore fails to be aperiodic.

###### Proposition 6.

Suppose is a CNFA with the following property: for all and there exists such that and . If its underlying digraph is strongly connected, then is D3-directing and has a D3-directing word of length at most .

###### Proof.

Let . Let . By Lemma 4 there exist such that for all . Let and choose such that and . Since is not empty and it follows that and similarly . Consequently, is an aperiodic DFA. Furthermore, if has a strongly connected underlying digraph, then so has . ∎

CNFAs with the property of Proposition 6 transform by Split into an aperiodic DFA. However, having this property is not necessary for being transformed into an aperiodic DFA, as the next example shows. Let where is defined by and . Then for all . Nevertheless is aperiodic, as can be easily verified.

## 4 Investigating critical CNFAs

In [8] all critical DFAs on 3 or 4 states were identified, and for states all critical extensions of known critical DFAs. Recently it was confirmed [6] that for and 6 no more critical DFAs exist. From Corollary 1 we know that is a critical DFA for every critical CNFA . So if for any DFA we can investigate which CNFAs map to by Split, we can combine these observations to identify all critical CNFAs with states. For states the investigation restricts to resulting known critical DFAs. In doing so, first we concentrate on investigating which CNFAs map by Split to a given DFA, independent of size or being critical.

For a DFA we define a graph structure on . More precisely, we define to be the undirected graph of which is the set of nodes and the set of edges is defined by

$$\{a,b\} \in E \iff \exists q \in Q : qa \neq qb \ \wedge\ \forall r \neq q : ra = rb.$$

Next we show that if then any gives rise to a CNFA such that . The CNFA is defined by

$$\Sigma' = \{a \cup b \mid \{a,b\} \in E'\} \cup \{a \mid \nexists b : \{a,b\} \in E'\}.$$

So symbols not connected by an edge in remain unchanged, and any two symbols that are connected by an edge in are joined into the new symbol . It is defined by for the single state with , and for . In particular, every symbol in is non-deterministic in at most one state, and in that state only two choices are possible.
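The graph on the symbols of a DFA and the joining construction just described can be sketched as follows. This is an illustration under assumed encodings: a DFA symbol is a tuple of states (entry `q` is the image of `q`), a CNFA symbol is a tuple of frozensets, and the names `symbol_graph_edges` and `join` are hypothetical.

```python
from itertools import combinations

def symbol_graph_edges(dfa_alphabet):
    """Edges {a, b}: the symbols a and b differ in exactly one state."""
    edges = set()
    for a, b in combinations(sorted(dfa_alphabet), 2):
        if sum(x != y for x, y in zip(a, b)) == 1:
            edges.add(frozenset([a, b]))
    return edges

def join(dfa_alphabet, chosen_edges):
    """Build the CNFA alphabet obtained by joining the symbols on each chosen edge."""
    covered = set()
    new_alphabet = set()
    for edge in chosen_edges:
        a, b = tuple(edge)
        covered |= {a, b}
        # a union b: two choices in the single state where a and b differ.
        new_alphabet.add(tuple(frozenset([x, y]) for x, y in zip(a, b)))
    for a in dfa_alphabet - covered:
        # Symbols not touched by a chosen edge are kept unchanged.
        new_alphabet.add(tuple(frozenset([x]) for x in a))
    return new_alphabet

if __name__ == "__main__":
    a, b, c = (1, 2, 0), (1, 2, 1), (0, 0, 0)
    sigma = {a, b, c}
    edges = symbol_graph_edges(sigma)
    print(len(edges), join(sigma, edges))
```

Here `a` and `b` differ only in state 2, so they form the single edge of the graph, and joining along it yields one symbol that offers two choices in state 2.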

As an example consider the following DFA on three states, and six symbols . In fact it is the DFA from [8], extended by an extra symbol that acts as the identity. On the right its graph is shown: the nodes are , and there are three edges , and . These are exactly the pairs of symbols acting in the same way on two of the three states.

The edge set of the graph has size 3, so there are 8 possible choices for . As an example we show the resulting CNFA for , in which there are 4 symbols :

The next theorem states that under some conditions for a given DFA these CNFAs are exactly all pre-basic CNFAs for which . Here a CNFA is called pre-basic if no symbol is contained in another; the difference with basic is that now the identity symbol is allowed. Note that by definition every CNFA of the shape is pre-basic, and also every DFA is pre-basic.

###### Theorem 2.

Let be a DFA for which does not admit cycles of length 3 or 4. Then a pre-basic CNFA satisfies if and only if for some .

###### Proof.

For the ‘if’-part, let for . If then , so . Otherwise, let and . Since there exists such that and for all . Now it is straightforward from the definitions that . By repeating this process by removing all elements from one by one, after applying a number of Split operations on we obtain , hence .

For the ‘only if’-part we have to show that no other pre-basic CNFA satisfies . So let be an arbitrary CNFA satisfying . We will prove by induction on the length of the shortest -path from to that for some . Let the first step be in which consists of at least two states for ; by the induction hypothesis we assume for some .

First assume that consists of at least three states. Then according to Lemma 4 every satisfying is in . Among these there are three symbols such that the states are all distinct for and for every the states are all equal for . But then form a 3-cycle in , contradicting the assumption of the theorem.

Hence consists of exactly two states , where are symbols in for which are single states and for all other states . Now we claim that consists of exactly one state for every . If not, then choose for which , . For all other states choose . For define , and for all other states . Then for we obtain , so by Lemma 4. But this yields a 4-cycle in , contradicting the assumption of the theorem. Hence indeed consists of exactly one state for every . Hence , and , and . Since , the non-deterministic symbols of both and are exactly and the non-deterministic symbols of . Since and is pre-basic, the deterministic symbols of are exactly the symbols from that are not covered by . As the same holds for , we conclude that all symbols of and coincide. Hence , concluding the proof. ∎

The proof demonstrates that the requirement concerning cycles is only needed for one of the implications in Theorem 2. If does contain a 3- or 4-cycle, it is still possible to detect CNFAs that are mapped to by Split. As before, every set of edges in corresponds to such a CNFA. However, there might exist other CNFAs which are mapped to as well.

The following examples show that for the other implication in Theorem 2 it is essential to disallow cycles of length both 3 and 4 in . Let be defined by , , , , . Then in we have three symbols that form a 3-cycle in , and is not of the shape for some set of edges of .

As a next example let be defined by , , . Then in we have four symbols that form a 4-cycle in , and is not of the shape for some set of edges of